Week 3 – Random Variables

This week sees the first implementation of random variables. Random variables are SymPy symbols which carry around ProbabilitySpace information. In expressions they interact just like normal SymPy symbols until special functions (like PDF or expectation_value) are called, at which point they take on special meaning.

As of this morning there is a rudimentary implementation and some mechanics written down. This feels more like experimental code than a polished library however. Still, it’s very satisfying to see things work. Some examples with finite probability spaces

>>> d1,d2 = Die(6), Die(6)
>>> die1, die2 = d1.value, d2.value # Some Random Variables

# RV's act just like normal SymPy symbols in expressions
>>> 2*die1 + die2 - die1
die₁ + die₂

# But some functions, like Probability Density Function, can watch out for them
>>> PDF(die1)
{1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

>>> PDF(die1 + die2)
{2: 1/36, 3: 1/18, 4: 1/12, 5: 1/9, 6: 5/36, 7: 1/6, 8: 5/36, 9: 1/9, 10: 1/12, 11: 1/18, 12: 1/36}

# Once we have the PDF other operations are easy to define
>>> E(die1+die2) # Expectation Value
7
>>> var(die1) # Variance
35/12
>>> covar(die1, die2) # Covariance
0

# Complicated expressions are also doable
>>> PDF( Eq(cos(pi*die1), 1) )
{False: 1/2, True: 1/2}

Starting from the work on probability spaces and events, this work on discrete random variables was actually fairly easy. Computing a PDF of an expression can be done by looking at “The measure of all atomic elements within the sample space of the cartesian product of the individual probability spaces.” There is some meat in that last sentence, luckily it’s all been done before. The challenge behind random variables was how to connect those mechanics to a simple language that interacts well in SymPy.

Throughout this project I’ve been treating the discrete and continuous worlds separately while making sure to provide easy ways to glue them together. This is still the intention with Random Variables but I have yet to find the glue.

Performing computations on Continuous Random Variables is tricker than in the Discrete case. In the discrete case you simply go through and tabulate every possibility much like this haskell library. The continuous case on the other hand requires some calculus and automatic solution of complicated expressions. This is one of the reasons why it’s so nice putting this project inside SymPy – much of the legwork is already done by other smarter folks. Still, it’s challenging to find the correct tools in SymPy and to decide how to troubleshoot when the expressions that I generate create errors downstream.

For the continuous case I have a first pass at a method which solves (in theory) general expressions on (multiple) random variables given some assumptions about the underlying PDF. It gives correct answers to the expressions I can cook up however sometimes those answers are not ideally simplified. By tweaking various simplification settings I can get the following results on normal random variables ( a difficult yet very common case )

>>> x,y = Symbol('x', real=True), Symbol('y', real=True)
>>> pdfx, pdfy = exp(-x**2/2)/sqrt(2*pi), exp(-y**2/2)/sqrt(2*pi)
>>> C,D = ContinuousProbabilitySpace(x, pdfx), ContinuousProbabilitySpace(y, pdfy)
>>> X,Y = C.value, D.value # Some Normal Random Variables

>>> PDF(X) # Grab out the PDF
2**(1/2)*exp(-_x**2/2)/(2*pi**(1/2))

>>> PDF(2*X) # Note that the standard deviation doubles and the expression remains normalized
2**(1/2)*exp(-_x**2/8)/(4*pi**(1/2))

>>> PDF(X+Y)
-exp(-_x**2/4)*erf(-_x/2 - oo)/(4*pi**(1/2)) + exp(-_x**2/4)*erf(-_x/2 + oo)/(4*pi**(1/2))
# If we note that erf(oo)==1 and erf(-oo)==-1 then we have
exp(-_x**2/4)/(2*pi**(1/2)) # Note that the variances added

>>> PDF(cos(pi*X)) # More complicated expressions are also doable
-2**(1/2)*exp(-acos(_x)**2/(2*pi**2))/(2*pi**(3/2)*(-_x**2 + 1)**(1/2))

So the mechanics work (at least for these cases) and that’s satisfying. However this is the first time in this project that I’ve written code without an eye to it being extensible. It’s fun to see these results but they’re not yet indicative of a real solution. My goal for next week is to not be tempted to develop any new functionality and instead to work on organization.

Advertisements

About mrocklin

PhD student studying Computational Mathematics at the University of Chicago.
This entry was posted in Uncategorized. Bookmark the permalink.

2 Responses to Week 3 – Random Variables

  1. Szymon says:

    Hi, we are doing a similar project pacal.sf.net. Maybe there is some possibility for exchange of ideas?

  2. Pingback: Week 4: Breaking and Rebuilding Week 3 | sympy-stats

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s