Week 4: Breaking and Rebuilding Week 3

Last week we created random expressions in SymPy by embedding random symbols within ordinary SymPy expressions. We discussed some simple examples like the sum of two dice or of two normal random variables. At the end of that post I said that the code felt fragile, not sufficiently general to handle complex tasks.

This week I focused on breaking the code with complex tasks and shoring up the places where it was weak. It’s been a frustrating yet satisfying week, with some fun examples to share below. If I were you I’d skip the long-winded text that follows and head straight to the code/math.

I started the week thinking about

  • PDFs of continuous vs. discrete variables, and what to do in the mixed case
  • Building in special cases like normal random variables
  • Creating an extra layer of abstraction in my code, which was getting ugly

Unifying PDFs for the continuous and discrete cases turns out to be a bad idea (I think); they have two distinct mathematical meanings. We now have separate PDF and PMF (probability mass function) functions. Calling PDF on a purely discrete expression will raise an error. In an expression containing both a continuous and a discrete random variable the continuous PDF makes more sense and takes over.
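Here’s a minimal sketch of that dispatch rule in plain Python. The kinds set is a stand-in I made up for inspecting which sorts of random variables an expression contains; the real sympy.stats internals look different.

    # Minimal sketch of the PDF/PMF dispatch rule described above.
    # `kinds` is a hypothetical stand-in for expression inspection.
    def density_function(kinds):
        if 'continuous' in kinds:
            return 'PDF'   # the continuous PDF takes over in mixed expressions
        if 'discrete' in kinds:
            return 'PMF'
        raise ValueError("expression contains no random variables")

    assert density_function({'discrete'}) == 'PMF'
    assert density_function({'continuous'}) == 'PDF'
    assert density_function({'continuous', 'discrete'}) == 'PDF'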

I wanted to build special cases into sympy.stats because my integrals over Gaussians were taking forever and often failing (as you can see in the convolution example below). Integrating over Gaussians is a fairly common operation for me and is something that should be fast, has easy-to-write-down rules, and is hell on the SymPy integrator. I always shy away from coding in special cases, though; it tends to muck things up. To my great relief I found that Tom Bachmann, another GSoC’er, is doing some great work over here on a more powerful definite integration engine. I’m hopeful that his work will make my work better, and I’m holding off on special cases for now.
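For reference, the rule I have in mind is the standard complete-the-square identity; a pattern match could apply it instantly, with no general-purpose integration machinery at all:

\int_{-\infty}^{\infty} e^{-a x^{2} + b x}\, dx = \sqrt{\frac{\pi}{a}}\; e^{b^{2}/4a}, \qquad a > 0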

Computing PDFs of algebraically complex, mixed (continuous and discrete) expressions is hard, both for humans and for computers. I was computing probabilities and expectations using the PDF as my primary operation, and this was often failing. I’m now basing many computations on marginalization, which is more robust but provides less information (which for things like expectations is perfect). It’s easy to define independently for every type of random variable.
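To make “marginalization” concrete, here is the idea written out by hand in plain SymPy (an illustration, not the sympy.stats implementation): take an expectation by integrating out each continuous variable against its density and summing out each discrete variable against its weights, one variable at a time.

    # Hand-rolled marginalization: E[X + D1] for X ~ Normal(0, sigma)
    # and D1 a fair die. Illustration only, not the sympy.stats internals.
    from sympy import symbols, integrate, exp, sqrt, pi, oo, Rational

    x, sigma = symbols('x sigma', positive=True)
    normal_pdf = exp(-x**2/(2*sigma**2)) / (sqrt(2*pi)*sigma)

    # Integrate out x against the normal density, then sum over the
    # six equally weighted die faces.
    expectation = sum(Rational(1, 6) * integrate((x + d)*normal_pdf, (x, -oo, oo))
                      for d in range(1, 7))
    print(expectation)   # 7/2, i.e. E[X] + E[D1] = 0 + 7/2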

I’ve also created a syntactically nice way to create events from random variables and relationals. Now you can type things like eventify(die1 > 3) to get the Event object die1 in {4, 5, 6}, or just type P(die1 > 3) and get 1/2. Ideally I’d like to grow this into a stronger syntax so that, for example, you could compute joint PDFs using “givens” and conditionals: P(X,Y) == P(X|Y==y)P(Y)

And finally, some code examples! The equations look great in isympy but are too large to copy/paste, so you’re going to get output from the LaTeX printer instead. It’s interesting to note that I’ve only coded in the PDFs of the basic distributions, nothing else. There are no special rules here: no expectations are known explicitly, and SymPy doesn’t know things like “variances add”. sympy.statistics generates the equations and integrals, and SymPy solves them.

>>> from sympy.statistics import *

>>> alpha = Symbol('alpha', real=True, positive=True, finite=True)
>>> beta = Symbol('beta', real=True, positive=True, finite=True)
>>> sigma = Symbol('sigma', real=True, finite=True, bounded=True, positive=True)
>>> rate = Symbol('lambda', positive=True, real=True, finite=True)

>>> X = NormalProbabilitySpace(0,sigma).value
>>> Y = ExponentialProbabilitySpace(rate).value

>>> PDF(Y) : \begin{pmatrix}y, & \lambda e^{-y \lambda}\end{pmatrix}
>>> E(Y) : \frac{1}{\lambda}
>>> var(Y) : \lambda^{-2}
>>> skewness(Y) : 2

>>> PDF(X) : \begin{pmatrix}y, & \frac{\sqrt{2} e^{\frac{-y^{2}}{2 \sigma^{2}}}}{2 \sqrt{\pi} \sigma }\end{pmatrix}
>>> PDF(alpha*X + beta) : \begin{pmatrix}y, & \frac{\sqrt{2}e^{\frac{-\left(y - \beta\right)^{2}}{2 \alpha^{2} \sigma^{2}}}}{2 \sqrt{\pi} \alpha \sigma }\end{pmatrix}
>>> E(X) : 0
>>> E(alpha*X + beta) : \beta
>>> std(X) : \sigma
>>> simplify(std(alpha*X + beta)) : \alpha \sigma
>>> skewness(alpha*X + beta) : 0

In the following example it correctly determines that a convolution is necessary, but the integration engine can’t handle this (yet?).
>>> PDF(X+Y) : \begin{pmatrix}y, & \int_{0}^{\infty} \frac{\sqrt{2} \lambda e^{-x \lambda} e^{\frac{-\left(- x + y\right)^{2}}{2 \sigma^{2}}}}{2 \sqrt{\pi} \sigma }\,dx\end{pmatrix}

These still work, even though PDF fails:
>>> E(X+Y) : \frac{1}{\lambda}
>>> simplify(var(X+Y)) : \frac{\lambda^{2} \sigma^{2} + 1}{\lambda^{2}}
>>> skewness(X+Y) : - 2 \frac{\sqrt{2} \sqrt{\pi} \sigma}{\lambda \left(\lambda^{2} \sigma^{2} + 1\right)^{\frac{3}{2}}}

Can anyone verify that this is correct?
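A quick check by hand: third cumulants add across independent variables, and the normal’s third cumulant vanishes, so

\kappa_{3}(X+Y) = \frac{2}{\lambda^{3}}, \qquad \operatorname{Var}(X+Y) = \sigma^{2} + \frac{1}{\lambda^{2}}, \qquad \textrm{skewness}(X+Y) = \frac{2/\lambda^{3}}{\left(\sigma^{2} + 1/\lambda^{2}\right)^{3/2}} = \frac{2}{\left(1 + \lambda^{2} \sigma^{2}\right)^{3/2}}

which is dimensionless, unlike the output above, so that integral deserves a closer look.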

>>> D1,D2,D3 = Die().value, Die().value, Die().value

>>> eventify(D1>3)
    die₁ in {4, 5, 6}

>>> eventify(D1+D2>10)
    (die₂, die₁) in ({6} x {6}) ∪ ({6} x {5}) ∪ ({5} x {6})

>>> eventify(And(D1>3, D1+D2<6))
    (die₁, die₂) in {4} x {1}

>>> P(D1>D2)
    5/12
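As a sanity check, the same probability by brute-force enumeration over the 36 equally likely outcomes (plain Python, independent of sympy.stats):

    # Brute-force check of P(D1 > D2).
    from fractions import Fraction
    favorable = sum(1 for d1 in range(1, 7)
                      for d2 in range(1, 7) if d1 > d2)
    print(Fraction(favorable, 36))   # 5/12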

# This works with continuous variables too
>>> eventify(And(X>-1, X<1))
    x in (-1, 1)

# Mixing is just fine
>>> P(And(X>0, D1>3))
    1/4

>>> P(X**2<1) : \textrm{erf}\left(\frac{\sqrt{2}}{2 \sigma}\right)
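Since X**2 < 1 is just -1 < X < 1, the same answer falls out of a direct integral of the normal density in plain SymPy:

    # Direct check: P(X**2 < 1) = P(-1 < X < 1) for X ~ Normal(0, sigma).
    from sympy import symbols, integrate, exp, sqrt, pi
    x, sigma = symbols('x sigma', positive=True)
    normal_pdf = exp(-x**2/(2*sigma**2)) / (sqrt(2*pi)*sigma)
    print(integrate(normal_pdf, (x, -1, 1)))   # erf(sqrt(2)/(2*sigma))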

>>> PDF(X+D1) : \begin{pmatrix}y, & \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -1\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -2\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -3\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -4\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -5\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -6\right)^{2}}{2 \sigma^{2}}}}\end{pmatrix}

This example is silly: it’s the convolution of a die roll with a Gaussian, which comes out as an equal-weight mixture of six Gaussians centered at 1 through 6. Neat to see complex problems work out, though!

Plan for next week:
This last week I didn’t intend to write any new functionality, yet I ended up with much more functional code. The new random variable machinery is completely undocumented and only sparsely tested, so there is still plenty of work to be done even without a specific new goal in mind.



4 Responses to Week 4: Breaking and Rebuilding Week 3

  1. asmeurer says:

    I should start telling readers of my blog to skip over my long-windedness. :)

    By the way, I just tell people to click on the “view source” button for too long isympy output (like here).

  2. Pingback: Status Update—Week 4 « nessgrh

  3. Dauda inuwa says:

    Two fair dice are rolled. Find the probability mass function of x and y when x is the value on any die and y is their sum.
