Matrix Expressions

Linear algebra is important. It is a language that both humans and computers understand well, and one we can use to describe a large class of important problems. In that sense linear algebra is an alternative way to communicate with computers – an alternative to writing code.

Matrices are used in a number of contexts and, understandably, SymPy represents them in a few ways. You can represent a matrix or linear operator with a Symbol or you can write out a matrix’s components explicitly with a Matrix object. Thanks to recent work by Sherjil over here, Matrix objects are quickly becoming more powerful.

Recently I’ve wanted to build up purely symbolic matrix expressions using Symbol, but kept running into problems because I didn’t want to add matrix-specific behavior to the core SymPy Expr. The standard SymPy Expr wasn’t really designed with matrices in mind, and I found that this was holding me back a bit.

I decided to branch off a MatrixExpr class that, while much less stable, is open to experimentation. It’s been lots of fun so far. I’ve used it for my GSoC project to build up large expressions using block matrices.

I’ll have examples in a future post related to my GSoC project. For now if you’d like to check it out my code resides here:

There is a MatrixExpr class with associated MatrixSymbol, MatAdd, MatMul, MatPow, Inverse, Transpose, Identity, and ZeroMatrix objects: all the things you need for basic expressions. Most of the logic still depends on the subclassed Add, Mul, and Pow classes, with a little bit added on.

Also, because my GSoC project needed it, I built a fun BlockMatrix class that holds MatrixExprs and can be freely mixed with normal MatrixExprs in an expression.
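
A rough sketch of how these pieces fit together, using the object names listed above (the exact constructors in this experimental branch may differ slightly):

```python
from sympy import Symbol, MatrixSymbol, Identity, ZeroMatrix, BlockMatrix

n = Symbol('n')
X = MatrixSymbol('X', n, n)   # a purely symbolic n-by-n matrix
Y = MatrixSymbol('Y', n, n)

expr = X*Y + Identity(n)      # builds MatMul and MatAdd nodes lazily
assert expr.shape == (n, n)

# transposes distribute symbolically without ever touching components
assert (X*Y).T == Y.T * X.T

# BlockMatrix holds MatrixExprs and mixes freely with them
B = BlockMatrix([[X, ZeroMatrix(n, n)], [ZeroMatrix(n, n), Y]])
assert B.shape == (2*n, 2*n)
```

Note that nothing here requires knowing the entries, or even a concrete value of n.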


Week 7: Not much to report

All quiet here in Random Variable land.

The next big step is to write up a Multivariate Normal Random Variable. Operations on these will generate expressions in Linear Algebra. I’m still working out the best way to integrate this into SymPy.

A bit less exciting (though arguably just as important): I’ve started writing tests and filling in gaps for Discrete and Continuous RVs.


A Lesson in Data Assimilation using SymPy

What is the temperature outside? Hold on, I’ll check…

Ok, I went outside and felt the air temperature. I think that it is 30C but I’m not very good at telling the temperature when it’s humid out; we’ll say that it’s 30C with a standard deviation of 3C. That is, we wouldn’t be surprised if it was 27C or 33C, but we’re confident that it’s not 20C or 40C. Rather than represent the temperature as a number like 30, let’s represent it with a Normal Random Variable with mean 30 and standard deviation 3.

>>> T = Normal(30, 3)

We represent this pictorially as follows.

Prior Distribution of Temperature

Hopefully you find this representation intuitive. If you’re more mathematically minded you might like the functional form:

>>> Density(T)

\begin{pmatrix}x_{1}, & \frac{\sqrt{2} e^{- \frac{1}{18} \left(x_{1} -30\right)^{2}}}{6 \sqrt{\pi}}\end{pmatrix}

You’ll see that as x moves away from the mean, 30, the probability drops off rapidly. The speed of the drop-off is moderated by 2 times the square of the standard deviation, 2 × 3² = 18, which appears in the denominator of the exponent.
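
We can check these claims directly. The sketch below re-enters the printed density by hand (with x in place of x₁) and integrates to confirm that it is normalized with mean 30 and variance 3² = 9:

```python
from sympy import Symbol, Rational, sqrt, pi, exp, integrate, oo, simplify

x = Symbol('x', real=True)
# the prior density printed above, with x in place of x_1
f = sqrt(2) * exp(-Rational(1, 18) * (x - 30)**2) / (6 * sqrt(pi))

assert simplify(integrate(f, (x, -oo, oo)) - 1) == 0              # normalized
assert simplify(integrate(x * f, (x, -oo, oo)) - 30) == 0         # mean is 30
assert simplify(integrate((x - 30)**2 * f, (x, -oo, oo)) - 9) == 0  # variance 3**2
```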

We’ll call this curve the prior distribution of T. Prior to what you might ask? Prior to me going outside with a thermometer to measure the actual temperature.

My thermometer is small and the lines are very close together. This makes it difficult to get a precise measurement. I think I can only measure the temperature to the nearest one or two degrees. We’ll describe this measurement noise by another Normal Random Variable with standard deviation 1.5.

>>> noise = Normal(0, 1.5)
>>> Density(noise)

\frac{\sqrt{2} e^{- \frac{2}{9} x_{2}^{2}}}{3 \sqrt{\pi}}

Before I go outside, let’s think about the value I might measure on this thermometer. Given our understanding of the temperature, we expect to get a value around 30C. This value might vary, though, both because our original estimate of T might be off (the weather might actually be 33C) and because the small lines keep me from reading the thermometer exactly. We can describe this value-plus-variability as the sum of two random variables:

>>> observation = T + noise

Ok, I just came back from outside. I measured 26C on the thermometer; reasonable but a bit lower than I expected. We remember that this measurement has some uncertainty attached to it due to the noise, and represent it with a random variable rather than just a number:

>>> data = 26 + noise

\frac{\sqrt{2} e^{- \frac{2}{9} \left(z -26\right)^{2}}}{3\sqrt{\pi}}

Note how the data curve looks skinnier. This is because it’s more precise around the mean value, 26. It’s taller so that the area under the curve remains equal to one.

After we make this measurement, how should our estimate of the temperature change? We have the original estimate, 30C +- 3C, and the new measurement, 26C +- 1.5C. We could simply take the thermometer’s reading because it is more precise, but this wouldn’t use the original information at all. The old measurement still carries useful information, and it would be best to cleanly assimilate the new bit of data (26C +- 1.5C) into our prior understanding (30C +- 3C).

This is the problem of Data Assimilation. We want to assimilate a new measurement (data) into our previous understanding (prior) to form a new and better-informed understanding (posterior). Really, we want to compute

<<T_new =  T_old given that observation == 26  >>

We can do this in SymPy as follows

>>> T_new = Given( T , Eq(observation, 26) )

This posterior is represented below both visually and mathematically

\frac{e^{- \frac{2}{9} \left(- x_{1} + 26\right)^{2}} e^{- \frac{1}{18} \left(x_{1} -30\right)^{2}}}{9 \pi}

The equation tells us that the temperature, here represented by x_1, drops off both as it gets away from the prior mean 30C and as it gets away from the measured value 26C. The value with maximum likelihood is somewhere between. Visually it looks like the blue curve is described by  27C +- 1.3C or so.

We notice that the posterior is a judicious compromise between the two, weighting the data more heavily because it was more precise. The astute statistician might notice that the variance of the posterior is lower than either of the other two (the blue curve is skinnier and so varies less). That is, by combining both measurements we were able to reduce uncertainty below the best of either of them. It’s worth noting that this solution wasn’t hard-coded into SymPy-stats, nor does it rely on some standard probabilistic trick; it is the answer that falls out of the most basic and fundamental rules.
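
The same posterior can be reached with the closed-form Gaussian update, a standard textbook result rather than anything specific to sympy.stats: precisions (inverse variances) add, and the posterior mean is the precision-weighted average of the prior mean and the data.

```python
# prior 30C +- 3C, measurement 26C +- 1.5C (the values from the text above)
mu_p, s_p = 30.0, 3.0
x_d,  s_d = 26.0, 1.5

prec = 1/s_p**2 + 1/s_d**2                     # precisions add
mu_post = (mu_p/s_p**2 + x_d/s_d**2) / prec    # precision-weighted mean
s_post = prec ** -0.5

print(round(mu_post, 1), round(s_post, 2))     # -> 26.8 1.34
```

This matches the "27C +- 1.3C or so" read off the blue curve, and the posterior standard deviation is indeed smaller than either 3 or 1.5.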

The code for this demo is available here. It depends on this statistics branch of SymPy and on matplotlib for the plotting.

Data assimilation is a very active research topic these days. Common applications include weather prediction, automated aircraft navigation, and phone/GPS navigation.

Phone/GPS navigation is a great example. Your phone has three ways of telling where it is: cell towers, GPS satellites, and the accelerometer. Cell towers are reliable but low precision. GPS satellites are precise to a few meters but need some help to find out generally where they are, and they update relatively infrequently. The accelerometer has fantastic, real-time precision but is incapable of large-scale location detection. Merging all three types of information on the fly creates a fantastic product that magically tells us to “turn left here” at just the right time.


Week 6: A new backend for Random Variables

The backend for random variables has been completely rewritten.

The goal of the RandomVariable module is to turn statistical statements like
P(X>Y given Y<5)
into computational statements. These computational statements are then handled by external machinery.

There are three planned implementations. Each implementation writes out to a different type of computational statement.

Discrete/Finite :: examples – Dice, coins, etc…

These turn statistical statements into generator expressions. Generators work very predictably and so these are always collapsed to produce answers.
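
A plain-Python sketch (illustrative only, not the actual sympy.stats internals) of what such a generator expression amounts to: sum the measure of the atoms in the product sample space where the condition holds.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
space = list(product(die, die))           # sample space for two dice
measure = Fraction(1, len(space))         # uniform measure on each atom

def prob(condition):
    # the generator expression a statistical statement collapses to
    return sum(measure for atom in space if condition(*atom))

assert prob(lambda x, y: x + y > 10) == Fraction(1, 12)  # atoms (5,6),(6,5),(6,6)
```

Because the space is finite, this always terminates, which is why the finite implementations are "always collapsed to produce answers".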

Continuous Univariate :: examples – Exponential, Normal, Pareto etc…

These produce statements that rely heavily on SymPy Integrals and Deltas. Complex statements often stress the current SymPy machinery beyond what it can handle. A lazy fallback mode is built in to just give you the raw Integral.
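
As a sketch of the kind of statement this backend emits (illustrative, not the actual internals), an expectation over an Exponential variable becomes a definite integral: the lazy fallback hands you the raw Integral, and doit() collapses it.

```python
from sympy import Symbol, Integral, exp, oo, simplify

x = Symbol('x', positive=True)
lam = Symbol('lambda', positive=True)

pdf = lam * exp(-lam * x)                    # Exponential(lambda) density
expectation = Integral(x * pdf, (x, 0, oo))  # the raw, lazy form

# collapsing the statement recovers the familiar mean 1/lambda
assert simplify(expectation.doit() - 1/lam) == 0
```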

Multivariate Random Normal

These will produce statements in Linear Algebra. Statistical statements are restricted to the linear setting. As long as this is true I suspect these will always compute.

Each type tries to support a few standard operations. This allows mixed-type statements to combine cleanly whenever all the types involved support a common operation. Not all types support all operations. For example, “Where” in the continuous case fails for any expression of more than a single variable, and Multivariate Normals will likely only support Density and conditional_space.

  • integrate :: Expr with RV -> Expr without RV
  • where :: Conditional(Expr) -> Domain
  • conditional_space :: Condition -> New ProbabilitySpace
  • compute_density :: Expr (random) -> Expr (density)

If that wasn’t clear, that’s just fine – this will all be hidden from the user. The functionality will be externally visible through the special functions P, E, Where, and Density.

  • P(condition) :: The probability that a condition is True
  • E(expression) :: The expected value of an expression
  • Where(condition) :: The possible values the underlying can take on that satisfy the condition
  • Density(expression) :: The assignment of probability to possible states of an expression

Ok, so that’s the design. Other than some examples, everything from two weeks ago is up and running under the new system. The major accomplishment for this week is that we can do everything we used to, and we’re set up to go a lot farther.

Also, we can now compute conditional probabilities and densities. This is a lot of fun. To compute the probability of a condition we type P(condition) such as P(X>3). To compute the probability of a condition given that some other condition is true we type P(condition, given) such as P(X>3 , X+Y<5).

For example, Bayes’ theorem is written as
P(A, B) == P(B, A) * P(A) / P(B)
which, conveniently enough, evaluates to True for every finite example of A and B I can think of (remember, FiniteRandomVariables almost always compute).
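
That claim is easy to spot-check outside SymPy; here is an exhaustive plain-Python verification over three dice, using the same A and B as the session below:

```python
from fractions import Fraction
from itertools import product

# exhaustive check of Bayes' theorem on three fair dice,
# mirroring what the finite backend computes
space = list(product(range(1, 7), repeat=3))
m = Fraction(1, len(space))

def P(cond, given=None):
    if given is None:
        return sum(m for a in space if cond(*a))
    # conditional probability: restrict to the given event and renormalize
    joint = sum(m for a in space if cond(*a) and given(*a))
    return joint / sum(m for a in space if given(*a))

A = lambda x, y, z: x - y == z   # Eq(X - Y, Z)
B = lambda x, y, z: z > y        # Z > Y
assert P(A, B) == P(B, A) * P(A) / P(B)   # Bayes' theorem, exactly
```

Because everything is a Fraction over a finite space, the identity holds exactly, not just to floating-point tolerance.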


>>> from sympy.statistics import *

>>> X,Y,Z = Die(6).value, Die(6).value, Die(6).value
>>> E(X) # Expectation of a fair die
7/2
>>> P(X>3) # Probability that it is above 3
1/2
>>> Where(X>3).as_boolean() # What are the possible values if it is above 3?
die₁ = 6 ∨ die₁ = 5 ∨ die₁ = 4
>>> E(X , X>3) # If we know the die will be above 3, what is the expected value?
5
>>> Density(X, X>3) # What is the density in this case?
{4: 1/3, 5: 1/3, 6: 1/3}

>>> E(X+Y-Z, 2*X>Y+1) # These can be made relatively complex and are still fast
>>> Density(X+Y , Y+Z>9)
{5: 1/36, 6: 1/12, 7: 1/6, 8: 1/6, 9: 1/6, 10: 1/6, 11: 5/36, 12: 1/12}

# Let’s test Bayes’ rule
>>> A = Eq(X-Y, Z)
>>> B = Z>Y
>>> P(A, B) == P(B, A) * P(A) / P(B)
True

I’ll create a separate post for conditional probabilities with a continuous example in a while. I was able to pose and automatically solve a basic data assimilation problem.

Code is available in the new branch


Week 5: Reorganizing thoughts

I spent the week at the SAMSI workshop for Uncertainty Quantification. Random variables were active everywhere, assimilating data, representing uncertainty in climate models, being represented by samples, polynomials, gradients, etc…. For me there has been a lot of thinking but no new code.

I would like to be able to do two things

  1. Compute distributions of expressions like f(X), given that some other condition, such as g(X+Y)>z, is true.
  2. Do the above (and all other functionality) on multivariate random variables

Issue (1) is done in the discrete case but fails on any complex continuous expression. It’s challenging to represent events like (X+Y==z). The PDF includes delta functions and isn’t easily representable using the current SymPy-Set backend.

Issue (2) is challenging because the symbols in the equations will have multi-dimensional domains. This isn’t so much complex as it is annoying to build onto the current system.

In general the code isn’t set up to handle joint distributions over several variables elegantly. Both of these capabilities are things I’d like for my research, and neither is simple to implement on top of the current system. I’m considering rewriting the backend to define events using conditionals rather than sets. This is daunting (lots of work) but exciting (potentially lots of new results).


Week 4: Breaking and Rebuilding Week 3

Last week we created random expressions within SymPy by including random symbols within SymPy expressions. We discussed some simple examples like the addition of two dice or of two normal random variables. I mentioned at the end of the post that the code felt fragile: not sufficiently general to handle complex tasks.

This week I focused on breaking the code with complex tasks and building up the places where it was weak. It’s been a frustrating yet satisfying time with some fun examples to share below. If I were you I’d skip the following long-winded text and head straight to the code/math below.

I started the week thinking about

  • PDFs of Continuous v. Discrete variables and what to do in the mixed case
  • Building in special cases like normal random variables
  • Creating an extra layer of abstraction in my code. It was getting ugly

Unifying PDFs for continuous and discrete cases turns out to be a bad idea (I think). They have two distinct mathematical meanings. Now we have PDF and PMF (for probability mass function) functions. Calling PDF on a purely discrete expression will raise an error. In an expression with both a continuous and discrete random variable the continuous PDF makes more sense and takes over.

I wanted to build in special cases to sympy.stats because my integrals over Gaussians were taking forever and often failing (as you can see in the convolution example below). Integrating over Gaussians is a fairly common operation for me and is something that should be fast, has easy-to-write-down rules, and is hell on the SymPy integrator. I always shy away from coding in special cases though; it tends to muck things up. To my great relief I found that Tom Bachmann, another GSoC’er, is doing some great work over here on a more powerful definite integration engine. I’m hopeful that his work will make my work better, and I’m holding off on special cases for now.

Computing PDF’s of complex (algebraically) mixed (continuous and discrete) expressions is hard, both for humans and computers. I was computing probabilities and expectations using the PDF as my primary operation, and this was often failing. I now base lots of computations on marginalization instead, which is more robust but provides less information (which for things like expectations is perfect). It’s easy to define for all types of random variables independently.

I’ve also created a syntactically nice way to create events from random variables and relationals. Now you can type things like eventify(die1 > 3) to get the Event object die1 in {4, 5, 6}, or just type P(die1>3) and get out 1/2. Ideally I’d like to turn something like this into a stronger syntax so that, for example, you could compute the joint pdf using “Givens” and conditionals: P(X,Y) == P(X|Y==y)P(Y)

And finally! Some code examples. The equations look great in isympy but are too large to copy/paste, so you’re going to get output from the latex printer instead. It’s interesting to note that I’ve only coded in the pdfs of the basic distributions, nothing else. There are no special rules here: no expectations are known explicitly, and sympy doesn’t know things like “variances add”. SymPy.statistics is generating equations and integrals and SymPy is solving them.

>>> from sympy.statistics import *

>>> alpha = Symbol('alpha', real=True, positive=True, finite=True)
>>> beta = Symbol('beta', real=True, positive=True, finite=True)
>>> sigma = Symbol('sigma', real=True, finite=True, bounded=True, positive=True)
>>> rate = Symbol('lambda', positive=True, real=True, finite=True)

>>> X = NormalProbabilitySpace(0,sigma).value
>>> Y = ExponentialProbabilitySpace(rate).value

>>> PDF(Y) : \begin{pmatrix}y, & \lambda e^{-y \lambda}\end{pmatrix}
>>> E(Y) : \frac{1}{\lambda}
>>> var(Y) : \lambda^{-2}
>>> skewness(Y) : 2

>>> PDF(X) : \begin{pmatrix}y, & \frac{\sqrt{2} e^{\frac{-y^{2}}{2 \sigma^{2}}}}{2 \sqrt{\pi} \sigma }\end{pmatrix}
>>> PDF(alpha*X + beta) : \begin{pmatrix}y, & \frac{\sqrt{2}e^{\frac{-\left(y - \beta\right)^{2}}{2 \alpha^{2} \sigma^{2}}}}{2 \sqrt{\pi} \alpha \sigma }\end{pmatrix}
>>> E(X) : 0
>>> E(alpha*X + beta) : \beta
>>> std(X) : \sigma
>>> simplify(std(alpha*X + beta)) : \alpha \sigma
>>> skewness(alpha*X + beta) : 0
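
These linear-transform results can be checked by hand with a change of variables; the sketch below rebuilds the density of alpha*X + beta from the density of X (same symbols and positivity assumptions as above) and integrates:

```python
from sympy import symbols, sqrt, pi, exp, integrate, oo, simplify

y = symbols('y', real=True)
alpha, beta, sigma = symbols('alpha beta sigma', positive=True)

# density of X ~ Normal(0, sigma)
f = lambda t: exp(-t**2 / (2*sigma**2)) / (sqrt(2*pi) * sigma)

# change of variables: density of alpha*X + beta, valid for alpha > 0
g = f((y - beta) / alpha) / alpha

assert simplify(integrate(y * g, (y, -oo, oo)) - beta) == 0       # E = beta
assert simplify(integrate((y - beta)**2 * g, (y, -oo, oo))
                - alpha**2 * sigma**2) == 0                        # std = alpha*sigma
```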

In the following example it correctly determines that a convolution is necessary but the integration engine can’t handle this (yet?).
>>> PDF(X+Y) : \begin{pmatrix}y, & \int_{0}^{\infty} \frac{\sqrt{2} \lambda e^{-x \lambda} e^{\frac{-\left(- x + y\right)^{2}}{2 \sigma^{2}}}}{2 \sqrt{\pi} \sigma }\,dx\end{pmatrix}

These still work, even though PDF fails
>>> E(X+Y) : \frac{1}{\lambda}
>>> simplify(var(X+Y)) : \frac{\lambda^{2} \sigma^{2} + 1}{\lambda^{2}}
>>> skewness(X+Y) : - 2 \frac{\sqrt{2} \sqrt{\pi} \sigma}{\lambda \left(\lambda^{2} \sigma^{2} + 1\right)^{\frac{3}{2}}}  : Can anyone verify that this is correct?

>>> D1,D2,D3 = Die().value, Die().value, Die().value

>>> eventify(D1>3)
    die₁ in {4, 5, 6}

>>> eventify(D1+D2>10)
    (die₂, die₁) in ({6} x {6}) ∪ ({6} x {5}) ∪ ({5} x {6})

>>> eventify(And(D1>3, D1+D2<6))
    (die₁, die₂) in {4} x {1}

>>> P(D1>D2)
5/12

# This works with continuous variables too
>>> eventify(And(X>-1, X<1))
    x in (-1, 1)

# Mixing is just fine
>>> P(And(X>0, D1>3))

>>> P(X**2<1) : \textrm{erf}\left(\frac{\sqrt{2}}{2 \sigma}\right)

>>> PDF(X+D1) : \begin{pmatrix}y, & \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -1\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -2\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -3\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -4\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -5\right)^{2}}{2 \sigma^{2}}}} + \frac{\sqrt{2}}{12 \sqrt{\pi} \sigma e^{\frac{\left(y -6\right)^{2}}{2 \sigma^{2}}}}\end{pmatrix}

This example is silly. It’s the convolution of a die roll with a Gaussian. Neat to see complex problems work out though!

Plan for next week:
This last week I didn’t intend to write any new functionality and yet ended up with much more functional code. The new random variable machinery is completely undocumented and only sparsely tested, so there is still plenty of that sort of work to be done before setting a specific new goal.


Week 3 – Random Variables

This week sees the first implementation of random variables. Random variables are SymPy symbols which carry around ProbabilitySpace information. In expressions they interact just like normal SymPy symbols until special functions (like PDF or expectation_value) are called, at which point they take on special meaning.

As of this morning there is a rudimentary implementation and some mechanics written down. This feels more like experimental code than a polished library, however. Still, it’s very satisfying to see things work. Some examples with finite probability spaces:

>>> d1,d2 = Die(6), Die(6)
>>> die1, die2 = d1.value, d2.value # Some Random Variables

# RV's act just like normal SymPy symbols in expressions
>>> 2*die1 + die2 - die1
die₁ + die₂

# But some functions, like Probability Density Function, can watch out for them
>>> PDF(die1)
{1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

>>> PDF(die1 + die2)
{2: 1/36, 3: 1/18, 4: 1/12, 5: 1/9, 6: 5/36, 7: 1/6, 8: 5/36, 9: 1/9, 10: 1/12, 11: 1/18, 12: 1/36}
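
Under the hood this PMF is just a tabulation over the 36 atoms of the product sample space; a plain-Python sketch of the same computation:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# tabulate the PMF of die1 + die2 over the product sample space
pmf = Counter()
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] += Fraction(1, 36)

assert pmf[2] == Fraction(1, 36)
assert pmf[7] == Fraction(1, 6)
assert sum(pmf.values()) == 1
```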

# Once we have the PDF other operations are easy to define
>>> E(die1+die2) # Expectation Value
>>> var(die1) # Variance
>>> covar(die1, die2) # Covariance

# Complicated expressions are also doable
>>> PDF( Eq(cos(pi*die1), 1) )
{False: 1/2, True: 1/2}

Starting from the work on probability spaces and events, this work on discrete random variables was actually fairly easy. Computing the PDF of an expression can be done by looking at “the measure of all atomic elements within the sample space of the cartesian product of the individual probability spaces.” There is some meat in that last sentence; luckily it’s all been done before. The challenge behind random variables was how to connect those mechanics to a simple language that interacts well with SymPy.

Throughout this project I’ve been treating the discrete and continuous worlds separately while making sure to provide easy ways to glue them together. This is still the intention with Random Variables but I have yet to find the glue.

Performing computations on Continuous Random Variables is trickier than in the Discrete case. In the discrete case you simply go through and tabulate every possibility, much like this haskell library. The continuous case on the other hand requires some calculus and the automatic solution of complicated expressions. This is one of the reasons why it’s so nice putting this project inside SymPy – much of the legwork is already done by other, smarter folks. Still, it’s challenging to find the correct tools in SymPy and to decide how to troubleshoot when the expressions that I generate create errors downstream.

For the continuous case I have a first pass at a method which solves (in theory) general expressions on (multiple) random variables, given some assumptions about the underlying PDF. It gives correct answers to the expressions I can cook up; however, sometimes those answers are not ideally simplified. By tweaking various simplification settings I can get the following results on normal random variables (a difficult yet very common case):

>>> x,y = Symbol('x', real=True), Symbol('y', real=True)
>>> pdfx, pdfy = exp(-x**2/2)/sqrt(2*pi), exp(-y**2/2)/sqrt(2*pi)
>>> C,D = ContinuousProbabilitySpace(x, pdfx), ContinuousProbabilitySpace(y, pdfy)
>>> X,Y = C.value, D.value # Some Normal Random Variables

>>> PDF(X) # Grab out the PDF

>>> PDF(2*X) # Note that the standard deviation doubles and the expression remains normalized

>>> PDF(X+Y)
-exp(-_x**2/4)*erf(-_x/2 - oo)/(4*pi**(1/2)) + exp(-_x**2/4)*erf(-_x/2 + oo)/(4*pi**(1/2))
# If we note that erf(oo)==1 and erf(-oo)==-1 then we have
exp(-_x**2/4)/(2*pi**(1/2)) # Note that the variances added
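
The erf(±oo) step can be double-checked by doing the convolution directly; with the standard normal density for both variables the integral collapses to the N(0, sqrt(2)) density noted above:

```python
from sympy import symbols, sqrt, pi, exp, integrate, oo, simplify

x, z = symbols('x z', real=True)
phi = lambda t: exp(-t**2 / 2) / sqrt(2*pi)   # standard normal pdf

conv = integrate(phi(x) * phi(z - x), (x, -oo, oo))   # density of X + Y at z
assert simplify(conv - exp(-z**2/4) / (2*sqrt(pi))) == 0
```

This confirms that the variances added: the result is a normal density with variance 2.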

>>> PDF(cos(pi*X)) # More complicated expressions are also doable
-2**(1/2)*exp(-acos(_x)**2/(2*pi**2))/(2*pi**(3/2)*(-_x**2 + 1)**(1/2))

So the mechanics work (at least for these cases) and that’s satisfying. However this is the first time in this project that I’ve written code without an eye to it being extensible. It’s fun to see these results but they’re not yet indicative of a real solution. My goal for next week is to not be tempted to develop any new functionality and instead to work on organization.
