This blog documents the progress of the Google Summer of Code 2011 project sympy-stats.
This project was completed and now lives in the SymPy codebase. You may read further posts by the principal author at http://matthewrocklin.com/blog
Sympy-stats is an endeavor to imbue SymPy, a symbolic algebra system written in Python, with a random variable type in an effort to create the seeds of a statistical modeling language.
About the project:
SymPy is an open source Computer Algebra System for Python. That is, it is a library for modeling and solving symbolic algebra problems, much like Mathematica or Maple. It deals with variables, functions, calculus, linear algebra, geometry, etc. It does this in purely mathematical terms rather than numerically, as is done in projects like NumPy and SciPy.
SymPy’s treatment of statistics could be improved, however. The goal of this project is to clearly define a random variable type and tightly integrate it into SymPy. My hope is that much of statistical modeling can be reconciled with traditional modeling if random variables can be freely interchanged with traditional ones.
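To give a rough feel for what interchanging random variables with ordinary symbols can look like, here is a minimal sketch using the sympy.stats module as it exists in SymPy today. The symbol names (mu, sigma, a, b) and the linear model are my own choices for illustration, and the interface shown is the current one, which may differ from what the project started with.

```python
from sympy import Symbol, simplify
from sympy.stats import Normal, E, variance

# Ordinary symbolic parameters (names chosen only for this illustration)
mu = Symbol("mu", real=True)
sigma = Symbol("sigma", positive=True)
a = Symbol("a", real=True)
b = Symbol("b", real=True)

# A normally distributed random variable with symbolic mean and standard deviation
X = Normal("X", mu, sigma)

# The random variable is used in an expression just like any other symbol
Y = a * X + b

# Expectation and variance are computed symbolically, not by sampling
mean_Y = simplify(E(Y))
var_Y = simplify(variance(Y))

print(mean_Y)
print(var_Y)
```

The results should agree with the textbook identities for a linear function of a normal variable: the mean is a*mu + b and the variance is a**2*sigma**2.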
About the author:
My name is Matthew Rocklin. I’m a graduate student at the University of Chicago studying computational mathematics. My research focus is uncertainty quantification in complex systems. I’m working on this project because I think it will help me and other researchers in the field. Coming from a physics background, I’m also a completely self-taught (read: bad) coder. I hope that through my participation in this project and connection with the SymPy community my coding practices will improve. In my brief exposure I’ve already found that this is the case.
The original GSoC application for sympy-stats