The backend for random variables has been completely rewritten.

The goal of the RandomVariable module is to turn statistical statements like

*P(X>Y given Y<5)*

into computational statements. These computational statements are then handled by external machinery.

There are three planned implementations. Each implementation writes out to a different type of computational statement.

**Discrete/Finite** :: examples – Dice, coins, etc…

These turn statistical statements into generator expressions. Generator expressions evaluate predictably, so these statements are always collapsed to produce concrete answers.
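As a rough illustration of the idea (a plain-Python sketch, not the module's actual internals), a statement like *P(X > 3)* over a fair die collapses to a generator expression over the finite sample space:

```python
from fractions import Fraction

# Hypothetical sketch: a fair six-sided die as a finite sample space.
die = range(1, 7)

# The statistical statement P(X > 3) collapsed to a generator expression:
# count the favorable outcomes and divide by the size of the space.
p = Fraction(sum(1 for x in die if x > 3), len(die))
print(p)  # 1/2
```

Because the sample space is finite, the generator always terminates, which is why these statements can always be collapsed to exact answers.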

**Continuous Univariate** :: examples – Exponential, Normal, Pareto etc…

These produce statements that rely heavily on SymPy Integrals and Deltas. Complex statements often stress the current SymPy machinery beyond what it can handle, so a lazy fallback mode is built in to just give you the raw Integral.
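To give a flavor of what such a computational statement looks like (a sketch using SymPy's `Integral` directly; the Exponential density here is written out by hand, not taken from the module), the expectation of an Exponential variable is posed as an unevaluated integral and then collapsed:

```python
from sympy import Symbol, Integral, exp, oo

x = Symbol('x', positive=True)
lam = Symbol('lambda', positive=True)

# Density of an Exponential(lambda) random variable, written by hand.
density = lam * exp(-lam * x)

# E(X) posed as a raw Integral -- the form the lazy fallback would hand back.
expectation = Integral(x * density, (x, 0, oo))

# When the integral is tractable, SymPy's machinery collapses it.
print(expectation.doit())  # 1/lambda
```

When `doit()` stresses the integration machinery past its limits, the unevaluated `Integral` object itself is still a valid (if lazy) answer.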

**Multivariate Random Normal**

These will produce statements in linear algebra. Statistical statements are restricted to the linear setting; as long as this holds, I suspect these will always compute.
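The linear-algebra statements in question are the standard Gaussian conditioning identities (sketched here with SymPy matrices on a made-up 2D example; this is not the module's API): conditioning a joint normal on one block reduces to a Schur-complement computation, which always succeeds.

```python
from sympy import Matrix, Rational

# Hypothetical example: (X1, X2) jointly normal with these mean/covariance blocks.
mu1, mu2 = Matrix([0]), Matrix([0])
S11, S12 = Matrix([[2]]), Matrix([[1]])
S21, S22 = Matrix([[1]]), Matrix([[2]])

# Conditioning on X2 = a is pure linear algebra (Schur complement):
a = Matrix([1])
cond_mean = mu1 + S12 * S22.inv() * (a - mu2)   # Matrix([[1/2]])
cond_cov = S11 - S12 * S22.inv() * S21          # Matrix([[3/2]])
print(cond_mean, cond_cov)
```

Since everything reduces to matrix inversion and multiplication, these statements compute whenever the covariance blocks are well-posed.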

Each type tries to support a few standard operations. When all types support a common operation, statements that mix types can be handled cleanly. Not all types support all operations: for example, "where" in the continuous case fails for any expression of more than a single variable, and multivariate normals will likely only support density and conditional_space.

- integrate :: Expr with RV -> Expr without RV
- where :: Conditional(Expr) -> Domain
- conditional_space :: Condition -> New ProbabilitySpace
- compute_density :: Expr (random) -> Expr (density)

If that wasn't clear, that's just fine – this will all be opaque to the user. This functionality will be externally visible through the special functions

`P`, `E`, `Where`, and `Density`.

- P(condition) :: The probability that a condition is True
- E(expression) :: The expected value of an expression
- Where(condition) :: The possible values the underlying random variables can take on that satisfy the condition
- Density(expression) :: The assignment of probability to possible states of an expression

Ok, so that's the design. Other than some examples, everything from two weeks ago is up and running under the new system. The major accomplishment for this week is that we can do everything we used to, and we're positioned to go a lot farther.

Also, we can now compute conditional probabilities and densities. This is a lot of fun. To compute the probability of a condition we type P(condition), such as P(X>3). To compute the probability of a condition *given* that some other condition is true we type P(condition, given), such as P(X>3, X+Y<5).

For example, Bayes' theorem is written as

`P(A, B) == P(B, A) * P(A) / P(B)`

which, conveniently enough, evaluates to True for every finite example of A and B I can think of (remember, FiniteRandomVariables almost always compute).
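That this identity holds exactly is easy to confirm by brute force, independently of the module (a plain-Python enumeration over three dice, with events chosen arbitrarily for illustration):

```python
from fractions import Fraction
from itertools import product

# Enumerate the full sample space of three fair dice.
space = list(product(range(1, 7), repeat=3))

def prob(event, given=lambda s: True):
    """Exact conditional probability by counting outcomes."""
    restricted = [s for s in space if given(s)]
    return Fraction(sum(1 for s in restricted if event(s)), len(restricted))

# Arbitrary example events: A = "X - Y == Z", B = "Z > Y".
A = lambda s: s[0] - s[1] == s[2]
B = lambda s: s[2] > s[1]

# Bayes' theorem holds exactly over the finite space.
assert prob(A, B) == prob(B, A) * prob(A) / prob(B)
print(prob(A, B))  # 1/15
```

Because every probability is an exact rational count, the two sides of the identity match term for term rather than merely to floating-point tolerance.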

Examples

```
>>> from sympy.statistics import *
>>> X, Y, Z = Die(6).value, Die(6).value, Die(6).value
>>> E(X)  # Expectation of a fair die
7/2
>>> P(X>3)  # Probability that it is above 3
1/2
>>> Where(X>3).as_boolean()  # What are the possible values if it is above 3?
die₁ = 6 ∨ die₁ = 5 ∨ die₁ = 4
>>> E(X, X>3)  # If we know the die will be above 3, what is the expected value?
5
>>> Density(X, X>3)  # What is the density in this case?
{4: 1/3, 5: 1/3, 6: 1/3}
>>> E(X+Y-Z, 2*X>Y+1)  # These can be made relatively complex and are still fast
49/12
>>> Density(X+Y, Y+Z>9)
{5: 1/36, 6: 1/12, 7: 1/6, 8: 1/6, 9: 1/6, 10: 1/6, 11: 5/36, 12: 1/12}
>>> # Let's test Bayes' rule
>>> A = Eq(X-Y, Z)
>>> B = Z>Y
>>> P(A, B) == P(B, A) * P(A) / P(B)
True
```

I’ll create a separate post for conditional probabilities with a continuous example in a while. I was able to pose and automatically solve a basic data assimilation problem.

Code is available in the new branch

https://github.com/mrocklin/sympy/tree/rv2/