Simplifying Sets

SymPy’s sets module is a pleasure to work on. The math is approachable well structured. There are basic sets (Intervals, FiniteSets) compound sets (Unions, Intersections, Cartesian Products) and operations (contains, complement, measure, subset). Because the problem is easy to understand and intrinsically simple, sets is a great project to practice coding. Can we write code that is as simple as the problem we’re solving?

Historically I have been bad at this. I am guilty of writing needlessly complex code. A friend recently sent me a talk by Rich Hickey, the creator of Clojure, about simplicity versus ease. I decided to try to make the SymPy.Sets code simpler as an educational project.

The current issue with sets is that many classes contain code to interact with every other type of class. I.e. we have code that looks like this:

def operation(self, other):
    if other.is_FiniteSet:
    if other.is_Interval:
    if other.is_ProductSet:

This is because the rules to, say join the FiniteSet {1,2,3,4} with the Interval [2, 3) can be complex. The sets module handles this all marvelously well and produces [2, 3] U {1, 4}, a nice answer. The code to do it however is atrocious and filled with nests of rules and special cases. Much of this code is in the Union and RealUnion classes but some of it is in FiniteSet, some of it is in Interval as well. Everything works, it’s just complex.

This is similar to the situation in Mul.flatten and friends.

So what is the solution for Sets? How do we simplify Union and Intersection?

First, lets acknowledge that Union/Intersection serve two purposes

  1. They serve as a container of sets
  2. They simplify these sets using known rules

We separate these two aspects and solve them independently.

We separate these two in the same way Mul and Add handle it. We create a reduce/flatten method and, while we call it by default, it is now separate from the construction logic. There has been talk about separating these two parts of our container classes even further by having container classes that only contain and simplifyers/canonicalizers that only simplify/canonicalize.

We need a simple way to manage all of the special rules we know for simplifying collections of sets. The issue is that there are a lot of special cases; FiniteSets can do some things, Intervals others, and how do we anticipate not-yet-defined sets? Our solution is as follows.

Every set class has methods _union(self, other) and _intersect(self, other). These methods contain local simplification rules. I.e. if self knows how to interact with other it returns a new, simplified set, otherwise it returns None for “I don’t know what to do in this situation”. For example Intervals know how to intersect themselves with other Intervals but they don’t know how to interact with FiniteSets, luckily FiniteSets know how to do this. Together they know how to handle any situation between them.

Here are the local interaction methods for EmptySet.

def _union(self, other):
    return other
def _intersect(self, other):
    return S.EmptySet

These are particularly simple, are known only by EmptySet, and yet produce proper behavior in any interaction. When we add EmptySet to the family of Sets we don’t need to add code to Union or Intersection. Everything is nicely contained.

When they simplify, the Union and Intersection classes do two things.

  1. They walk over the collection of sets and use local rules to perform simplifications
  2. They also contain a few “global rules” that can accelerate the process by looking at the entire collection of sets at once.

In this way it is very easy to extend the Sets module with new classes without breaking Union and Intersection. Additionally, the old nest of code has been cleanly separated and placed into the relevant classes. Unions and Intersections no longer need to know every possible interaction between every possible Set. Instead they manage interactions and let Sets simplify themselves.

A final note. I like this idea of managing many small simplification rules. I stole this idea from Theano, a symbolic/numeric python library. They go one step further though and separate the rule from the container class. I.e. rather than telling Intervals how to interact with Intervals they make a separate rule and include it in some separate simplifying manager. If this idea interests you I suggest you look at their documentation on optimizations.


About mrocklin

PhD student studying Computational Mathematics at the University of Chicago.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s