# Life, Programming, Everything

When I find something interesting and new, I post it here - that's mostly programming, of course, not everything.

## Wednesday, October 01, 2008

### Unlimited Cartesian Product in Java - part 1

You probably know what Cartesian product is. Take two sets, A and B, their product, A x B, is a set of all pairs (a, b), where a belongs to A and b belongs to B. Something like that.

Here I am going to show how to build a Cartesian product for an unlimited collection of sets, or rather for an `Iterable` of `Iterable`s. Unlimited means any number starting from 0. Due to limitations of Java, all these component sets (or rather component `Iterable`s, will have the same element type.

More formally, given a sequence of sets (or of sequences), `A0, A1, ... An`, we will produce `A = A0 x A1 x ... x An`, whose elements are sequences `(a0, a1, ..., an)`, where each `ai` belongs to `Ai`.

So, for instance if we have just `A0` and `A1`, their product will be the following: `{(a0, a1}`.

With this definition, we, with some confusion, have to admit that the Cartesian product of just one `A0` is a set of singletons, `{(a)}` for all `a` in `A`. Of course there is a canonical one-to-one correspondence between A and its "product".

More curious is the question what exactly is the product of 0 of components. In a category with limits it is a terminal object, `1`. But how about JVM? There's no such thing as a terminal object.

No problem. We can use our definition, and take the set of empty sequences. There's just one empty sequence, so our canonical singleton has the following form:
{{}}. Did you notice the underlying type is not even mentioned. We have an `Iterable` of one single empty `iterable` of what type? Actually, Java deduces it from the type of the variable the product is assigned to. That's the safest way to pass the type knowledge into the executing code.

So, the solution I propose is to have something like this:
``    Iterable<Iterable<Integer>> actual = Cartesian.product(Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6)));    assertEquals(9, Set(actual).size());``

Note the funny function name: `Cartesian.product`. Why do we need the special class named `Cartesian`? For two reasons. One, it is a neat way to provide type parameterization; second, calculating the product is not an elementary operation, so we need a bunch of private methods to help us. In JavaScript this can be accomplished by defining functions within our function; in Haskell we could comfortably add to our main definition the auxiliary "`where`" definitions. In Java, the "`where`" become class or instance methods.

So, we are introducing a special class named `Cartesian`, with just one exposed method, `product`:

` public static class Cartesian<X, XS extends Iterable<X>> { ... public static <X, XS extends Iterable<X>> Iterable<Iterable<X>> product(final Iterable<XS> xss) { return new Cartesian<X, XS>().calculateProduct(xss); } ...`

You've noticed the `IT` parameter type. The reason for having this parameter is that we can pass for product calculations all kinds of `Iterable`s of `Iterable`s: `Set`s of `List`s, `List`s of `Set`s... any combination would work.

First, the algorithm. Here's how lj user=_navi_ defines it:
``    product :: [[a]] -> [[a]]    product [] = [[]]    product (xs:xss) = product' xs (product xss)      where        product' xs pxs = concat \$ map (\x -> map (x:) pxs) xs``

In plain words, for us, Java pdestrians, it means this:

Say, we have an `Iterable` consisting of the first component, and the tail. And suppose we already have a product (call it `pxs`) built for the tail.
Then we do the following: scan the first component, `A0` attaching its elements `ai` to the head of each element of the already built product.

We do this by having a function (let's call it `f(a0, as)` that concatenates an element `a0` with the tail `[a1, ..., an]`, producing a new sequence `[a0, ..., an]`.
Having function `f`, we build another function, `g(a)` that builds a list of all `[a0, ..., an]` - by mapping the product of the tail, `pxs` using the function `f`: `g` returns, for each `a0`, a list of concatenations built by `f`.

So, for each element `a` the function `g` produces a list. Now let's apply this to all the elements of the first component, `A0`: that is, transform the component... to what? To a sequence of sequences of sequences.

We do not need such a deep structure. We just need the sequences of the lower level, listed in some order or in no order. So we do what is known as flattening the list.

The important thing is that the product we build should be lazy. As an example, suppose we build all permutations of `N` elements. This is equivalent to building a cartesian product of `[0..N-1] x [0..N-2] x ... x [0..1] x `; there's `N!` elements in the product. We do not want to fill our whole memory with these elements before we start processing them.

(the code is published and commented in the second part)