You probably know what Cartesian product is. Take two sets, A and B, their product, A x B, is a set of all pairs (a, b), where a belongs to A and b belongs to B. Something like that.
Here I am going to show how to build a Cartesian product for an unlimited collection of sets, or rather for an
Iterable
of
Iterable
s. Unlimited means any number starting from 0. Due to limitations of Java, all these component sets (or rather component
Iterable
s, will have the same element type.
More formally, given a sequence of sets (or of sequences),
A0, A1, ... An
, we will produce
A = A0 x A1 x ... x An
, whose elements are sequences
(a0, a1, ..., an)
, where each
ai
belongs to
Ai
.
So, for instance if we have just
A0
and
A1
, their product will be the following:
{(a0, a1}
.
With this definition, we, with some confusion, have to admit that the Cartesian product of just one
A0
is a set of singletons,
{(a)}
for all
a
in
A
. Of course there is a canonical one-to-one correspondence between A and its "product".
More curious is the question what exactly is the product of 0 of components. In a category with limits it is a terminal object,
1
. But how about JVM? There's no such thing as a terminal object.
No problem. We can use our definition, and take the set of empty sequences. There's just one empty sequence, so our
canonical singleton has the following form:
{{}}. Did you notice the underlying type is not even mentioned. We have an
Iterable
of one single empty
iterable
of what type? Actually, Java deduces it from the type of the variable the product is assigned to. That's the safest way to pass the type knowledge into the executing code.
So, the solution I propose is to have something like this:
Iterable<Iterable<Integer>> actual = Cartesian.product(Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6)));
assertEquals(9, Set(actual).size());
Note the funny function name:
Cartesian.product
. Why do we need the special class named
Cartesian
? For two reasons. One, it is a neat way to provide type parameterization; second, calculating the product is not an elementary operation, so we need a bunch of private methods to help us. In JavaScript this can be accomplished by defining functions within our function; in Haskell we could comfortably add to our main definition the auxiliary "
where
" definitions. In Java, the "
where
" become class or instance methods.
So, we are introducing a special class named
Cartesian
, with just one exposed method,
product
:
public static class Cartesian<X, XS extends Iterable<X>> {
...
public static <X, XS extends Iterable<X>> Iterable<Iterable<X>> product(final Iterable<XS> xss) {
return new Cartesian<X, XS>().calculateProduct(xss);
}
...
You've noticed the
IT
parameter type. The reason for having this parameter is that we can pass for product calculations all kinds of
Iterable
s of
Iterable
s:
Set
s of
List
s,
List
s of
Set
s... any combination would work.
First, the algorithm. Here's how
lj user=_navi_ defines it:
product :: [[a]] -> [[a]]
product [] = [[]]
product (xs:xss) = product' xs (product xss)
where
product' xs pxs = concat $ map (\x -> map (x:) pxs) xs
In plain words, for us, Java pdestrians, it means this:
Say, we have an
Iterable
consisting of the first component, and the tail. And suppose we already have a product (call it
pxs
) built for the tail.
Then we do the following: scan the first component,
A0
attaching its elements
ai
to the head of each element of the already built product.
We do this by having a function (let's call it
f(a0, as)
that concatenates an element
a0
with the tail
[a1, ..., an]
, producing a new sequence
[a0, ..., an]
.
Having function
f
, we build another function,
g(a)
that builds a list of all
[a0, ..., an]
- by mapping the product of the tail,
pxs
using the function
f
:
g
returns, for each
a0
, a list of concatenations built by
f
.
So, for each element
a
the function
g
produces a list. Now let's apply this to all the elements of the first component,
A0
: that is, transform the component... to what? To a sequence of sequences of sequences.
We do not need such a deep structure. We just need the sequences of the lower level, listed in some order or in no order. So we do what is known as flattening the list.
The important thing is that the product we build should be lazy. As an example, suppose we build all permutations of
N
elements. This is equivalent to building a cartesian product of
[0..N-1] x [0..N-2] x ... x [0..1] x [0]
; there's
N!
elements in the product. We do not want to fill our whole memory with these elements before we start processing them.
(the code is published and commented in the second part)