When I find something interesting and new, I post it here - that's mostly programming, of course, not everything.

Tuesday, June 23, 2009

Why Monoids?

I sincerely plan to explain in simple terms what's all this fuss about monoids, why we suddenly need these primitive structures, and how to make them available in Java.

A monoid is one of the simplest notions of algebra. Slightly oversimplifying, it consists of elements that have a multiplication operation a*b defined on them, and a neutral element e, such that e*a == a*e. Multiplication should be associative (a*b)*c == a*(b*c), but not necessarily commutative (a*b != b*a). If you can't imagine non-commutative multiplication, think of matrices... or of spatial rotations. The order counts.

A simpler example would consist of integer numbers and addition playing the role of multiplication. 0 is the neutral element with respect to addition, so there.

Now why would we need this structure? It turned out that if you have to add up a billion, or a quadrillion of integers, you may be interested in doing it by distributing the operation over several (or ~700 000, in the case of Google) machines. Adding up a quadrillion of integers may be not that interesting; but multiplying a billion of matrices is a different, and more challenging, story.

And why can we distribute it among the thousands of computers? We can do it all thanks to the associativity. (m1*m2*...*mk1*mk1+1*mk1+2..*mk2*...*mn) is the same as
(m1*m2*...*mk1)*(mk1+1*mk1+2..*mk2)*(...*mn), and we can calculate intermediate results (shown in parentheses here) on different machines, then aggregating them together; we can do it in cascade, whatever we choose. The strategy does not influence the result.



The following Java class defines, or rather declares the necessary functionality:
public abstract class Monoid<X>
{
abstract X unit();
abstract X m(X x, X y);

public X fold(Iterable<X> source) {
X result = unit();
for (X x : source) {
result = m(result, x);
}
return result;
}

public X fold(X[] source) {
X result = unit();
for (X x : source) {
result = m(result, x);
}
return result;
}
}
We can use it to add numbers, strings, matrices; to multiply numbers or matrices; to concatenate lists and join sets.
Monoid<Set<T>> setMonoid = new Monoid<Set<T>> {
public Set<T> unit() {
return Set<T>.EMPTY;
}

public Set<T> m(Set<T> a, Set<T> b) {
Set<T> c = a.clone();
c.addAll(b);
return c;
}
Of course using all this for the data that we have in memory of one single machine. But, as I said before, we can actually delegate the monoid's fold operation to the computing cloud, and have the list deployed somewhere in the big file system, and let the cloud partition it. The result will be the same; and all this due to the monoidal properties of our concrete classes and operations.

4 comments:

Simon Hawkin said...

When you say that multiplication should not be commutative, you really mean it does not have to be commutative, right? Integers with addition are commutative.

Also, you used fold() as an important name but did not describe it in the text (only mentioned once).

There must be a good way to format code on blogger. Too bad it does not understand <code> or <pre> (or even <tt>).

dmitry.matsnev said...

For monoids, the neutral element e should also enjoy a*e=a.

Vlad Patryshev said...

Simon, you are right; I'll have to rephrase it.

Simon Hawkin said...

Regarding code formatting: yes, <code> and <pre> are accepted, but only in postings, not in comments.

Followers

Subscribe To My Podcast

whos.amung.us