Life, Programming, Everything: 2008

Thursday, December 25, 2008

Inner Class in Java: Unexpected Volatility

You probably know that if you use a value in an inner class, the evil Java compiler wants you to write "final"; people do all kinds of trick to bypass it - store date in arrays, for instance.

But relax, you do not have to write "final" if it is a member field. Look at this code:

package experimental;

import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class VolatileConstant {

  public static void main(String[] args) {
    Iterable<Set<String>> iss = new Iterable<Set<String>>() {

      public Iterator<Set<String>> iterator() {
        return new Iterator<Set<String>>() {
          int i = 0;
          public boolean hasNext() {
            return i < 3;
          }

          public Set<String> next() {
            i++;

            return new AbstractSet<String>() {
              public Iterator<String> iterator() {
                return new Iterator<String>() {
                  int j = 0;
                  public boolean hasNext() {
                    return j < i;
                  }

                  public String next() {
                    j++;
                    return "" + i + "." + j;
                  }

                  public void remove() {
                    throw new UnsupportedOperationException();
                  }
                };
              }

              public int size() {
                return i;
              }
            };
          }

          public void remove() {
            throw new UnsupportedOperationException();
          }
        };
      }
    };

    Set<Set<String>> sss = new HashSet<Set<String>>();

    for (Set<String> ss : iss) {
      sss.add(ss);
      System.out.println(ss);
    }
    System.out.println(sss);
  }
}

You know what it prints? It's in the next line, in white color so that you have a choice not to see it.


[1.1]
[2.1, 2.2]
[3.1, 3.2, 3.3]
[[3.1, 3.2, 3.3], [3.1, 3.2, 3.3], [3.1, 3.2, 3.3]]

I am sure you can explain it, after a while, but is not it really amusing?

Sunday, December 14, 2008

categories: new part posted

Just posted a new part, "Category of Java Sets", in Easy Categories For Programmers.

Saturday, December 13, 2008

Functions

An Excuse

Well, I decided to gather all the code related to Function class in one "universe" container class, Functions. The other option would be to have a separate "package". I do not see any reason for using packages instead of container classes.

My idea regarding container classes is that it is a theory - we have constants, types, rules, axioms, all in one place. An element of theory is meaningless outside of its theory. A set does not make sense without a Set Theory.

So here we have some kind of "Functions Theory". Since functions are defined on types, which vaguely are like classes, which vaguely are like unlimited sets, we can define set-theory-like notions of Injection and Bijection. A Surjection would be nice to have too, but it has no functional influence on current programming practice: nobody knowingly builds factorsets.

Overview

Three abstract classes are defined:

Function - general functionality;
Injection - a special kind of Function, not mapping different values to one;
Bijection - invertible Function

and several useful extra classes:

Inclusion, - a Function that includes one set into another;
Id - an identity Function,
PairsToMap - takes a set of pairs, turns it into a Map;
IterableToSet - takes an iterable, turns it into a Set.

Also there are several static methods. Don't be afraid of static methods, they all belong to Functions Theory, and there's hardly a reason to override or polymorph them in any way.

Function

A Function<X, Y> declares an abstract method Y apply(X x), which obviously determines the function's behavior. In addition, Function has methods for mapping an Iterator, an Iterable, a Collection, a List. Note that there's no method that maps a Set. Why? Because if you apply a Function to a Collection, you can have repeating values.

Method toMap(Set<X) converts a Function into a Map; method forMap(Map<X, Y>) turns a Map into a Function. Static compose(f, g) method composes two Functions.

Injection

Injection is a special kind of Function: it maps different arguments to different values. Not that we can verify it; we just trust the declaration. Since a composition of two Injections is an Injection, this case is taken care of. It is.

An Injection maps a Set to a Set, so this method is there.

Inclusion is a special kind of Injection: it includes one class (set) into its superclass (superset). It maps a value to itself, casting.

Bijection

Bijection<X, Y> is a special kind of Injection: it has an inverse. So, additionally, it declares a method X unapply(Y). A composition of two Bijections is a Bijection.

Here is a sample code, where Schwartzian transform is defined:

/**
 * Given a function f, builds another function that for each x
 * builds Map.Entry(x, f(x))
 *
 * @param f function to apply
 * @return a function that builds pairs.
 */
public static <X, Y> Bijection<X, Pair<X, Y>> schwartzianTransform(final Function<X, Y> f) {
  return new Bijection<X, Pair<X, Y>>() {
    public Pair<X, Y> apply(final X x) {
      return math.cat.LazyPair.LazyPair(x, f);
    }

    public X unapply(Pair<X, Y> p) {
      return p.x();
    }
  };
}

Oh, and the code is here.

Thursday, December 11, 2008

Java Solutions: inheriting chain calls

Recently chain builder calls in Java have been gaining popularity, look, e.g. at this.

The idea is that you have a builder that has small configuration methods and in the end builds the right stuff. StringBuffer works like this:

 return new StringBuffer().append("Now is ").append(new Date()).append(", and ticking...").toString();

Or, more typically, something like this code:


 new TestBuilder().limitTime(1000).forPackage(this.getClass().getPackage()).skipFlaky().build().run();

This way we, first, annotate our parameters, and second, bypass the boring law that demands writing each statement on a separate line.
And of course it makes the process more flexible.

Actually, the fact that it is a builder is unimportant. We can just chain calls if every method that in the previous life was returning void would start returning this.

But there's a problem here... what if our builder is not the ultimate version, but a subclass of a superbuilder? Look at an example below:


public class ChainCalls {
 public static class A {
   A do1(String s) {
     System.out.println("A.1 " + s);
     return this;
   }

   A do2(String s) {
     System.out.println("A.2 " + s);
     return this;
   }
 }

 public static class B extends A {
   B do0(String s) {
     System.out.println("B.0 " + s);
     return this;
   }

   B do1(String s) {
     System.out.println("B.1 " + s);
     return this;
   }

   B do3(String s) {
     System.out.println("B.3 " + s);
     return this;
   }
 }

 public static void main(String[] args) {
   ((B)(new B().do0("zero").do1("one").do2("two"))).do3("three");
 }
}

See how we have to cast the result of do2()? That's because the method of class A has no knowledge that it actually works in the context of class B.

How can we deal with this? One cheap solution is to duplicate all the methods in B and call the delegate, that is, super.do1() etc. Feasible, but not very nice, not very scalable, and not very Java5-ish.

Because we have generics, and we have a so-called covariant inheritance, so that - because we can!

An anonymous contributor in livejournal.com has suggested the following solution:

public class ChainCallsCovariant {
 public static abstract class SuperA<T> {
   abstract T self();

   T do1(String s) {
     System.out.println("A.1 " + s);
     return self();
   }

   T do2(String s) {
     System.out.println("A.2 " + s);
     return self();
   }
 }

 public static class A extends SuperA<A> {
   A self() { return this; }
 }

 public static abstract class SuperB<T> extends SuperA<T> {
   T do0(String s) {
     System.out.println("B.0 " + s);
     return self();
   }

   T do1(String s) {
     System.out.println("B.1 " + s);
     return self();
   }

   T do3(String s) {
     System.out.println("B.3 " + s);
     return self();
   }
 }

 public static class B extends SuperB<B> {
   B self() { return this; }
 }

 public static void main(String[] args) {
   new B().do0("zero").do1("one").do2("two").do3("three");
 }
}

The main trick here is to encapsulate "return this", and make it generic, so that we always, in the eyes of the compiler, return the instance of the right class.

P.S. Here I've posted a better solution.

Thursday, December 04, 2008

Java Compiler Warnings 2

Say, somewhere in your test code, you need a list of nulls:

List<Direction> directionsWithNullValues = Lists.newArrayList(null, null, null);

Ouch! Compiler warning: unchecked generic array creation of type E[] for varargs parameter

What's going on here? An array can be created, but Java is confused regarding the type. Just cannot deduce it.

The solution is simple (although awkward):

List<Direction> directionsWithNullValues = Lists.newArrayList((Direction)null, null, null);

Java Compiler Warnings 1

Since out of the 5 million Java programmers there's hardly 100 that actually know Java (I do not count myself in that holy hundred), we probably have to join our efforts (like people do at javaranch and help each other.

Here I'm going to post typical Java Compiler warnings that I encounter and fix.

So this chain letter is something like "effective java for dummies", so to say.

1. Creating generic arrays.

Out of your best intentions, you decide that instead of manually concatenating your collections, you'll have something like this:

Collection<T> concat(Collection<T>... collections) {
  Collection<T> result = new ArrayList<T>();
  for (Collection<T> collection : collections) {
    result.addAll(collection):
  }
  return result;
}

Let's forget the dubious decision of making a live arraylist out of collections of unknown nature, instead of just having a lazy collection or iterable. What's the problem here? Will our compiler complain? No. But if we try to use it, like this:

Collection<Integer> newCollection(oldCollection1, oldCollection2);

we will get a compiler warning: "generic array creation" warning. What's wrong? Here's what. When you pass around a vararg list of parameters, implicitly an array is being created. But Java does not like creating arrays of elements which type is generic. Hence the warning.

To avoid, we can just write a simpler version:

Collection<T> concat(Collection<T> collection1, Collection<T> collection2) {
  Collection<T> result = new ArrayList<T>(collection1);
  result.addAll(collection2):
  return result;
}
...
Collection<Integer> newCollection(oldCollection1, oldCollection2);

Not that I neither condone nor discuss the idea of building an arraylist - this deplorable issue is just outside of the narrow boundaries of this topic.

Sunday, November 23, 2008

Keyboard Bookmarklets

Find your language below, and drag the grey square to bookmarks tab in your browser.

Next time you open a page with edit field (e.g. blogger), click the bookmark in the bookmarks tab,
and here you go, the keyboard pops up.

So far it works only on Firefox, Google Chrome and Safari, though.

sq Albanian	hu Hungarian	ro Romanian
ar Arabic	il Israel(Ar,En,He,Ru)	ru Russian
hy Armenian	is Icelandic	sr Serbian
az Azeri	kn Kannada	si Sinhalese
be Belarussian	kk Kazakh	sk Slovak
bn Bengali	km Khmer	special Special (Arrows etc)
bg Bulgarian	lv Latvian	tg Tajik
hr Croatian	lt Lithuanian	ta Tamil
cs Czech	mk Macedonian	tt Tatar
et Estonian	ml Malayalam	te Telugu
fa Farsi	mr Marathi	th Thai
fi Finnish	mn Mongolian	tr Turkish
ka Georgian	no Norwegian	tk Turkmen
gu Gujarati	ps Pashto	ug Uighur
hi Hindi	pl Polish	ua Ukraine (Ua,Ru)

Questions? Write me.

Sunday, November 16, 2008

What To Do with Dispatch of Static Methods for Java

A problem with Java that everybody encounters from time to time is the following: static methods cannot override each other. If class A has static method m, and class B has static method m, with the same signature, there is no automatic way to figure out which method should be called.

Look at these two classes:

public class A {
  String x;

  public A(String x) {
    this.x = x;
  }
  public String toString() {
    return "A(" + x + ")";
  }

  public static A combine(A a1, A a2) {
    return new A(a1.x + " + " + a2.x);
  }
}

and

public class B extends A {

  public B(String x) {
    super(x);
  }
  public String toString() {
    return "B(" + x + ")";
  }

  public static B combine(B b1, B b2) {
    return new B(b1.x + " * " + b2.x);
  }
}

I would be nice if the right combine method would be called in each case. But no, the following code

import static A.*;
import static B.*;

public class Test {
  public static void main(String[] args) {
    A a1 = new A("a1");
    A a2 = new A("a2");
    B b1 = new B("b1");
    B b2 = new B("b2");
    A c = new B("c");
    System.out.println(combine(a1, a2));
    System.out.println(combine(b1, b2));
    System.out.println(combine(a1, b2));
    System.out.println(combine(b1, c));
  }
}

will print
A(a1 + a2) B(b1 * b2) A(a1 + b2) A(b1 + c)

Why does the second call prints B(b1 * b2)? Just because the method combine that takes two arguments of type B is available via static import B.*.

Interesting, we know the objects types, but dispatch is done statically. Can we fix it?

One awkward solution is to use reflection: add the following to A

public static A smartCombine(A a1, A a2) {
  Class class1 = a1.getClass();
  try {
    Method method = class1.getMethod("combine", class1, a2.getClass());
    System.out.println(method);
    return (A)method.invoke(null, a1, a2);
  } catch (NoSuchMethodException e) {
    throw new RuntimeException(e);
  } catch (InvocationTargetException e) {
    throw new RuntimeException(e);
  } catch (IllegalAccessException e) {
    throw new RuntimeException(e);
  }
}

then call it like this:

System.out.println(smartCombine(b1, c));

and we'll get

B(b1 * c)

Okay, not the best solution anyway. We won't seriously add a "smart" version to each static method.

So maybe we should not use static? Should we hate static? Let's define instance methods, in class A:

public A combineWith(A other) {
  return new A(this.x + " | " + other.x);
}

and in class B:

public A combineWith(A other) {
  return new A(this.x + " & " + other.x);
}

public B combineWith(B other) {
  return new B(this.x + " && " + other.x);
}

Now let's try this:

System.out.println(b1.combineWith(b2));
System.out.println(b1.combineWith(c));
System.out.println(c.combineWith(b1));

We well see the following:


B(b1 && b2)
A(b1 & c)
A(c & b1)

What went wrong? The first case is ok, but how about others? The second case is due to the simple fact that the dispatch is done statically. So the compiler sees the parameter is an A, and calls the method that takes an A.
How about the next one, we use two instances of B, but still? It is trickier. The compiler sees an instance of A, and calls a method combineWith of class A; this method takes an A as a parameter. At runtime, since c is an instance of B an appropriate method of B is called; but the signature still is combineWith(A), and this sad truth destroys our plans.

So, what can we do? double dispatch should help, right? Here is a good example of how to use double dispatch in in Java.

Unless you do not know already, the trick is this. We have class A and and its subclass B, and a parallel hierarchy, X and its subclass Y. These two hirerachies interoperate, and we want to make sure that if, say, an instance of B works with an instance of Y, it would recognize this even if the instances are passed around as instances of A and X.

For double dispatch to work, class A should be aware of classes X and Y; similarly, class X should be aware of A and B.

Not so in our case; we have just one hierarchy, and of course our A is not aware of its subclass B.

So our naive attempt of using double dispatch will fail.

First, add to A the following:

static A combine(A a1, A a2) {
  return a1.combineFollowedBy(a2);
}

A combineFollowedBy(A other) {
  return other.combinePrecededBy(this);
}

A combinePrecededBy(A other) {
  return new B(this.x + " | " + other.x);
}

and add to B a similar code:

A combineFollowedBy(A other) {
  return other.combinePrecededBy(this);
}

A combineFollowedBy(B other) {
  return other.combinePrecededBy(this);
}

A combinePrecededBy(B other) {
  return new A(this.x + " && " + other.x);
}

A combinePrecededBy(A other) {
  return new A(this.x + " & " + other.x);
}

As I already said, this won't bring happiness in this world:


A(a2 | a1)
A(b2 & b1)
A(b2 & a1)
A(c & b1)

The reason is obvious: nowhere in our call sequence for, say, c and b1 we have a chance to call B.combineX(B)

So, what should we do? For a case like this I have only one suggestion: use instanceof.

We don't need double dispatch. Let's simplify A:

static A combine(A a1, A a2) {
  return a1.combineFollowedBy(a2);
}

A combineFollowedBy(A other) {
  return new A(this.x + " | " + other.x);
}

And make B a little bit less kosher:

A combineFollowedBy(A other) {
  return other instanceof B ?
      combineFollowedBy((B)other) :
      super.combineFollowedBy(other); 
}

A combineFollowedBy(B other) {
  return new B(this.x + " & " + other.x);
}

Now it works. A small stylistic sacrifice for fixing the lack of functionality.

Friday, October 24, 2008

Java's Strong Typing

In a typical case of user application dealing with some kind of "apache javabean properties access", we still want to have generics help us with strong typing, to keep code safe. For this, we introduce a parameterized class Property<T>. And write something like this:

public class ClientCode {
  Object model;
  public ClientCode(Object model) {
    this.model = model;
  }

  public void businessLogic() {
    Property<String> name = new Property<String>("userName", model);
    Property<Date> dob = new Property<Date>("dateOfBirth", model);
    Property<Boolean> isGood = new Property<Boolean>("isGood", model);

    if (isGood.value() && dob.value().before(new Date(1990, 10, 24))) {
      System.out.println("One good adult user: " + name.value());
    }
  }

  public static void main(String[] args) {
    new ClientCode("ignore me").businessLogic();
  }
}

Naive but natural. And now the implementation of Property<T>:

package common;

public class Property<T> {
  String name;
  T value;

  public Property(String name, Object container) {
    this.name = name;
    this.value = (T) "hello world";
//        org.apache.common.weird.helper.util.BeanAccessUtil.
//            getProperty(container, name);
  }

  public T value() { return value; }
}

Looks pretty convincing, right? But surprise! Do you think this code will throw a ClassCastException from within Property<T>? Seems like it should; we even have a variable that, in case of it being type Date, cannot accept a string in assignment. Right?

Wrong...

As you know, generic type information is lost in compilation. As a result, all the compiled class knows about value type is that it is something. It is actually an Object, and can accept whatever is assigned. More, the method value() also returns an Object. But the client is sure, due to generic parameterization, that it returns the expected type. But it does not. And the exception is thrown right in the client code.

Monday, October 20, 2008

Experimenting with Covariance and Contravariance in Java Generics

Here I'll just publish the code that demonstrates what is accepted and what is not by the compiler when you use generics. Not many comments, since most of the code is extremely trivial.

Oh, and definitions. An expression is contravariant in variable x of class X if x can be substituted with an instance y of class Y which is a subclass of X. An expression is covariant in variable x of class X if x can be substituted with an instance y of class Y which is a superclass of class X.

E.g. (a = f(b)) is contravariant in b and covariant in a.

Let's have the following class inheritance diagram ('=>' means iheritance):

E => C, D => C, C => B, B => A

In details,


  class A {
    String id;

    A(String id) { this.id = id; }
  }

  class B extends A {
    B(String id) { super(id); }
  }

  class C extends B {
    C(String id) { super(id); }
  }

  class D extends C {
    D(String id) { super(id); }
  }

  class E extends C {
    E(String id) { super(id); }
  }

And let's have an interface and a class, something similar to Map and HashMap:


  interface M<X, Y> {
    void put(X x, Y y);
    Y get(X x);
  }

  class M1<X, Y> implements M<X, Y> {
    M1() {};
    public void put(X x, Y y){};
    public Y get(X x){ return null; }
  }

The following code demonstrates a ubiquitous and pretty legal usage of covariance and contravariance.

First, the example where we vary the second parameter ("the value"), and provide an exact knowledge of its type on declaration (and instantiation):


  void testParameter2() {
    M<String, B> mSB = new M1<String, B>();
    mSB.put("", new B(""));
    mSB.put("", new C(""));
    B aa = mSB.get("");
    M<String, C> mSC = new M1<String, C>();
    mSC.put("", new C(""));
    B b = mSC.get("");
    C c = mSC.get("");
  }

Now let's try a contravariant version of wildcard in the second parameter of M. Turns out that using wildcard duly blocks some of the "expected" functionality.


  void testParameter2WithExtendsWildcard() {
    M<String, ? extends C> mSCx = new M1<String, D>();
    mSCx = new M1<String, E>();
    // Actually, only E should be accepted, but this information is lost on assignment.
    // So the next line won't compile:
    mSCx.put("", new C(""));
    // put(java.lang.String,capture#660 of ? extends common.VarianceTest.C) in
    // common.VarianceTest.M<java.lang.String,capture#660 of ? extends common.VarianceTest.C> 
    // cannot be applied to (java.lang.String,common.VarianceTest.C) 
    //
    // And we don't know if it is D (it is not).
    // So the next line won't compile:
    mSCx.put("", new D(""));
    // put(java.lang.String,capture#511 of ? extends common.VarianceTest.C) 
    // in common.VarianceTest.M 
    // cannot be applied to (java.lang.String,common.VarianceTest.D)

    B b = mSCx.get("");
    C c = mSCx.get("");
  }

Trying the same, but 'super' instead of 'extends':



  void testParameter2WithSuperWildcard() {
    M<String, ? super C> mSCs = new M1<String, C>();
    mSCs = new M1<String, B>();
    // We don't know what's hiding behind 'super', maybe it's Object
    // So the next line won't compile:
    mSCs.put("", new B(""));
    // put(capture#701 of ? extends common.VarianceTest.C,java.lang.String) 
    // in common.VarianceTest.M 
    // cannot be applied to (common.VarianceTest.C,java.lang.String)

    mSCs.put("", new C(""));
    mSCs.put("", new D(""));
    // We don't know what's hiding behind 'super', maybe it's Object
    // So the next line won't compile:
    B b = mSCs.get("");
    // incompatible types found   
    // : capture#222 of ? super common.VarianceTest.C  required: common.VarianceTest.B    

    Object any = mSCs.get("");
  }

The same with parameter 1. No surprises here:


  void testParameter1() {
    M<B, String> mBS = new M1<B, String>();
    mBS.put(new B(""), "");
    mBS.put(new C(""), "");
    String s = mBS.get(new B(""));
    s = mBS.get(new B(""));
    M<C, String> mCS = new M1<C, String>();
    mCS.put(new C(""), "");
    mCS.put(new D(""), "");
    s = mCS.get(new C(""));
    s = mCS.get(new D(""));
  }

The following example shows that it is totally useless to specify 'extends' wildcard in contravariant position:


  void testParameter1WithExtendsWildcard() {
    String s;
    M<? extends C, String> mCxS = new M1<C, String>();
    mCxS = new M1<D, String>();
    // Who knows what is the actual type of param1...
    // The following two lines do not compile:
    mCxS.put(new C(""); 
    // put(capture#701 of ? extends common.VarianceTest.C,java.lang.String) 
    // in common.VarianceTest.M 
    // cannot be applied to (common.VarianceTest.C,java.lang.String)

    mCxS.put(new D(""), "");
    // put(capture#701 of ? extends common.VarianceTest.C,java.lang.String) 
    // in common.VarianceTest.M 
    // cannot be applied to (common.VarianceTest.D,java.lang.String)


    // Same here, parameter type unknown
    // The following two lines do not compile either:
    s = mCxS.get(new C(""));
    // get(capture#897 of ? extends common.VarianceTest.C) in common.VarianceTest.M    // of ? extends common.VarianceTest.C,java.lang.String> 
    // cannot be applied to (common.VarianceTest.C)

    s = mCxS.get(new D(""));
    // get(capture#897 of ? extends common.VarianceTest.C) in common.VarianceTest.M    // of ? extends common.VarianceTest.C,java.lang.String> 
    // cannot be applied to (common.VarianceTest.D)


  }

Now let's try the same with 'super' instead of 'extends'. This means that an instance of any superclass instance can show up (down to Object), we just have no way to know which one it is:


  void testParameter1WithSuperWildcard() {
    M<? super C, String> mCsS = new M1<C, String>();
    mCsS = new M1<B, String>();
    // Who knows what should be the actual type of param1...
    // The following line does not compile:
    mCsS.put(new B(""), "");
    // put(capture#248 of ? super common.VarianceTest.C,java.lang.String) in 
    // common.VarianceTest.M 
    // cannot be applied to (common.VarianceTest.B,java.lang.String)

    // C can be cast to any superclass of C
    mCsS.put(new C(""), "");
    mCsS.put(new D(""), "");
    String s = mCsS.get(new C(""));
    s = mCsS.get(new D(""));
    s = mCsS.get(new C(""));
    // Who knows what should be the actual type of param1...
    // The following line does not compile:
    s = mCsS.get(new B(""));
    // get(capture#330 of ? super common.VarianceTest.C) in common.VarianceTest.M    // of ? super common.VarianceTest.C,java.lang.String> 
    // cannot be applied to (common.VarianceTest.B)

  }

Generics with wildcards are not broken. The lack of understanding on how they work stems from the lack of understanding of covariant and contravariant substitution. Once you grasp it, the rest is easy.

And there's something specific about wildcards. If it is ? extends A, it does not mean that anything extending A can substitute. It means that there is something that extends A, and there' no way to figure out what.

Monday, October 06, 2008

Unlimited Cartesian Product in Java - part 2

(This is the second part, see part 1).

Here is the code that implements Cartesian product in Java. The implementation amounts to providing the following methods of class Cartesian:

 
    public static <X, XS extends Iterable<X>>
        Iterable<Iterable<X>> product(final XS... xss) {
      return product(Arrays.asList(xss));
    }

    public static <X, XS extends Iterable<X>>
        Iterable<Iterable<X>> product(final Iterable<XS> xss) {
      return new Cartesian<X, XS>().calculateProduct(xss);
    }

So we could write code like this:


  Iterable<Iterable<Integer>> theProduct = 
      Cartesian.product(Arrays.asList(
          Arrays.asList(1, 2, 3), 
          Arrays.asList(4, 5, 6)));
    assertEquals(9, Set(actual).size());

and like this:


  Iterable<Iterable<Integer>> theProduct = 
    Cartesian.product(
        Arrays.asList(1, 2, 3), 
        Arrays.asList(4, 5, 6));

Note that we could not use Iterable<Iterable<X>>, since, imagine, in this case we could not even build a product of Set<Set<X>>.

How do we calculateProduct? The idea is that we define it by recursion. If we have X₀, X₁,..., X_n, we define product(X₀, X₁,..., X_n) as X₀ x product(X₁, X₁,..., X_n); and if the sequence is actually empty, we return a singleton,

new ArrayList<Iterable>() {{
            add(Collections.EMPTY_LIST);
          }};

(I use crazybob's contraption to build the singleton).


  private Iterable<Iterable<X>>
      calculateProduct(final Iterable<XS> xss) {
    if (xss.iterator().hasNext()) {
      Pair<XS, Iterable<XS>> pair = split(xss);
      XS head = pair.x();
      Iterable<XS> tail = pair.y();
      return appendComponentToProduct(head, calculateProduct(tail));
    } else {
      return new ArrayList<Iterable<X>>() {{
        add(Collections.EMPTY_LIST);
      }};
    }
  }

Here I use two methods, split and appendComponentToProduct. The first one is obvious (okay, it's an exercise if you are a beginner); the second one is here. See what it does: we have a component, X₀, say,

[1, 2, 3]

, and a product of previous components X₁ x ... x X_n, which is an iterable of iterables, e.g. [[4, 5], [5, 6]]; we need to append elements of X₀ to all elements of the partial product. That appendage function is called consWith: it takes an x and appends it to all partial product elements. We apply this function to the whole X₀; what we receive has to be flattened to produce a list of tuples [x0, x1, ..., xn]


  private Iterable<Iterable<X>>
      appendComponentToProduct(
        XS component, 
        Iterable<Iterable<X>> partialProduct) {
      // E.g. [1, 2, 3], [[4, 6], [5, 6]] -> 
      // [[[1, 4, 6], [1, 5, 6]], [[2, 4, 6], [2, 5, 6]], [[3, 4, 6], [3, 5, 6]]] ->
      // [[1, 4, 6], [1, 5, 6], [2, 4, 6], [2, 5, 6], [3, 4, 6], [3, 5, 6]] 
      return flatten(consWith(partialProduct).map(component));

Flattening is, again, a trivial exercise; let's take a look at our consWith. It is a method that returns a function (which we apply to the component X₀:


    public <ElementOfProduct extends Iterable<X>>
        Function<X, Iterable<Iterable<X>>>
        consWith(final Iterable<ElementOfProduct> pxs) {
      return new Function<X, Iterable<Iterable<X>>>() {
        // takes an x, returns a function appending x to sequences
        public Iterable<Iterable<X>> apply(final X x) {
          // Takes a sequence [x1, x2, ...], returns [x, x1, x2, ...]
          return new Function<ElementOfProduct, Iterable<X>>() {
            public Iterable<X> apply(ElementOfProduct y) {
              return concat(Arrays.asList(x), y);
            }
          }.map(pxs);
        }
      };
    }

A little bit dizzy? It's okay. We have arrived to the summit.

This is Higher Order Java!

Wednesday, October 01, 2008

Unlimited Cartesian Product in Java - part 1

You probably know what Cartesian product is. Take two sets, A and B, their product, A x B, is a set of all pairs (a, b), where a belongs to A and b belongs to B. Something like that.

Here I am going to show how to build a Cartesian product for an unlimited collection of sets, or rather for an Iterable of Iterables. Unlimited means any number starting from 0. Due to limitations of Java, all these component sets (or rather component Iterables, will have the same element type.

More formally, given a sequence of sets (or of sequences), A₀, A₁, ... A_n, we will produce A = A₀ x A₁ x ... x A_n, whose elements are sequences (a₀, a₁, ..., a_n), where each a_i belongs to A_i.

So, for instance if we have just A₀ and A₁, their product will be the following: {(a₀, a₁}.

With this definition, we, with some confusion, have to admit that the Cartesian product of just one A₀ is a set of singletons, {(a)} for all a in A. Of course there is a canonical one-to-one correspondence between A and its "product".

More curious is the question what exactly is the product of 0 of components. In a category with limits it is a terminal object, 1. But how about JVM? There's no such thing as a terminal object.

No problem. We can use our definition, and take the set of empty sequences. There's just one empty sequence, so our canonical singleton has the following form:
{{}}. Did you notice the underlying type is not even mentioned. We have an Iterable of one single empty iterable of what type? Actually, Java deduces it from the type of the variable the product is assigned to. That's the safest way to pass the type knowledge into the executing code.

So, the solution I propose is to have something like this:


    Iterable<Iterable<Integer>> actual = Cartesian.product(Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6)));
    assertEquals(9, Set(actual).size());

Note the funny function name: Cartesian.product. Why do we need the special class named Cartesian? For two reasons. One, it is a neat way to provide type parameterization; second, calculating the product is not an elementary operation, so we need a bunch of private methods to help us. In JavaScript this can be accomplished by defining functions within our function; in Haskell we could comfortably add to our main definition the auxiliary "where" definitions. In Java, the "where" become class or instance methods.

So, we are introducing a special class named Cartesian, with just one exposed method, product:


  public static class Cartesian<X, XS extends Iterable<X>> {
  ...
    public static <X, XS extends Iterable<X>> Iterable<Iterable<X>> product(final Iterable<XS> xss) {
      return new Cartesian<X, XS>().calculateProduct(xss);
    }
    ...

You've noticed the IT parameter type. The reason for having this parameter is that we can pass for product calculations all kinds of Iterables of Iterables: Sets of Lists, Lists of Sets... any combination would work.

First, the algorithm. Here's how lj user=_navi_ defines it:


    product :: [[a]] -> [[a]]
    product [] = [[]]
    product (xs:xss) = product' xs (product xss)
      where
        product' xs pxs = concat $ map (\x -> map (x:) pxs) xs

In plain words, for us, Java pdestrians, it means this:

Say, we have an Iterable consisting of the first component, and the tail. And suppose we already have a product (call it pxs) built for the tail.
Then we do the following: scan the first component, A₀ attaching its elements a_i to the head of each element of the already built product.

We do this by having a function (let's call it f(a0, as) that concatenates an element a₀ with the tail [a₁, ..., a_n], producing a new sequence [a₀, ..., a_n].
Having function f, we build another function, g(a) that builds a list of all [a₀, ..., a_n] - by mapping the product of the tail, pxs using the function f: g returns, for each a₀, a list of concatenations built by f.

So, for each element a the function g produces a list. Now let's apply this to all the elements of the first component, A₀: that is, transform the component... to what? To a sequence of sequences of sequences.

We do not need such a deep structure. We just need the sequences of the lower level, listed in some order or in no order. So we do what is known as flattening the list.

The important thing is that the product we build should be lazy. As an example, suppose we build all permutations of N elements. This is equivalent to building a cartesian product of [0..N-1] x [0..N-2] x ... x [0..1] x [0]; there's N! elements in the product. We do not want to fill our whole memory with these elements before we start processing them.

(the code is published and commented in the second part)

Monday, September 29, 2008

class Function<X, Y>

As we all know, functions are not a part of Java language; and closures are banished to the North, to Microsoft. What do we, humble Java programmers, do? We invent a Function class. Not that it suddenly makes Java a functional programming language, but it gives us the opportunity to express things in a more concise way, using higher level terminology.

In this article I use Pair class, see details here.


public abstract class Function<X, Y> {
  public abstract Y apply(X x);
}

If that were all that we wanted, we could as well make it an interface. There is a certain trend to first introduce an interface, and then an abstract base class implementing this interface and providing the base functionality. But... how many different ways of defining a composition of two functions do we know? Exactly one, right? See, there are certain rules and properties, without which a function would not be a function. Let's make them a part of the abstract class, since there's no choice anyway.


  public static <X, Y, Z> Function<X, Z> 
      compose(final Function< f, final Function<Y, Z> g) {
    return new Function<X, Z>() {
      public Z apply(X x) {
        return g.apply(f.apply(x));
      }
    };
  }

  public static <T, T1 extends T> Function<T1, T> id() {
    return new Function<T1, T>() {
      public T apply(T1 t) { return t; }
    };
  }

We will be tempted to apply a Function to an Iterable, a Collection, a Set, a List.

The naive way would be to write something like this:


  Function<X, Y> f = ....
  ....
  List<X> xs = ...
  ...
  List<Y> ys = new ArrayList<Y>();
  for (X x : xs) {
    ys.add(x);
  }
  // good job!

I have three problems with this kind of code.

Problem 1. It runs an explicit loop. Did you ever ask yourself what is the reason programmers have been using loops in their code? Are not all the use cases already known, so that we could just mention the case, and the loop is magically applied? Want an example? String concatenation. Nobody writes a loop to concatenate two strings.

Problem 2. We may not even need all the results in the new array, ys - why bother calculating the values? Would be cool to calculate them when needed.

Problem 3. Although an average computer these days has more memory than an average programmer can fill, it can happen that the source array is exceptionally long; and if we fill each temporary array with data, we may be soon out of memory. And for what reason? For no reason, just because we don't know better.

But we do.

The transformed set can be virtual. Built on demand. Let's start with applying a Function to an Iterator. What are we going to obtain? Another Iterator, with exactly the same number of values, but the values are different.


  public Iterator<Y> map(final Iterator<X> source) {
    return new Iterator<Y>() {

      public boolean hasNext() {
        return source.hasNext();
      }

      public Y next() {
        return apply(source.next());
      }

      public void remove() {
        source.remove();
      }
    };
  }

Iterators rarely materialize themselves in the code; they use to be returned by Iterables; so let's apply the same trick to Iterables.


  public Iterable<Y> map(final Iterable<X> source) {
    return new Iterable<Y>() {
      public Iterator<Y> iterator() {
        return map(source.iterator());
      }
    };
  }

We may not feel comfortable if the resulting Iterable's iterator() is called multiple times; but in this case we can eventually decide to cache the values. Or maybe not, if there's a billion of them, and the function is inexpensive to calculate.

The only difference between a Collection and an Iterable is that in the case of Collection we know the size. So there.


  public Collection<Y> map(final Collection<X> source) {
    return new AbstractCollection<Y>() {
      public Iterator<Y> iterator() {
        return map(source.iterator());
      }

      public int size() {
        return source.size();
      }
    };
  }

For List, we throw in one more method, get(int):


  public List<Y> map(final List<X> source) {
    return new AbstractList<Y>() {
      public Y get(int index) {
        return apply(source.get(index));
      }

      public Iterator<Y> iterator() {
        return map(source.iterator());
      }

      public int size() {
        return source.size();
      }
    };
  }

I wonder if anybody can come up with a solution that does not involve copy-pasting iterator() implementation.

Now, you have probably noticed that all these transformation methods are not static "util", but are a part of Function class definition. As a result, if you have a Function f, you can write in your code List<Y> results = f.map(souceList);

Applying the same trick to a Set is not exactly fair. Set "contract" say that we should not have repeating values; and in our case there is no guarantee that we would not:


  public Set<Y> map(final Set<X> source) {
    return new AbstractSet<Y>() {
      public Iterator<Y> iterator() {
        return map(source.iterator());
      }

      public int size() {
        return source.size();
      }
    };
  }

On certain occasions, though, the uniqueness of the function values is guaranteed, like in the case of Schwartzian Transform. Schwartzian transform consists of producing pairs (x, f(x)) for a given collection of x's. This operation is known to be useful in the times of Perl language; I found it useful in Java too.


  private static <X, Y> Function<X, Pair<X, Y>> schwartzianTransform(final Function<X, Y> f) {
    return new Function>() {
      public Pair<X, Y> apply(final X x) {
        return LazyPair(x, f);
      }
    };
  }

It is kind of funny that I have to define this method as static; one of us (me or java) is not smart enough to build new functions inside instance methods. Feel free to go ahead and try.

All of this allows us, given a Function and a Set, build a Map:


  public Map<X, Y> toMap(final Set<X> keys) {
    // this innocent function makes a Pair look like a Map.Entry
    Function<Pair<X, Y>, Map.Entry<X, Y>> fromPairToMapEntry = id();
    final Function<X, Map.Entry<X, Y>> pairBuilder =
        compose(schwartzianTransform(this), fromPairToMapEntry);
    return new AbstractMap<X, Y>() {
      public Set<Entry<X, Y>> entrySet() {
        return pairBuilder.map(keys);
      }
    };
  }

Life, Programming, Everything