When I find something interesting and new, I post it here - that's mostly programming, of course, not everything.

Monday, September 29, 2008

class Function<X, Y>

As we all know, functions are not a part of Java language; and closures are banished to the North, to Microsoft. What do we, humble Java programmers, do? We invent a Function class. Not that it suddenly makes Java a functional programming language, but it gives us the opportunity to express things in a more concise way, using higher level terminology.

In this article I use Pair class, see details here.


public abstract class Function<X, Y> {
public abstract Y apply(X x);
}


If that were all that we wanted, we could as well make it an interface. There is a certain trend to first introduce an interface, and then an abstract base class implementing this interface and providing the base functionality. But... how many different ways of defining a composition of two functions do we know? Exactly one, right? See, there are certain rules and properties, without which a function would not be a function. Let's make them a part of the abstract class, since there's no choice anyway.


public static <X, Y, Z> Function<X, Z>
compose(final Function< f, final Function<Y, Z> g) {
return new Function<X, Z>() {
public Z apply(X x) {
return g.apply(f.apply(x));
}
};
}

public static <T, T1 extends T> Function<T1, T> id() {
return new Function<T1, T>() {
public T apply(T1 t) { return t; }
};
}


We will be tempted to apply a Function to an Iterable, a Collection, a Set, a List.

The naive way would be to write something like this:


Function<X, Y> f = ....
....
List<X> xs = ...
...
List<Y> ys = new ArrayList<Y>();
for (X x : xs) {
ys.add(x);
}
// good job!


I have three problems with this kind of code.

Problem 1. It runs an explicit loop. Did you ever ask yourself what is the reason programmers have been using loops in their code? Are not all the use cases already known, so that we could just mention the case, and the loop is magically applied? Want an example? String concatenation. Nobody writes a loop to concatenate two strings.

Problem 2. We may not even need all the results in the new array, ys - why bother calculating the values? Would be cool to calculate them when needed.

Problem 3. Although an average computer these days has more memory than an average programmer can fill, it can happen that the source array is exceptionally long; and if we fill each temporary array with data, we may be soon out of memory. And for what reason? For no reason, just because we don't know better.

But we do.

The transformed set can be virtual. Built on demand. Let's start with applying a Function to an Iterator. What are we going to obtain? Another Iterator, with exactly the same number of values, but the values are different.


public Iterator<Y> map(final Iterator<X> source) {
return new Iterator<Y>() {

public boolean hasNext() {
return source.hasNext();
}

public Y next() {
return apply(source.next());
}

public void remove() {
source.remove();
}
};
}


Iterators rarely materialize themselves in the code; they use to be returned by Iterables; so let's apply the same trick to Iterables.


public Iterable<Y> map(final Iterable<X> source) {
return new Iterable<Y>() {
public Iterator<Y> iterator() {
return map(source.iterator());
}
};
}


We may not feel comfortable if the resulting Iterable's iterator() is called multiple times; but in this case we can eventually decide to cache the values. Or maybe not, if there's a billion of them, and the function is inexpensive to calculate.

The only difference between a Collection and an Iterable is that in the case of Collection we know the size. So there.


public Collection<Y> map(final Collection<X> source) {
return new AbstractCollection<Y>() {
public Iterator<Y> iterator() {
return map(source.iterator());
}

public int size() {
return source.size();
}
};
}


For List, we throw in one more method, get(int):


public List<Y> map(final List<X> source) {
return new AbstractList<Y>() {
public Y get(int index) {
return apply(source.get(index));
}

public Iterator<Y> iterator() {
return map(source.iterator());
}

public int size() {
return source.size();
}
};
}


I wonder if anybody can come up with a solution that does not involve copy-pasting iterator() implementation.

Now, you have probably noticed that all these transformation methods are not static "util", but are a part of Function class definition. As a result, if you have a Function f, you can write in your code List<Y> results = f.map(souceList);

Applying the same trick to a Set is not exactly fair. Set "contract" say that we should not have repeating values; and in our case there is no guarantee that we would not:


public Set<Y> map(final Set<X> source) {
return new AbstractSet<Y>() {
public Iterator<Y> iterator() {
return map(source.iterator());
}

public int size() {
return source.size();
}
};
}


On certain occasions, though, the uniqueness of the function values is guaranteed, like in the case of Schwartzian Transform. Schwartzian transform consists of producing pairs (x, f(x)) for a given collection of x's. This operation is known to be useful in the times of Perl language; I found it useful in Java too.


private static <X, Y> Function<X, Pair<X, Y>> schwartzianTransform(final Function<X, Y> f) {
return new Function>() {
public Pair<X, Y> apply(final X x) {
return LazyPair(x, f);
}
};
}


It is kind of funny that I have to define this method as static; one of us (me or java) is not smart enough to build new functions inside instance methods. Feel free to go ahead and try.

All of this allows us, given a Function and a Set, build a Map:


public Map<X, Y> toMap(final Set<X> keys) {
// this innocent function makes a Pair look like a Map.Entry
Function<Pair<X, Y>, Map.Entry<X, Y>> fromPairToMapEntry = id();
final Function<X, Map.Entry<X, Y>> pairBuilder =
compose(schwartzianTransform(this), fromPairToMapEntry);
return new AbstractMap<X, Y>() {
public Set<Entry<X, Y>> entrySet() {
return pairBuilder.map(keys);
}
};
}

Monday, September 22, 2008

Java Pair: a closer look

Who did not implement a Pair class, he did not write code yet. Or she.

People tired of implementing it, beg Sun to add Pair to the library (bugs 6229146, 4947273, 4983155, 4984991). Sun decided not to fix it. So the whole world under the Sun does.

A cheap solution would be to have a record with two fields, first and second, and implement equals() to compare both fields, and hashCode() as something like (first * 31) ^ second (computer people believe in magic "randomizing" properties of xor and number 31) so that one would be able to freely change the values.

Convenient, but wrong. What if you want to use pairs as keys in a map? You put an entry in the map, then change the key content, and kaboom! entry not found. A funny but working solution could be to freeze the hashCode on creation, getting them from another source of randomness in Java world, System.currentTimeMillis(): one can easily build half a million records within one milliseconds these days.

A better solution would be to have your pair immutable: take constructor parameters, store them, and only return them, but never change them. This, with a decent implementation of hashCode() and toString() is a good candidate to enter the imaginary World of Ideal Code.

But there's a nuisance. It would be great if it implemented Map.Entry< - that is, have aliases for getFirst() and getSecond(); first and second are now also known as key and value. Makes life easier if you already have a set of pairs and want to expose it as a map... yes, uniqueness of keys is required.

As an example, imagine that we have a Function<X, Y>, and a set of arguments; to expose this as a map, we have choices: either to instantiate a regular HashMap and fill it with the data; instantiate a Map as an AbstractMap where entrySet() is a prebuilt Set<Pair<X, Y>>, or do something smarter. See, if the key set is really huge, and/or evaluating the function takes some time/resources, we probably do not want to duplicate the keys and run the function over all of them. Who knows how many entries in this map will be needed by the client code.

Things can be even worse: the key set maybe large and/or virtual, so that building a set of pairs is an impossible task. What if it is a set of Natural Number, like this:

public class N extends AbstractSet<Integer> {
public Iterator iterator() {
return new Iterator() {
int n = 0;
public boolean hasNext() {
return true;
}

public Integer next() {
return n++;
}

public void remove() {
throw new UnsupportedOperationException();
}
};
}

public int size() {
return Integer.MAX_VALUE;
}
}


I am sure you do not want to build a set of pairs on such a set.

So let us not assume anything. One pair may be lazy, the other one just instantiate from two given values. We need an abstract class to generalize this.


/**
* Abstract pair class.
* Implementations should implement two methods,
* and may be lazy or regular.
* But in any case this is an unmodifiable pair.
*/
public abstract class Pair<X, Y>
implements Map.Entry<X, Y> {
public abstract X x();
public abstract Y y();

public X getKey() {
return x();
}

public Y getValue() {
return y();
}

public Y setValue(Y value) {
throw new UnsupportedOperationException();
}

private boolean equal(Object a, Object b) {
return a == b || a != null && a.equals(b);
}

public boolean equals(Object o) {
return o instanceof Pair &&
equal(((Pair) o).x(), x()) &&
equal(((Pair) o).y(), y());
}

private static int hashCode(Object a) {
return a == null ? 42 : a.hashCode();
}

public int hashCode() {
return hashCode(x()) * 3 + hashCode(y());
}

public String toString() {
return "(" + x() + "," + y() + ")";
}
}


From this class we can build a couple of concrete classes:


public class BasePair<X, Y> extends Pair<X, Y> {
private final X x;
private final Y y;

public X x() { return x; }
public Y y() { return y; }

private BasePair(X x, Y y) {
this.x = x;
this.y = y;
}

public static <X> BasePair<X, X> Pair(X[] source) {
assert source.length == 2 :
"BasePair should be built on a two-element array; got " +
source.length;
return new BasePair(source[0], source[1]);
}
}


BasePair is built from either two values or a two-element array; do we need a list-based constructor? It's possible.


public class LazyPair<X, Y> extends Pair<X, Y> {
private final X x;
private final Function<X, Y> f;
private Y y;
boolean haveY = false;

private LazyPair(X x, Function<X, Y> f) {
this.x = x;
this.f = f;
}

public X x() {
return x;
}

public Y y() {
if (!haveY) {
y = f.apply(x);
haveY = true;
}
return y;
}

@Override
public boolean equals(Object o) {
return o instanceof LazyPair ?
equal(x(), ((LazyPair)o).x()) : o.equals(this);
}

@Override
public int hashCode() {
return hashCode(x);
}
}


LazyPair calculates the function value only once, and caches it.

This seems to be all on this trivial issues. Thanks for reading!

Wednesday, September 17, 2008

Java: have a set of pairs, need a map

In Java, as in may other discourses, maps and sets of pairs with unique keys are more or less the same thing. The problem is that Map in Java is represented as a set of Map.Entry<X, Y>; and Map.Entry is not the best way to represent pairs in your code.

Personally I, probably like the best half of Java programmers, have my own class named Pair<X, Y>; the components are called, yes, x and y. In Map.Entry they are called key and value.

So I had to make Pair<X, Y> implement Map.Entry<X, Y>, so that I would not need to recreate a Map.Entry<X, Y> every time I have a Pair<X, Y>.

Now, if I have a Set<Pair<X, Y>>, what should I do to produce a Map<X, Y>, without replicating all my data? I will need an AbstractMap<X, Y>, and provide a method, entrySet(). I already have a set; the trouble is, this set is a set of pairs, Set<Pair<X, Y>>. Should I replicate it? No way; what if there's 50 million records? Can I cast it? No... it is just impossible. should I wrap it somehow? Yes, kind of.

Luckily, I have some cool classes and methods already that would help me to do the job.

The cool class is called Function<X, Y>. The traditional approach is to declare it as an interface with one method, Y apply(X x). I thought it would be cool to implement additional methods. First, having an Iterator<X>, one would like to map this iterator using a function, right? That's what the method in Function<X, Y>, Iterator<Y> map(Iterator<X> iteratorX is for:

public Iterator<Y> map(final Iterator<X> iteratorX) {
return new Iterator<Y>() {
public boolean hasNext() {
return iteratorX.hasNext();
}

public Y next() {
return apply(iteratorX.next());
}

public void remove() {
iteratorX.remove();
}
};
}


This was the biggest chunk of code.

Now that we can map an iterator, we can as well map an iterable, a collection, a list... but we need a set. Here we go:

public Set<Y> map(final Set<X> setX) {
return new AbstractSet<Y>() {
public Iterator<Y> iterator() {
return map(setX.iterator());
}

public int size() {
return setX.size();
}
};
}


How does it all help me? I just want to map my Set<Pair<X, Y>> to a Set<Map.Entry<X, Y>> without multiplying entities. So we need a function that returns the same pair, but shows it as a map entry. This is an identity function. Let's define it in Function class:

public static <T, T0 super T> Function<T, T0> id() {
return new Function<T, T0>() {
public T0 apply(T t) { return t; }
};
}


See, while it returns the same instance, but, in the eyes of compiler, casts it to some superclass. Now that we have it, let's apply it to solve our problem:


public static <X, Y> Map<X, Y> Map(final Set<Pair<X, Y>> pairs) {
final Function<Pair<X, Y>, Map.Entry<X, Y>> idmap = Function.id();

return new AbstractMap<X, Y>() {
public Set<Entry<X, Y>> entrySet() {
return idmap.map(pairs);
}
};
}


One little note. We had to introduce the variable, idmap - thus mentally "reifying" our identity function; it would not work if we just called Function.id() on spot - Java compiler would be too confused.

Followers

Subscribe To My Podcast

whos.amung.us