When I find something interesting and new, I post it here - that's mostly programming, of course, not everything.

Monday, September 29, 2008

class Function<X, Y>

As we all know, functions are not a part of Java language; and closures are banished to the North, to Microsoft. What do we, humble Java programmers, do? We invent a Function class. Not that it suddenly makes Java a functional programming language, but it gives us the opportunity to express things in a more concise way, using higher level terminology.

In this article I use Pair class, see details here.


public abstract class Function<X, Y> {
public abstract Y apply(X x);
}


If that were all that we wanted, we could as well make it an interface. There is a certain trend to first introduce an interface, and then an abstract base class implementing this interface and providing the base functionality. But... how many different ways of defining a composition of two functions do we know? Exactly one, right? See, there are certain rules and properties, without which a function would not be a function. Let's make them a part of the abstract class, since there's no choice anyway.


public static <X, Y, Z> Function<X, Z>
compose(final Function< f, final Function<Y, Z> g) {
return new Function<X, Z>() {
public Z apply(X x) {
return g.apply(f.apply(x));
}
};
}

public static <T, T1 extends T> Function<T1, T> id() {
return new Function<T1, T>() {
public T apply(T1 t) { return t; }
};
}


We will be tempted to apply a Function to an Iterable, a Collection, a Set, a List.

The naive way would be to write something like this:


Function<X, Y> f = ....
....
List<X> xs = ...
...
List<Y> ys = new ArrayList<Y>();
for (X x : xs) {
ys.add(x);
}
// good job!


I have three problems with this kind of code.

Problem 1. It runs an explicit loop. Did you ever ask yourself what is the reason programmers have been using loops in their code? Are not all the use cases already known, so that we could just mention the case, and the loop is magically applied? Want an example? String concatenation. Nobody writes a loop to concatenate two strings.

Problem 2. We may not even need all the results in the new array, ys - why bother calculating the values? Would be cool to calculate them when needed.

Problem 3. Although an average computer these days has more memory than an average programmer can fill, it can happen that the source array is exceptionally long; and if we fill each temporary array with data, we may be soon out of memory. And for what reason? For no reason, just because we don't know better.

But we do.

The transformed set can be virtual. Built on demand. Let's start with applying a Function to an Iterator. What are we going to obtain? Another Iterator, with exactly the same number of values, but the values are different.


public Iterator<Y> map(final Iterator<X> source) {
return new Iterator<Y>() {

public boolean hasNext() {
return source.hasNext();
}

public Y next() {
return apply(source.next());
}

public void remove() {
source.remove();
}
};
}


Iterators rarely materialize themselves in the code; they use to be returned by Iterables; so let's apply the same trick to Iterables.


public Iterable<Y> map(final Iterable<X> source) {
return new Iterable<Y>() {
public Iterator<Y> iterator() {
return map(source.iterator());
}
};
}


We may not feel comfortable if the resulting Iterable's iterator() is called multiple times; but in this case we can eventually decide to cache the values. Or maybe not, if there's a billion of them, and the function is inexpensive to calculate.

The only difference between a Collection and an Iterable is that in the case of Collection we know the size. So there.


public Collection<Y> map(final Collection<X> source) {
return new AbstractCollection<Y>() {
public Iterator<Y> iterator() {
return map(source.iterator());
}

public int size() {
return source.size();
}
};
}


For List, we throw in one more method, get(int):


public List<Y> map(final List<X> source) {
return new AbstractList<Y>() {
public Y get(int index) {
return apply(source.get(index));
}

public Iterator<Y> iterator() {
return map(source.iterator());
}

public int size() {
return source.size();
}
};
}


I wonder if anybody can come up with a solution that does not involve copy-pasting iterator() implementation.

Now, you have probably noticed that all these transformation methods are not static "util", but are a part of Function class definition. As a result, if you have a Function f, you can write in your code List<Y> results = f.map(souceList);

Applying the same trick to a Set is not exactly fair. Set "contract" say that we should not have repeating values; and in our case there is no guarantee that we would not:


public Set<Y> map(final Set<X> source) {
return new AbstractSet<Y>() {
public Iterator<Y> iterator() {
return map(source.iterator());
}

public int size() {
return source.size();
}
};
}


On certain occasions, though, the uniqueness of the function values is guaranteed, like in the case of Schwartzian Transform. Schwartzian transform consists of producing pairs (x, f(x)) for a given collection of x's. This operation is known to be useful in the times of Perl language; I found it useful in Java too.


private static <X, Y> Function<X, Pair<X, Y>> schwartzianTransform(final Function<X, Y> f) {
return new Function>() {
public Pair<X, Y> apply(final X x) {
return LazyPair(x, f);
}
};
}


It is kind of funny that I have to define this method as static; one of us (me or java) is not smart enough to build new functions inside instance methods. Feel free to go ahead and try.

All of this allows us, given a Function and a Set, build a Map:


public Map<X, Y> toMap(final Set<X> keys) {
// this innocent function makes a Pair look like a Map.Entry
Function<Pair<X, Y>, Map.Entry<X, Y>> fromPairToMapEntry = id();
final Function<X, Map.Entry<X, Y>> pairBuilder =
compose(schwartzianTransform(this), fromPairToMapEntry);
return new AbstractMap<X, Y>() {
public Set<Entry<X, Y>> entrySet() {
return pairBuilder.map(keys);
}
};
}

No comments:

Followers

Subscribe To My Podcast

whos.amung.us