When I find something interesting and new, I post it here. That's mostly programming, of course, but not everything.

Sunday, September 19, 2010

Running with SBT

So, you are tired of your slow and half-dumb IDEs? OK, what can you do to make your Scala development cycle more agile? Use sbt.

First, go to the sbt website and download the launcher jar. Store it somewhere. Write a shell script or a .bat file with a single command:

java -Xmx512M -jar /path/to/sbt-launch-0.7.4.jar


Go to your project directory and run sbt there. It will ask you some questions: which version of Scala you want to use, the name of your project, and the like. Then it will create an sbt project in that directory. By default, the project expects the directory structure to follow Maven conventions (src/main/scala, src/test/scala, and so on). You don't have to follow them: go to the directory named project, create a directory named build inside it, and in that directory create a Scala file, Project.scala, that looks something like this:

import sbt._

class CategoriesProject(info: ProjectInfo) extends DefaultProject(info)
{
  // lazy val hi = task { println("Categories in Scala"); None } // a toy custom task, just for fun
  override def mainScalaSourcePath = "src"           // your source directory
  override def mainResourcesPath = "resources"       // non-code files (configs, properties) for the main classpath

  override def testScalaSourcePath = "tests"         // your tests directory
  override def testResourcesPath = "test-resources"  // non-code files for the test classpath
  override def outputDirectoryName = "bin"           // your output directory
}


Additionally, check out the lib directory in your project; it probably already contains some stuff that sbt believes it needs, namely scala-compiler.jar and scala-library.jar. Copy the other jars you need there. There should be a way to reference them where they are, but I have not figured it out yet.

Now start sbt again.

If you give it the command ~test, it will keep cycling through compiling the code and running the tests, automatically detecting what to compile and guessing which tests to run. As soon as you save a source file, sbt wakes up and does the job.

So... I love it, as do many other Scala people.

Sunday, September 12, 2010

Dealing with Infinities in Your Code - 3

In part 1 I introduced countable sets, and in part 2 I demonstrated how we can build a union of two countable sets. Now it is time for Cartesian products.

Suppose we have two sets, {'A', 'B'} and {1, 2, 3}. Their product looks like this: {('A',1), ('A',2), ('A',3), ('B',1), ('B',2), ('B',3)}. It is easy to build: iterate over the first set, and for each element iterate over the second set. In Scala:

for (a <- setA;
b <- setB) yield (a, b)
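For finite sets this works as expected. Here is a minimal, self-contained version of the snippet above (the object name is just a placeholder), together with the nested flatMap/map form that the for-comprehension stands for:

```scala
object FiniteProduct {
  val setA = Set('A', 'B')
  val setB = Set(1, 2, 3)

  // Sugar for setA.flatMap(a => setB.map(b => (a, b))):
  // for each a, the whole of setB is walked before moving on to the next a.
  val product: Set[(Char, Int)] = for (a <- setA; b <- setB) yield (a, b)
}
```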


The problem is that if the second component (setB) is infinite, we will never be able to iterate over all the elements. Let setB be, for instance, the set of all natural numbers, N. In this case the product setA x setB will yield ('A',0), ('A',1), ('A',2),... and will never switch to pairs like ('B',i). We have to do something about it.
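To see the problem concretely, here is the same for-comprehension over iterators, with N modelled as `Iterator.from(0)` (an assumption made for illustration). However far we go, every pair starts with 'A':

```scala
object NaiveInfiniteProduct {
  // setB = N, modelled as the infinite iterator 0, 1, 2, ...
  // The inner loop never finishes, so the outer loop never reaches 'B'.
  def pairs: Iterator[(Char, Int)] =
    for (a <- Iterator('A', 'B'); b <- Iterator.from(0)) yield (a, b)
}
```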

Say, in the case when only one component is infinite, we could change the order of the loops, putting the infinite component in the outer loop.

Actually, that won't work: we cannot always know for sure whether a set is finite. Halting problem, you know. So we have to figure out how to do it in a general way, without losing common-sense efficiency in the finite case.

The solution has been known for over 100 years; it is called the Cantor pairing function.
In the picture, you can see this function enumerating pairs of natural numbers in the following order: (0,0), (1,0), (0,1), (2,0), (1,1), (0,2), (3,0)... you got it.
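For reference, this is the standard closed form of the Cantor pairing function, written so that it reproduces the order listed above: a pair (x, y) lies on diagonal x + y, and within that diagonal its position is y.

```latex
\pi(x, y) = \frac{(x + y)(x + y + 1)}{2} + y,
\qquad \pi(0,0) = 0,\ \pi(1,0) = 1,\ \pi(0,1) = 2,\ \pi(2,0) = 3,\ \pi(1,1) = 4,\ \pi(0,2) = 5,\ \pi(3,0) = 6
```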

In our case there are two essential differences. First, our sets are not necessarily infinite. Second, elements are not retrievable by their "sequential numbers"... well, they are, but it is horribly inefficient: to reach element #1000, we need to rescan all the elements from #1 (or rather #0, since natural numbers and array indexes start with 0). So we do not need the fancy formula for the Cantor pairing function either, the one that calculates product elements by their sequential numbers. All we need is a way to iterate over the elements in the right order.
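For plain N x N, iterating in this order needs no formula at all: just walk the diagonals one by one. This is a minimal sketch; the object and method names are hypothetical, and the `cantor` function is included only to check that the order agrees with the formula. Note that it relies on both components being infinite, which is exactly the assumption we cannot make in general.

```scala
object DiagonalOrder {
  // Walk N x N diagonal by diagonal: on diagonal d, x runs from d down to 0
  // while y runs from 0 up to d. Nothing is computed from a sequence number.
  def pairs: Iterator[(Int, Int)] =
    Iterator.from(0).flatMap(d => (0 to d).iterator.map(y => (d - y, y)))

  // The Cantor pairing function, used here only to verify the order.
  def cantor(x: Int, y: Int): Int = (x + y) * (x + y + 1) / 2 + y
}
```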

Okay, let's first write the factory method for Cartesian products:

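def product[X, Y](xs: Set[X], ys: Set[Y]): Set[(X, Y)] = {
  // a pair belongs to the product iff each component belongs to its set
  val predicate = (p: (X, Y)) => xs.contains(p._1) && ys.contains(p._2)
  setOf(
    // setOf expects an Iterable, so wrap the iterator in one
    new Iterable[(X, Y)] { def iterator = kantorIterator(xs, ys) },
    // passed by name, so evaluated only on request; saturates at
    // Integer.MAX_VALUE instead of overflowing for infinite components
    math.min(xs.size.toLong * ys.size, Integer.MAX_VALUE.toLong).toInt,
    predicate
  )
}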


The predicate definition is extracted just for readability. What's important here is kantorIterator, which is supposed to work with any combination of finite and infinite countable sets.

Since we do not have direct access to the elements of the sets by their indexes, we keep each row of the product in its own iterator: row i iterates over the first component, paired with the fixed element y_i of the second one. We store these row iterators in a queue. The algorithm goes like this:
- push row0 to the queue
- yield (row0.next, y0) (that is, (x0,y0))
- push row1 to the queue
- yield (row0.next, y0) (first iterator in the queue; that is, (x1,y0))
- yield (row1.next, y1) (second, and last, iterator in the queue; that is, (x0,y1))
- push row2 to the queue
- yield (row0.next, y0) (that is, (x2,y0))
- yield (row1.next, y1) (that is, (x1,y1))
- yield (row2.next, y2) (that is, (x0,y2))

and so on: every time we reach the end of the queue, we push the next row iterator into the queue, reset the vertical iterator, and start all over.

The problem arises when one of the iterators is out of elements.

We have two possible cases.

Case 1. The vertical iterator halts. In this case we stop pushing new iterators into the queue, so that the size of the queue is the size of the vertical component.
Case 2. A horizontal (row) iterator halts. The vertical one may be infinite or not, and we have to shift it. E.g. if we build {'A', 'B', 'C'} x N, we will be producing ('A',0), ('B',0), ('A',1), ('C',0), ('B',1), ('A',2), ('C',1), ('B',2), ('A',3)... Note that after ('C',0) row0 is out of elements, so this row should be dropped from the queue. At the same time, the starting element of the vertical iterator should be shifted, so that the next pass goes not over 0,1,2 but over 1,2,3. This way we move all along the vertical iterator.

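Here's the code. Note that it mirrors the description above: the queue holds row iterators over ys, while xs plays the vertical role and is rescanned from position shift on every pass. It produces the same pairs; only the order within each diagonal differs.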
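import scala.collection.mutable.Queue

def kantorIterator[X, Y](xs: Iterable[X], ys: Iterable[Y]): Iterator[(X, Y)] =
  new Iterator[(X, Y)] {
    // row iterators over ys; the oldest (most advanced) one is at the head
    val iterators: Queue[Iterator[Y]] = Queue()
    // position in xs during the current pass
    var xi = xs.iterator
    // rows still to visit during the current pass
    var yi: Iterator[Iterator[Y]] = Iterator.empty
    // number of leading elements of xs that are fully paired and skipped
    var shift = 0

    def next(): (X, Y) = {
      if (!yi.hasNext) {
        // end of a pass: add a new row unless xs is exhausted (Case 1)
        if (xi.hasNext) {
          iterators.enqueue(ys.iterator)
        }
        // iterate over a snapshot, so dequeuing below cannot disturb this pass
        yi = iterators.toList.iterator
        xi = xs.iterator.drop(shift)
      }

      val yii = yi.next()
      val y = yii.next()
      val res = (xi.next(), y)

      // Case 2: the head row ran out; drop it and shift xs by one
      if (iterators.nonEmpty && !yii.hasNext) {
        iterators.dequeue()
        shift += 1
      }

      res
    }

    def hasNext: Boolean = !xs.isEmpty && !ys.isEmpty &&
      (xi.hasNext || (iterators.nonEmpty && iterators.head.hasNext))
  }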


hasNext covers all the cases of empty components.
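For comparison, here is a different and more compact technique, not the queue-of-iterators algorithm above. It is a minimal sketch assuming Scala 2.13 or later, where `LazyList` memoizes the elements it has read, so the i-th element can be re-read without re-running the source iterator. `diagonalPairs` is a hypothetical name. The trade-off: every visited element stays in memory, whereas kantorIterator keeps only iterators, and indexing a `LazyList` is still linear, so each diagonal costs quadratic time.

```scala
object DiagonalProduct {
  // Enumerates xs x ys diagonal by diagonal for any mix of finite and
  // infinite inputs, reading each source only as far as needed.
  def diagonalPairs[X, Y](xs: Iterable[X], ys: Iterable[Y]): Iterator[(X, Y)] = {
    val xl = LazyList.from(xs)
    val yl = LazyList.from(ys)

    // All existing pairs (x_i, y_j) with i + j == d, y index ascending.
    def diagonal(d: Int): List[(X, Y)] =
      (0 to d).toList.collect {
        case j if yl.isDefinedAt(j) && xl.isDefinedAt(d - j) => (xl(d - j), yl(j))
      }

    // With both inputs non-empty, no diagonal is empty before the last one,
    // so the first empty diagonal means we are done (at once if either is empty).
    Iterator.from(0).map(diagonal).takeWhile(_.nonEmpty).flatten
  }
}
```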

I should probably mention that you won't always be able to iterate over all the subsets of a countable set. If the set is finite, that's fine: its powerset is finite too. But if it is infinite, Cantor's diagonal argument proves that its powerset is not countable.
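For the finite case, Scala's standard `Set#subsets()` already enumerates the powerset lazily. A minimal illustration (the wrapper name `powerset` is hypothetical):

```scala
object FinitePowerset {
  // For a finite set with n elements, subsets() lazily yields all 2^n subsets,
  // including the empty set and the set itself.
  def powerset[A](s: Set[A]): Iterator[Set[A]] = s.subsets()
}
```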

Dealing with Infinities in Your Code - 2

In part 1 I introduced the idea of having countable sets in your code. Next I want to show how we can combine countable sets to produce new countable sets.

We have two operations, union and Cartesian product. Let's start with the disjoint union, that is, the union of non-intersecting sets.

If we expect both our sets to be finite, there's nothing special here. The predicate checks whether an element belongs to one set or the other; the iterator iterates over the first set, then over the second.

What if one or both of our sets are infinite? The predicate still seems okay: we just check whether the element belongs to one of the sets. But the iterator... if the first set is infinite, our naive iterator will never reach the second set. I mean, not in this universe; we would need transfinite numbers readily available, which is well beyond the power of modern computers and of most modern humans. What can we do then? Well, we can alternate between the two iterables.
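To make the problem concrete: the naive union iterator is plain concatenation. This minimal sketch (names are hypothetical) is fine for finite sets, but when the first set is infinite it never reaches the second one:

```scala
object NaiveUnion {
  // Naive union iterator: everything from the first set, then the second.
  def concat[X](first: Iterator[X], second: Iterator[X]): Iterator[X] = first ++ second
}
```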

Let's look at the code:
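def union[X, X1 <: X, X2 <: X](set1: Set[X1], set2: Set[X2]): Set[X] = {
  lazy val parIterable = new ParallelIterable[X, X1, X2](set1, set2)
  // saturates at Integer.MAX_VALUE, which also covers an infinite component
  // and avoids overflow when adding two large finite sizes
  lazy val size = math.min(set1.size.toLong + set2.size, Integer.MAX_VALUE.toLong).toInt
  setOf(parIterable,
    size,
    // X1 and X2 are erased at runtime, so these isInstanceOf checks are unchecked
    // and do not really filter by type; a predicate expecting X1 may still fail
    (x: X) => (x.isInstanceOf[X1] && (set1 contains x.asInstanceOf[X1])) ||
              (x.isInstanceOf[X2] && (set2 contains x.asInstanceOf[X2]))
  )
}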


What do we have here? A new iterable, a new size evaluator, a new predicate. We can ignore the size evaluator; nothing is calculated until someone requests it. The predicate is obvious; the only new thing is ParallelIterable, built on top of ParallelIterator. We need it to iterate over the union of two countable sets (rather, over two iterables) in a smart way:

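class ParallelIterator[X, X1 <: X, X2 <: X](
    iterator1: Iterator[X1],
    iterator2: Iterator[X2]) extends Iterator[X] {
  var i2: (Iterator[X], Iterator[X]) = (iterator1, iterator2)
  def hasNext: Boolean = iterator1.hasNext || iterator2.hasNext
  def next(): X = {
    i2 = (i2._2, i2._1) // swap, so that we alternate between the two iterators
    if (i2._1.hasNext) i2._1.next() else i2._2.next() // fall back if one is exhausted
  }
}

// the Iterable used in union: a fresh ParallelIterator on every call
class ParallelIterable[X, X1 <: X, X2 <: X](
    iterable1: Iterable[X1],
    iterable2: Iterable[X2]) extends Iterable[X] {
  def iterator: Iterator[X] =
    new ParallelIterator[X, X1, X2](iterable1.iterator, iterable2.iterator)
}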


Note that we flip i2, so it changes from (iterator1, iterator2) to (iterator2, iterator1) and back every cycle.

As a result, in the output we just interleave members from the first component with the members of the second component. What is important here is that we do not make any assumptions regarding which iterator is finite and which is infinite; if one component does not have more elements, fine, we continue scanning the other one.
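The same interleaving can also be written recursively over lazy lists. This is an alternative formulation, not the code above: a minimal sketch assuming Scala 2.13 or later, with the hypothetical name `interleave`. Unlike ParallelIterator, which takes its first element from the second iterator, this version starts with the first list. The recursive call sits in the lazy tail, so it is only made when more elements are requested, which is why it works on infinite lists.

```scala
object Interleave {
  // Take one element from a, then continue with the roles of a and b swapped.
  // When one side runs out, the rest of the other side follows unchanged.
  def interleave[A](a: LazyList[A], b: LazyList[A]): LazyList[A] =
    if (a.isEmpty) b else a.head #:: interleave(b, a.tail)
}
```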

This was probably pretty simple; now we have to deal with Cartesian products, that is, the set of all pairs (x: A, y: B) where x is from the first set and y is from the second set; with no assumptions regarding the finiteness of the said sets.

That would be the next part.

Thursday, September 09, 2010

Dealing with Infinities in Your Code - 1

These days it is not unusual to have an infinite sequence in a computer system. The idea is that while the sequence grows, the system gets upgraded, and there's no limit, unless we make certain assumptions, like accepting non-standard arithmetic. Then even sequences that we expect to be finite may turn out to be infinite, like Goodstein's sequence.

Still, it seems that in the foreseeable future we will only encounter countable sets. They are not necessarily infinite; any finite set is traditionally considered countable, that is, equal in size to a subset of N, the set of natural numbers. The set of all real numbers, R, is a typical example of an uncountable set (it is a continuum). (Whether there is a set of intermediate size, between N and R, depends on which axiom we assume. If we assume it exists, it exists; if we assume it does not exist, it does not. Quite an eye-opener for people who used to treat Mathematics as a provider of absolute truths, right?)

What's good about a countable set is that (assuming the set-theoretical axioms) one can list its elements in such a way that any arbitrary element is eventually reached. (That's due to another axiom, the Axiom of Choice, which is equivalent to saying that any set can be well-ordered. If we don't assume this axiom, we cannot prove that.)

When I worked at Google, one of my favorite interview questions was "given an alphabet, write code that produces all the words that can be built from this alphabet". Then I had to explain that building words from an alphabet does not mean each letter is used at most once; letters can repeat. Then I had to explain that by "producing all words" I mean writing code that can reach any word that can be built from the alphabet. For instance, given {A,B}, the sequence "A", "AA", "AAA", etc. would be infinite, but it would not constitute a solution: it never reaches any word containing the letter 'B'.
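One way to answer that question is to list the words by length, and in alphabet order within each length (shortlex order): every word of length n comes after only finitely many others, so it is eventually reached. This is a minimal sketch, not necessarily the answer I was looking for in interviews; `AllWords.words` is a hypothetical name, and the empty word is left out, as in the example above. Each length level is materialized in full, so memory grows exponentially with word length.

```scala
object AllWords {
  // All words over the alphabet, shortest first: all words of length 1,
  // then all words of length 2, and so on.
  def words(alphabet: Seq[Char]): Iterator[String] = {
    val letters = alphabet.distinct
    if (letters.isEmpty) Iterator.empty
    else
      Iterator.iterate(letters.map(_.toString)) { level =>
        for (w <- level; c <- letters) yield w + c // extend every word by one letter
      }.flatMap(level => level.iterator)
  }
}
```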

In my work I limit myself to sets as they are presented in Java or Scala.

One particular problem is that an implementation of Set is usually supposed to have a method that returns the set's size. I'm not sure I know why; probably it has something to do with array reallocations, some technical stuff. See, if you know that your set is infinite, you are safe: by the rules of Java, you can always return Integer.MAX_VALUE. But what if you do not know? Say you have a function that takes another function, calls it repeatedly until it halts (e.g. fails), and records the time each call takes. What would the size of that set of timings be? We would have to solve the Halting Problem to know the answer.

On the other hand, isEmpty is a pretty decent method; it could easily belong to Iterable: to check whether an Iterable is empty, one just has to get its Iterator and call hasNext.
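A small illustration of size versus emptiness, assuming Scala 2.13 or later. Those versions added `knownSize` to the collections, which returns -1 when the size is not available without a traversal. `naturals` and `isEmptyCheap` are hypothetical names.

```scala
object SizeVsEmptiness {
  // An infinite Iterable: calling .size on it would never return.
  val naturals: Iterable[Int] = new Iterable[Int] {
    def iterator: Iterator[Int] = Iterator.from(0)
  }

  // Emptiness needs only one step of a fresh iterator, finite or not.
  def isEmptyCheap[A](xs: Iterable[A]): Boolean = !xs.iterator.hasNext
}
```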

Also, it would be good to supply a set with a predicate, similar to what they do in Zermelo-Fraenkel Set Theory: according to the Extensionality Axiom, two sets are equal iff they consist of the same elements, so a set is fully characterized by its membership test. Not having to count to infinity (or further) to check whether an element belongs would be a real advantage. E.g. for the set of all odd numbers, the predicate would not have to scan through all the odd numbers from 1 to the given one; it would just check the last bit.
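To make the odd-numbers example concrete, here is a sketch comparing membership by scanning the enumeration with membership by predicate. It assumes the odd natural numbers 1, 3, 5, ...; the object and method names are hypothetical.

```scala
object OddNumbers {
  // The odd natural numbers, enumerated in increasing order: 1, 3, 5, ...
  def odds: Iterator[Int] = Iterator.from(1, 2)

  // Membership by enumeration: skip everything below n, then compare.
  // Terminates only because the enumeration is increasing; costs about n/2 steps.
  def containsByScan(n: Int): Boolean = odds.dropWhile(_ < n).next() == n

  // Membership by predicate: a single bit test (for positive n).
  def containsByPredicate(n: Int): Boolean = n > 0 && (n & 1) == 1
}
```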

So that's why I came up with a set that is defined by its

  • iterable, which enumerates the elements;

  • predicate, which "efficiently" checks whether a value belongs to the set;

  • size evaluator, which, when called, tries to evaluate the size of the set.

I use Scala for coding these days; I hope it is reasonably pretty:

// requireImmutability, filteredSet, Functions.Injection and Functions.Bijection
// are helpers defined elsewhere (not shown in this post)
private def setForIterator[X](sourceIterator: => Iterator[X], sizeEvaluator: => Int, predicate: X => Boolean): Set[X] = {
  new Set[X] {
    override def contains(x: X) = predicate(x)          // membership by predicate, no scanning
    override def size = sizeEvaluator                   // evaluated only when requested
    override def iterator = sourceIterator filter predicate
    override def isEmpty = !iterator.hasNext            // no need to know the size
    override def -(x: X) = requireImmutability
    override def +(x: X) = requireImmutability
    def map[Y](f: Functions.Injection[X, Y]): Set[Y] = f.applyTo(this)
    def map[Y](f: Functions.Bijection[X, Y]): Set[Y] = f.applyTo(this)
    override def filter(p: X => Boolean) = filteredSet(sourceIterator, (x: X) => predicate(x) && p(x))
  }
}

def setOf[X](source: Iterable[X], sizeEvaluator: => Int, predicate: X => Boolean) =
  setForIterator(source.iterator, sizeEvaluator, predicate)


I had to override a bunch of methods declared in trait Set, to avoid any immediate calls to the size evaluator and any assumption that the size is known.

Here's how, using this class, we can define the set of natural numbers:
val N = setOf(new Iterable[Int] {
    def iterator = new Iterator[Int] {
      private var i: Int = -1
      def hasNext = true
      def next() = { i += 1; i }
    }
  },
  Integer.MAX_VALUE,
  (x: Any) => x.isInstanceOf[Int] && x.asInstanceOf[Int] >= 0)
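The code above targets the Scala collections of 2010. In Scala 2.13 and later, the abstract members of an immutable Set are `contains`, `iterator`, `incl` and `excl`, so a comparable definition looks roughly like the sketch below. It is a sketch under that assumption, not the original code: `CountableSets` is a hypothetical name, and the filtered set's size evaluator simply counts, so on an infinite set it never returns (just like any other attempt to evaluate that size). Avoid `toString` or `==` on an infinite set, since both traverse it.

```scala
import scala.collection.immutable.AbstractSet

object CountableSets {
  // A set given by an iterator factory, a lazily evaluated size, and a predicate.
  def setOf[X](source: => Iterator[X], sizeEvaluator: => Int, predicate: X => Boolean): Set[X] =
    new AbstractSet[X] {
      def contains(x: X): Boolean = predicate(x)         // membership without scanning
      def iterator: Iterator[X] = source.filter(predicate)
      override def size: Int = sizeEvaluator              // computed only on request
      override def knownSize: Int = -1                    // size not known up front
      override def isEmpty: Boolean = !iterator.hasNext   // one step, even for infinite sets
      // These sets are read-only values, so adding and removing are not supported.
      def incl(x: X): Set[X] = throw new UnsupportedOperationException("read-only countable set")
      def excl(x: X): Set[X] = throw new UnsupportedOperationException("read-only countable set")
      // The default filter would build a strict Set, which never ends for an infinite one.
      override def filter(p: X => Boolean): Set[X] = {
        val q = (x: X) => predicate(x) && p(x)
        setOf(source, source.count(q), q)
      }
    }

  // The natural numbers 0, 1, 2, ...; Int.MaxValue stands for "infinite", as above.
  val N: Set[Int] = setOf(Iterator.from(0), Int.MaxValue, (x: Int) => x >= 0)
}
```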


That's it for now. In the next part I'll talk about how to combine two countable sets. It is an old fact that the union and the product of two countable sets are countable; we need code that behaves according to this knowledge.
