When I find something interesting and new, I post it here - mostly programming, of course, but not only.

Sunday, June 10, 2012

Cake Pattern Practices

Intro

Cake Pattern is a Scala-specific solution to the Dependency Injection problem. In Java, from which Scala has inherited a lot, DI is handled either by passing the environment around as a parameter (the Env monad) or by using specialized ad-hoc tools like Guice. Scala is a much more powerful language, and the Cake Pattern is available here; in my opinion, it is the best approach. It is similar to passing parameters, but done differently. The first publication on it was Jonas Bonér's Real-World Scala: Dependency Injection (DI), in 2008. I personally found that article kind of hard to read; I hope this one is easier. The pattern itself is easy.

Real World Example

We have an application which depends on configuration and credential files, and talks to a server via some proprietary API. We want to work on this application. By working I mean freely refactoring it from its initial prototype state into a well-written piece of code, enriched with logging, statistics, status reporting, etc., and keeping it alive, that is, modifiable. One cannot seriously do this over real-world connections: tests take minutes to run, connections are brittle, etc. So we want to be able to run the logic without connecting to anything - and this includes System.out output and System.exit() calls as well. How do we do it? Let's start with the "real world" part.

Abstracting The File System

The full source code can be found on github, here is a small piece.
import io.{Source => ioS}
import java.io.{PrintWriter, File}

trait FS {

  implicit def file(path: String)                       = new File(path)
  implicit def asFile(file: TextFile)                   = file.file
  implicit def textFile(file: File):       TextFile     = TextFile(file)
  implicit def textFile(path: String):     TextFile     = TextFile(file(path))
  def tempFile(prefix: String)                          = TextFile(File.createTempFile(prefix, "tmp"))

  def exists(file: File): Boolean = file.exists

  protected class Entry(val file: File) {
    lazy val canonicalFile = file.getCanonicalFile.getAbsoluteFile
    def parent = Folder(canonicalFile.getParentFile)
    def delete = file.delete
    override def toString = canonicalFile.toString
  }

  case class Folder(path: File) extends Entry(path) {
    def file(path: String): TextFile = TextFile((canonicalFile /: (path split "/")) (new File(_, _)))
  }

  class TextFile(file: File) extends Entry(file) {
    def text = ioS.fromFile(file).mkString

    def text_=(s: String) {
      val out = new PrintWriter(file, "UTF-8")
      try { out.print(s) } finally { out.close() }
    }
  }

  object TextFile {
    def apply(file: File): TextFile = new TextFile(file)
  }
}

object FS extends FS
To use it in your application, instead of "manually" finding/opening/closing files, one just has to write
import FS._
somewhere in the client code.
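For example (a sketch of mine, with a made-up path), the implicit conversions then let client code treat paths as files directly:
import FS._

val config = textFile("config/app.conf").text // read a whole file as a String
tempFile("scratch").text = "temporary data"   // write, via text_=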

Mocking the File System

To avoid creating a multitude of real files just to make sure the code works in various circumstances, we had better mock the file system. Here's an example:
val mockFS = new FS {
    import java.io.FileNotFoundException
    class MockFile(path: String, content: String) extends TextFile(new File(path)) {
      override def text = content
    }
    val files = new scala.collection.mutable.HashMap[String, MockFile]
    def add(file: MockFile) { files.put(file.getPath, file)}
    def file(pair: (String, String)) { add(new MockFile(pair._1, pair._2)) }

    file("badcredo.txt"           -> "once upon a midnight dreary while I pondered weak and weary")
    file("credoNoUser.txt"        -> """
    password = "Guess me if you can!"
    realm = fb
    """)
    file("credoBadUser.txt"        -> """
    password = "Guess me if you can!"
    realm = fb
    username=
    """)
    file("credoBadPassword.txt"        -> """
    password =
    realm = fb
    username=badpwd
    """)

    override def exists(file: File) = true // egg or chicken problem
    override def textFile(path: String) = files.getOrElse(path, throw new FileNotFoundException("what is this " + path + "?"))
  }
We can throw in tons of such cases, covering all the situations, which is much harder if you have to deal with dozens of actual files. So far so good, but how do we replace the real file system with this mock one? This is how.

Eating The Cake

trait MyApp {
  val FileSystem: FS
  /*...*/
  def login(cred: String) {
    val (userid, password, realm) = parse(FileSystem.textFile(cred).text)
    /*...*/
  }
}

object MyApp extends MyApp {
  val FileSystem: FS = FS
}
Here MyApp is a trait, with an accompanying object. In production we use the accompanying object, which instantiates FileSystem as FS, the object representing the regular file system. In the trait, FileSystem is an abstract value; so when we extend MyApp in our test suite, we can plug in our test file system:
val SUT = new MyApp {
  val FileSystem: FS = mockFS
}
Now we can manipulate the app to our pleasure.

Intercepting Everything

In addition, in our test suite we can define this:
  trait Testable {
    import java.io.{PrintWriter, StringWriter}

    val console = new StringWriter
    val out = new PrintWriter(console)
    def printed = console.toString
    class Exit(val code: Int) extends Exception("exit(" + code + ")")
    def print(s: String) = out.print(s)
    def println(s: String) = out.println(s)
    def exit(code: Int) = throw new Exit(code)
  }
And then write
val SUT = new MyApp with Testable {
  val FileSystem: FS = mockFS
}
Provided that MyApp's code calls these print, println and exit methods (with default implementations delegating to the system ones), we are now in control of the application.
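For instance, a test can now run the app and inspect everything it did; a sketch of mine (the run-through and the error message are hypothetical):
val app = new MyApp with Testable {
  val FileSystem: FS = mockFS
}
try {
  app.login("badcredo.txt") // the mock file with garbage inside
} catch {
  case e: app.Exit => // the app "exited", but the JVM is still alive
}
assert(app.printed contains "bad credentials") // hypothetical error message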

Rinse, Repeat

But what if MyApp uses another piece of code that talks to an external service? We do the same instrumentation in that other piece of code, using the same cake pattern: the default implementation comes from the accompanying object, but in the test suite we can instantiate it with something completely different and pass it along to MyApp.
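A sketch of what that could look like (ServerApi, its method, and RealServer are names I made up for illustration):
// a hypothetical second dependency, wired in exactly the same way
trait ServerApi { def post(path: String, body: String): String }

object RealServer extends ServerApi {
  def post(path: String, body: String) = sys.error("talks to the real server here")
}

trait MyApp {
  val FileSystem: FS
  val Server: ServerApi
  /*...*/
}

object MyApp extends MyApp {
  val FileSystem: FS = FS
  val Server: ServerApi = RealServer
}

// and in the test suite:
val SUT = new MyApp with Testable {
  val FileSystem: FS = mockFS
  val Server: ServerApi = new ServerApi {
    def post(path: String, body: String) = "canned response"
  }
}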

Questions?

Sunday, March 18, 2012

Miles Sabin's type negation in practice

There was a discussion recently on the scala-lang mailing list regarding how to ensure that your function returns a non-null value. Here's a solution from Miles Sabin, with my comments.

the source


// Encoding for "A is not a subtype of B"
trait <:!<[A, B]

// Uses ambiguity to rule out the cases we're trying to exclude
implicit def nsub[A, B] : A <:!< B = null
implicit def nsubAmbig1[A, B >: A] : A <:!< B = null
implicit def nsubAmbig2[A, B >: A] : A <:!< B = null

// Type alias for context bound
type |¬|[T] = {
  type λ[U] = U <:!< T
}


def foo[T, R : |¬|[Unit]#λ](t : T)(f : T => R) = f(t)

foo(23)(_ + 1) // OK
foo(23)(println) // Doesn't compile


So, here is how it works.

First we define the trait <:!<, which looks like a negation of <:< (in Scala, A <:< B is evidence that type A is a subtype of B, i.e. that an A can be used wherever a B is required).
Note that when a type takes two parameters, you can write it in infix form, e.g. String Map Int; in our case, instead of val x: <:!<[A, B] we can write val x: A <:!< B

E.g.

scala> def f(m: String Map Int) = Map("one" -> 1)
f: (m: Map[java.lang.String,Int])scala.collection.immutable.Map[java.lang.String,Int]


Why do we need a trait that does not specify any functionality? See below.

We define three implicits. The first one, implicit def nsub[A, B] : A <:!< B = null, is applicable to any pair of types A and B; if we had just this one, the compiler would never be confused. Now we add two more, to confuse the compiler:

implicit def nsubAmbig1[A, B >: A] : A <:!< B = null
implicit def nsubAmbig2[A, B >: A] : A <:!< B = null


Their role is that whenever we encounter a context where <:!<[A, B] is required, and B is a supertype of A, the compiler gets confused by the ambiguity. It does not get confused if there is no such relationship between the two types. Namely, we get this error:

:27: error: ambiguous implicit values:
both method nsubAmbig1 in object $iw of type [A, B >: A]=> <:!<[A,B]
and method nsubAmbig2 in object $iw of type [A, B >: A]=> <:!<[A,B]
match expected type <:!<[Unit,Unit]


So, how do we use this feature? We declare a type function (seems to be a new term) that is the negation of type T: whenever we apply this function to a type U, the code compiles only if U is not a subtype of T.

type |¬|[T] = {
  type λ[U] = U <:!< T
}


This is a kind of a type trap; let's see how we use it.

Declare a function foo that takes first parameter of type T, and second parameter a function from T to R:


def foo[T, R](t : T)(f : T => R) = f(t)


We can call it as val n24 = foo(23)(_ + 1) or as foo(23)(println). Now, how do we make sure that it does not take a function that returns Unit? We have to constrain the second type parameter, R, so that it is never Unit or a subtype of Unit. This is how we do it:

def foo[T, R : |¬|[Unit]#λ](t : T)(f : T => R) = f(t)


Now the second example, foo(23)(println), won't compile. Ta-da!
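A side note of mine: the context bound R : |¬|[Unit]#λ is just syntactic sugar for an extra implicit evidence parameter, so foo could equivalently be written like this:

def foo[T, R](t : T)(f : T => R)(implicit ev: R <:!< Unit) = f(t)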

Questions?

Wednesday, August 24, 2011

Either OOP or Liskov (choose one)

A while ago I read some Abadi-Cardelli, and illustrated what I had learned with an example of how Java does not have the subsumption principle (that is, if A is a subtype of B, an instance of A can be safely substituted wherever an instance of B is required).

So I had posted a blog entry with the example.

There's another post on this on blogger.

A deeper look into the issue can be found in "Is the Java Type System Sound?" by Sophia Drossopoulou and Susan Eisenbach.
The paper shows that no, it is not, and it also builds a sublanguage of Java in which subsumption holds.

From this article I've got the following demonstration sample:


class A {}
class A1 extends A {}

class B {
  char f(A x)  { System.out.println("f(A." + x + ")");  return 'f'; }
  int  f(A1 y) { System.out.println("f(A1." + y + ")"); return 12345; }

  void g(A1 z) {
    System.out.println("Called g");
    System.out.println(f(z));
    System.out.println("out of g");
  }

  void h(A u, A1 v) {
    System.out.println("Called h");
    System.out.println(f(u));
    System.out.println(f(v));
    System.out.println("out of h");
  }
}

@Test
public void testJavaSubsumption() {
  new B().g(new A1());
  new B().h(new A1(), new A1());
}


The output is here:

Called g
f(A1.pigi.examples.library.tests.QueriesTest$A1@6526804e)
12345
out of g
Called h
f(A.pigi.examples.library.tests.QueriesTest$A1@42b1b4c3)
f
f(A1.pigi.examples.library.tests.QueriesTest$A1@20d2906a)
12345
out of h


I wonder, though, how long the world's population will hold on to the popular belief that one can have both OOP, with reasonable enough "dispatch", and the substitution principle. One cannot.

Questions?

P.S. Actually, the issue was discussed in detail by Oleg Kiselyov a while ago: http://okmij.org/ftp/Computation/Subtyping/.

Saturday, July 30, 2011

Generic Data Model




Introduction



Fifteen years ago I had a problem designing a database that was supposed to hold data from various questionnaires; the questions varied slightly every time and had no
regular structure, reflecting the weird ways of thinking of marketing people. The solution was to produce something more or less generic, storing "entities" and "attributes" separately.

(There was some discussion of XML and CORBA here, now totally irrelevant.)

Tables



The model consists of three tables: ENTITY, ATTRIBUTE, CONTEXT.


ENTITY Table



This is the main table in the database:

create table ENTITY (PK long, CLASS varchar, CREATED timestamp);

All the entities that we are going to store in our database have a representative in the ENTITY table. An entity represents an object of our model. An object has one inalienable property: its class (the term 'type' would probably suit better). An entity cannot change its class; actually, all the fields in an entity record are read-only.

ATTRIBUTE Table



This table contains attributes of the entities. Records consist of name-value pairs and reference ENTITY table:

create table ATTRIBUTE (ENTITY long, NAME varchar, VALUE varchar);

The usage of this table is probably obvious. Note that if NAME is null, the attribute contains the value of the entity itself. The other two fields are non-null.

CONTEXT Table



On the one hand, this table can be considered as containing all the collections in the database; but on the other hand, its purpose is wider. It contains all the naming contexts, and it can be used for navigation as well as for storing master-detail relationships.

create table CONTEXT (NAME varchar, OWNER long, MEMBER long, MEMBERID varchar);

Here NAME is the name of the context. If you are interested in collections, it is probably the name of a collection owned by OWNER. OWNER is obviously the entity that owns the context, the "master". MEMBER points to a member of the collection; in this case MEMBERID is irrelevant. But we can also treat the context as a hash map, in which case MEMBERID becomes the key and MEMBER the corresponding value. In the Naming Service paradigm, MEMBERID is the name of the entity MEMBER in the context NAME. For example, a customer's phone numbers could be stored as rows with NAME = 'phones', OWNER pointing to the customer entity, and MEMBER pointing to each phone entity.

Conclusion



These days the idea of key-value pairs has become ubiquitous; it was not so in 1996 or in 2001 when I wrote this. So I'm dropping the argument in favor of this, now obvious, solution.

Friday, July 15, 2011

Chain Calls in Java - a better solution

A while ago I wrote "Java Solutions: inheriting chain calls"; it was about what to do if we want to make it possible to write calls like
return new StringBuffer().append("Now is ").append(new Date()).append(", and ticking...").toString();


and be able to subclass at the same time.

See, the problem is that if your setter returns this:

  MyBaseEntity withExcitingNewFeature(ExcitingNewFeature f) {
    features.add(f);
    return this;
  }


and then you subclass MyBaseEntity, like MyDerivedEntity extends MyBaseEntity, you won't be able to write
  MyDerivedEntity mde = new MyDerivedEntity(entityId).withExcitingNewFeature(feature);


Since, well, the class is different.

So in my previous post I suggested making MyBaseEntity type-dependent (that is, giving it a generic parameter):
 public abstract class MyBaseEntity<T extends MyBaseEntity> 


and declare a method
   abstract T self();


So that every subclass declaration would look like this:
  public class MyDerivedEntity extends MyBaseEntity<MyDerivedEntity>


and would define

  MyDerivedEntity self() {
    return this;
  }


Yes, it works. And probably for 2008, which is now ancient history, when slow people slowly typed their slow code, not caring about the speed of development... in short, it was long ago.

Now there's a better solution. No need to redefine self() in all possible subclasses; that was just stupid. The only place where it should be defined is the base class:

  @SuppressWarnings("unchecked") // the cast is unchecked, but safe as long as
  T self() {                     // subclasses follow the pattern
    return (T) this;
  }


And no need to declare MyBaseEntity abstract.

There's also a bug in that old post; the generic type should have been bounded, as Ran noted in his comment.

Of course all setters should look like this:
  T withExcitingNewFeature(ExcitingNewFeature f) {
    features.add(f);
    return self();
  }
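As an aside (mine, not from the original post): in Scala, this blog's usual language, the whole dance is unnecessary, because a method can declare its return type to be this.type. A sketch, assuming some ExcitingNewFeature type:

class MyBaseEntity {
  private val features = scala.collection.mutable.ListBuffer[ExcitingNewFeature]()
  def withExcitingNewFeature(f: ExcitingNewFeature): this.type = {
    features += f
    this // typed as this.type, so the concrete subtype is preserved
  }
}

class MyDerivedEntity extends MyBaseEntity

// given a value feature: ExcitingNewFeature, chaining keeps the subtype:
// val mde: MyDerivedEntity = new MyDerivedEntity().withExcitingNewFeature(feature)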

Sunday, June 19, 2011

2 eggs and 100 floors

Everybody seems to know this "Google Interview Problem" (quoting):

"There is a 100-story building and you are given two eggs. The eggs (and the building) have an interesting property that if you throw the egg from a floor number less than X, it will not break. And it will always break if the floor number is equal or greater than X. Assuming that you can reuse the eggs which didn't break, you need to find X in a minimal number of throws. Give an algorithm to find X in minimal number of throws."


A naive programmer would first apply binary search for the first egg, then linear search for the second egg; a less naive one would try to apply a Fibonacci sequence.

Now what would happen if we just start with floor 50? Most probably we will waste the first egg, then walk up with the second one, trying each floor. What would be the number of throws in this case? We Don't Know. One would say 51, but there's no reason to believe it: you would need 51 throws only if the first egg does not break and you then throw the second egg from floor 51, 52, and so on, all the way up to floor 100. Would anybody do that? No.
We would divide the remaining range into halves again, right?

The other reason is that, hmm, maybe we find the right floor earlier? But what's the probability?

Right. To calculate the number of drops, we need to know the probability distribution. It is not specified in the problem. Moreover, the problem does not even specify what exactly we are optimizing. Suppose we have a million such buildings (and 2 million eggs, 2 per building); what would be a good strategy? Of course we would like to minimize the average search time, not the maximum possible. If we want to apply this kind of solution to real-life problems, we need to minimize the average.

But we don't know the probability distribution! One possible (real-life) distribution is that the egg breaks at the height of an inch, so even the first floor is out of luck. But then we can modify the problem, using Fabergé eggs and a 100-floor dollhouse.

Well, an interviewee would not go there right away. Another good choice is to take the square root of 100 and try the first egg with a step of 10 floors, then the second egg with a step of 1 floor. A popular (and wrong) answer gives 18 to 20 as the maximum number of attempts, and 10 as the average. Actually, the average number of tries would be 11.

The popular "right" answer that you can find on the web gives this sequence of floors to try:
14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100

I will not repeat the proof here, since the proof assumes we know which problem we are solving, and I claim that we don't. I mean, if we are talking about the maximum number of attempts, then 14 might be a good choice; but the problem never says it is about minimizing the maximum number of attempts, right? As somebody noticed, the minimal number of throws is either 0 (for real-life eggs and floors) or 1, for the lucky ones who are not allowed in Vegas casinos.

What I suggest to do is try to minimize the average number of throws, in the assumption of uniform distribution.

So, strictly speaking, we have values from 1 to 101, and the probability that the egg breaks at floor n but not at n-1 is 1/101. Why 101? We are still not sure about floor 100; if we allowed only the values 1..100, it would follow that the egg is guaranteed to break at floor 100 or below, just because the sum of all probabilities should be 1.

Now, with this distribution, we can calculate the average number of attempts for a linear run over n floors (assuming the egg breaks somewhere within these n floors): it is (n+1)/2 - 1/n. (Each position k < n takes k throws; the last position takes only n-1 throws, since it needs no confirming throw; averaging gives (n+1)/2 - 1/n.) In Scala, keeping just the dominant (n+1)/2 part, the calculation looks like this:

def linear(n: Int) = (n + 1.) / 2


For the first suggestion, that is, trying floor 51 and then going linear, we would have 27.(2277) attempts on average (you could also try floor 50 first and see that this is a little bit worse).

How did I calculate this? See, this strategy involves:
- throwing 1 egg;
- with the probability p1=51/101 going linear from floor 1 to floor 50;
- with the probability p2=50/101 going linear from floor 52 to floor 100.

The average number of tries is 1 + linear(51) * p1 + linear(50) * p2.

We can extend this formula for a sequence of attempts, using the following Scala code:

def forSequence(s: List[Int]) =
  ((0., 0) /: s.reverse) { (a, b) =>
    val c = a._2 + b
    (a._1 * a._2 / c + 1 + linear(b) * b / c, c)
  }


That is, given a list of steps, we calculate the average using a recursive formula, combining the previous average with the formula for linear search.

If we try it for the "square root search", forSequence(List(10,10,10,10,10,10,10,10,10,11)) gives 11.0; for the scientifically recommended sequence 14, 13, ..., forSequence(List(14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 1)) gives 10.47 as the average number of attempts. Not bad; and it makes sense.

But can we do better? To figure out the optimal sequence, we need to apply dynamic programming... unless we know the right formula; I don't. By the way, did you notice that last jump from 4 to 1 in the "scientific" sequence? Looks suspicious; it might be better to spread this jump throughout the sequence.

My solution is this: we have the right answers for very short buildings, like 1, 2, or 3 floors; for each subsequent height we need to find the right first step, that is, how many floors to skip; the rest of the sequence is already optimized. Since this involves a lot of backtracking, I had to introduce caching of the intermediate results, splits and averages. This is Scala, and a caching function is very easy to write:


def cached[K, V](f: K => V): K => V = {
  val cache = new scala.collection.mutable.HashMap[K, V]
  k => cache.getOrElseUpdate(k, f(k))
}

f() in this code is evaluated lazily, on a need-to-know basis.
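For instance (my toy example), wrapping a slow function:

val slowSquare = cached((n: Int) => { Thread.sleep(1000); n * n })
slowSquare(5) // takes a second
slowSquare(5) // immediate: the result comes from the cache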

Honestly, one would need a Y-combinator here, but I took a shortcut and wrote the solution like this:

val best: Int => (Double, Int) = cached((n: Int) =>
  if (n < 4) (linear(n), 0)
  else (for (k <- 2 to (n / 2)) yield
    (1                                 // the first egg is thrown in any case
       + linear(k) * k / n             // it breaks, with probability k/n: go linear below
       + best(n - k)._1 * (n - k) / n, // it survives, with probability (n-k)/n: recurse
     k                                 // the step at which we throw the first egg
    )).min)                            // minimize using the default pair ordering


Here best is a function: it takes the building height and returns two numbers, the average number of tries and the floor step to try with the first egg. It uses the cache. For fewer than 4 floors the answer is linear. Otherwise, we try first steps between 2 and half the height and see which one gives the best result.

To dump all intermediate steps, I had to write this:

def bestSteps(height: Int): (Double, List[Int]) = {
  val first = best(height)
  (first._1,
   if (height < 4) Nil
   else first._2 :: bestSteps(height - first._2)._2)
}


What we do here is collect the steps; bestSteps returns the price (the average number of throws) for a given building height, together with the list of steps between floors for the case when the first egg does not break. The result is this:

(10.43,List(14, 12, 11, 10, 9, 9, 7, 7, 6, 5, 4, 3))


See, we beat the "scientific" (but cheap) result, and the steps suggested are more in harmony with the nature of things, aren't they?

Q.E.D.

Thursday, May 05, 2011

Monad Genome, part 1

Intro



Recently I started looking into the origins of computer science monads, and found out that at least some of them originate in pretty simple things - like segments of the simplicial category Δ (for a better explanation/definition, see Mac Lane, p. 175). Here I'll give a couple of examples, the Option and Reader monads, showing how they derive from a very simple subcategory of Δ.

0, 1, 1+1



I'm going to introduce a very simple category, consisting of three, well, sets: the empty set, a singleton, and a union of two singletons (1+1), together with two arrows, 0 → 1 and 1+1 → 1:



There's not much to say about this category; you can think of it as living in the category of all sets, but actually any category with finite limits and colimits contains it.

Now I'll show how it generates two popular monads.

Option Monad



Given an object X, if we add X to each part of the diagram (a), we get this:



which is exactly the Option monad.
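In Scala terms (a sketch of mine, not from the post), adding X to each part of the two arrows gives exactly Option's unit and flatten:

// X -> X + 1: the unit, from the arrow 0 -> 1 with X added to each side
def unit[X](x: X): Option[X] = Some(x)

// (X + 1) + 1 -> X + 1: the flatten, from the arrow 1 + 1 -> 1 with X added
def flatten[X](oox: Option[Option[X]]): Option[X] = oox getOrElse None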

Exception Monad



First, we multiply the diagram (a) by E:



Then we can do the same trick as in Option Monad, add an X:



This is the Exception monad.
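Again in Scala (my sketch): multiplying by E turns the extra point into a whole set of exceptions, so the carrier is X + E, which is Scala's Either[E, X]:

// X + E: Right holds a value, Left holds an exception
def unit[E, X](x: X): Either[E, X] = Right(x)

// (X + E) + E -> X + E: the two exception summands collapse into one
def flatten[E, X](m: Either[E, Either[E, X]]): Either[E, X] = m match {
  case Left(e)  => Left(e)
  case Right(r) => r
}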

Reader Monad



Given an X, we can build the following diagram by applying the X^(−) functor to diagram (a):



Which is, simply speaking, just



Now this can be used to build the following diagram, for any A:



Do you see the Reader diagram? It is the functor A ↦ (X ⇒ A), with all the natural properties that follow from the naturality of the morphisms above.
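In Scala (my sketch, not in the original): the unit comes from the arrow X → 1, and the join from the diagonal X → X × X, by precomposition:

// Reader over a fixed environment X: the functor A => (X => A)
type Reader[X, A] = X => A

def unit[X, A](a: A): Reader[X, A] = _ => a // ignore the environment
def join[X, A](rr: Reader[X, Reader[X, A]]): Reader[X, A] =
  x => rr(x)(x) // use the same x twice: precomposing with the diagonal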

This post is work in progress; comments are very welcome.

Thursday, April 28, 2011

Church Numerals in Scala

As everyone knows (see http://en.wikipedia.org/wiki/Church_encoding), in pure lambda calculus one can define natural numbers out of nothing, like this:

0 ≡ λ f λ x . x
1 ≡ λ f λ x . f x
2 ≡ λ f λ x . f (f x)
3 ≡ λ f λ x . f (f (f x))
etc.

At our BACAT meeting yesterday, there was a question: in typed lambda calculus, how can one express natural numbers, and how can one interpret them in a Cartesian-closed category?

The answer to this question turned out to be trivial, and we also managed to express it in Scala:
class TypedLambda[T] {
  type TT = T => T
  type Church = TT => TT
  val cero: Church = f => x => x
  def succ(n: Church): Church = f => n(f) compose f
  val uno = succ(cero)
  val dos = succ(uno)
  val tres = succ(dos)
}


Looks neat, eh?
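To see the numerals actually compute (my quick check, not in the original), instantiate the class at Int:

val lam = new TypedLambda[Int]
import lam._

println(tres(_ + 1)(0)) // 3: applies (+1) three times to 0
println(dos(_ * 10)(1)) // 100: applies (*10) twice to 1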

The alternative approach can be found in Jim McBeath's post.

Sunday, January 02, 2011

Notes on my relations with CT

Zaki suggested that I write something on this topic...

So, what happened to me... In several stages.

1. I'm 19, a student at Leningrad University (that's USSR, now called Russia), and I attend a special course on homological algebra. Professor Yakovlev starts it with a short intro into category theory, a topic nobody had heard of. I was overwhelmed. The thing that generalizes groups and sets and whatever. Diagrams, wow. Commutative diagrams, diagram chasing, wow. The rest was not so interesting; abelian categories, exact diagrams, hom, tensor product, ext and tor... oh whatever. In a couple of years I decided not to go into algebra.

2. I am 24, working as a programmer, trying to pick up some "science". Somewhere (where?) I see the title of a paper, "Applying Category Theory to Computer Science". Wow. My eyes are open. Here it is: database structures, they are just pullbacks. Programs, they are diagrams, aren't they? Okay, nobody hears me; what's more, there's no way I can get the paper. So I just have to guess.

3. I am 27; my friends and I are camping somewhere in the forests of the Novgorod region. Vodka, talks... I mention category theory as a good tool for perceiving computer science. Andrey responds by inviting me to a category seminar that they have at the Technological Institute in Leningrad. Wow! I join them.

4. Several years of studying categories and toposes; Mac Lane's Categories for the Working Mathematician - I get the microfilm and print it, samizdat style, in the tiniest possible print, for the participants. It costs 10 roubles each (like 3 bottles of vodka, if you know what I mean).

5. Andrey's friend is married to a relative of Vonnegut. This friend is an otkaznik (a "refusenik"), and his American wife stays with him for several years, raising a kid, painting, just being patient. After one of her trips back to the US she brings Johnstone's Topos Theory... and on another trip she brings "Lolita". I devour both books. I made hand copies of Topos Theory. Twice.

6. Andrey suggests trying to calculate topologies (and, of course, logics) over some specific finite categories; he is specifically interested in Δ3, the initial segment of the simplicial category. I write the code first in Basic, then figure out that the calculation for Δ3 would take 2 weeks. I rewrite it in Fortran; now it's just a couple of days. So I rewrite the core parts (like enumerating subsets, calculating cones and limits) in Assembler. We have the list of 31 Grothendieck topologies over Δ3 in just four hours. Meanwhile we calculate topologies for simple categories, like linear segments and commutative squares... and a strange thing is discovered: on many occasions the number of topologies is 2^n, where n is the number of objects. That's for posets.

7. I prove that for finite posets this is always the case, and a little bit more. Then I spend months trying to find a typewriter with English letters (the KGB is watching); I used one in the Hungarian trade mission, and one in the Institute of Hydrology (where global warming was then starting). Then I spend some time figuring out how to send the paper abroad; a KGB colonel suggests a solution, and I use it. It's published in Cah. Topo. Géom. Diff. The hardest part was deciphering the handwritten French notes on the proofs.

8. I meet Yakovlev; for him, it is all a) "abstract nonsense", and b) it can't be that the result is not already known.

9. I generalize this stuff to Karoubian categories over Boolean toposes (Booleanness is necessary) and send it to Johnstone. Johnstone, it turned out, was working on a similar problem, from the other end. So I spend the next 15 years trying to get the proof in the opposite direction - that if... then the category is Karoubian. That's how I did not do anything else. At least I tried.

10. The seminar meanwhile had died out - the guy who ran it, Stepanov (can I call him a professor? something like that... he was not), had married and moved to Novorossiysk, a pretty criminal place; there he died in strange circumstances.

11. Now, here in California, the word "monad" rings a bell. So I honestly try to put it in the proper perspective, writing some kind of "tutorials" and "talks", writing again the code that calculates stuff, opening presheaf.com, and I am ready to talk to anybody willing to listen.

12. And now we have BACAT, a category seminar where we can discuss things. And this is great.

Saturday, October 16, 2010

Sunday, September 19, 2010

Running with SBT

So, you are tired of your slow and half-dumb IDEs; okay, what can you do to make your Scala dev cycle more agile? Use sbt.

First go to the sbt website and download the jar. Store it somewhere. Write a shell script or a bat file with this single command:

java -Xmx512M -jar wherever you store the jar/sbt-launch-0.7.4.jar


Go to your project directory and run sbt there. It will ask you some questions regarding which version of Scala you want, the name of your project, and the like. Then it will create an sbt project in that directory. By default the project expects the directory structure to follow the maven rules of the game. You don't have to comply. Go to the directory named project, create a directory named build, and within that directory create a Scala file Project.scala that looks something like this:

import sbt._

class CategoriesProject(info: ProjectInfo) extends DefaultProject(info) {
  // lazy val hi = task { println("Categories in Scala"); None } // that's for fun
  override def mainScalaSourcePath = "src"            // your source directory
  override def mainResourcesPath   = "resources"      // fiik what this is for

  override def testScalaSourcePath = "tests"          // your tests directory
  override def testResourcesPath   = "test-resources" // fiik what this is for
  override def outputDirectoryName = "bin"            // your output directory
}


Additionally, check out the lib directory in your project; it probably already contains some stuff that sbt believes it needs - that is, scala-compiler.jar and scala-library.jar. Copy over there the other jars you believe you need. There should be a way to reference them where they are, but I have not figured it out yet.

Now start sbt again.

If you give it the command ~test, it will cycle through compiling the code and running the tests, automatically detecting what to compile and guessing which tests to run. As soon as you save a source file, sbt wakes up and does the job.

So... I love it... as do many other scala people.

Sunday, September 12, 2010

Dealing with Infinities in Your Code - 3

Here I introduced countable sets, and here I demonstrated how we can build a union of two countable sets. Now it is time for Cartesian products.

Suppose we have two sets, {'A', 'B'} and {1, 2, 3}. Their product will look like this: {('A',1), ('A',2), ('A',3), ('B',1), ('B',2), ('B',3)}. It is pretty easy to build: iterate over the first set, and for each element iterate over the second set. In Scala:

for (a <- setA;
b <- setB) yield (a, b)


The problem is that if the second component (setB) is infinite, we will never be able to iterate over all elements. Let setB be, for instance, the set of all natural numbers, N. In this case the product setA x setB will keep yielding ('A',0), ('A',1), ('A',2), ..., and will never switch to pairs like ('B',i). We have to do something about it.

Say, in the case when only one component is infinite, we could think about changing the order of the loop, iterating first over the infinite component.

That won't actually work. We cannot always know for sure whether a set is finite - the Halting Problem, you know. So we have to figure out how to do it in a general way, without losing common-sense efficiency for the finite case.

The solution has been known for over a hundred years, and is called the Cantor Pairing Function.
It enumerates pairs of natural numbers in the following order: (0,0), (1,0), (0,1), (2,0), (1,1), (0,2), (3,0)... - you got it.

In our case we have two essential differences: first, the sets we have are not necessarily finite, and second, elements are not retrievable by their "sequential numbers"... meaning, it is possible, but horribly inefficient: to reach element #1000, we need to rescan all the elements from #1 (or rather #0, since natural numbers and array indexes start with 0). So we also do not need a fancy formula for the Cantor Pairing Function (see the link above), the formula that calculates product elements by their sequential numbers. All we need is a way to iterate over the elements in the right order.

Okay, let's first write the factory method for Cartesian products:

def product[X, Y](xs: Set[X], ys: Set[Y]): Set[(X, Y)] = {
  val predicate = (p: (X, Y)) => (xs contains p._1) && (ys contains p._2)
  setOf(
    kantorIterator(xs, ys),
    xs.size * ys.size,
    predicate
  )
}


The predicate definition is extracted just for readability purposes. What's important here is kantorIterator, which is supposed to work with any combination of finite and infinite countable sets.
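For example (my sketch, assuming setOf and the set N of natural numbers from the first post of this series):

val letters = Set('A', 'B')
val pairs = product(letters, N) // a countable set of (Char, Int) pairs
// its iterator reaches any given pair eventually, e.g. ('B', 42)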

Since we do not have direct access to the elements of the sets by their indexes, we will instead keep the rows in iterators, and note that we can build a queue of such iterators. Let's call them rows; row k pairs successive elements of the first set with y_k, the k-th element of the second. What the algorithm does is this:
- push row0 to the iterator queue
- yield (row0.next, y0)
- push row1 to the iterator queue
- yield (row0.next, y0) (first iterator from iterator queue)
- yield (row1.next, y1) (second (last) iterator from iterator queue)
- push row2 to the iterator queue
- yield (row0.next, y0) (that is, (x2,y0))
- yield (row1.next, y1) (that is, (x1,y1))
- yield (row2.next, y2) (that is, (x0,y2))

and so on: every time we reach the end of the queue, we push the next row iterator into the queue, reset the vertical iterator, and start all over.

The problem arises when one of the iterators is out of elements.

We have two possible cases.

Case 1. The vertical iterator halts. In this case we stop pushing new iterators into the queue, so that the size of the queue is the size of the vertical component.
Case 2. The horizontal iterator halts. The vertical one may be infinite or not; we have to shift it. E.g. if we build {'A', 'B', 'C'} x N, we will be producing ('A',0), ('B',0), ('A',1), ('C',0), ('B',1), ('A',2), ('C',1), ('B',2), ('A',3).... Note that after ('C',1) row0 is out of elements, and this row should be dropped from the queue. At the same time the starting element of the vertical iterator should be shifted, so that the next loop will be not 0,1,2, but 1,2,3. In this way we walk all along the vertical iterator.

Here's the code:
import scala.collection.mutable.Queue

def kantorIterator[X, Y](xs: Iterable[X], ys: Iterable[Y]): Iterator[(X, Y)] =
  new Iterator[(X, Y)] {
    var iterators: Queue[Iterator[Y]] = Queue()
    var xi = xs.iterator
    var yi: Iterator[Iterator[Y]] = Iterator.empty
    var shift = 0

    def next = {
      if (!yi.hasNext) {              // the current diagonal is over;
        if (xi.hasNext) {             // start the next one
          iterators enqueue ys.iterator
        }
        yi = iterators.iterator
        xi = xs.iterator.drop(shift)
      }

      val yii = yi.next
      val y = yii.next
      val res = (xi.next, y)

      if (!iterators.isEmpty && yii.isEmpty) { // this row is exhausted;
        iterators.dequeue                      // drop it and shift the start
        shift += 1
      }

      res
    }

    def hasNext = !xs.isEmpty && !ys.isEmpty &&
                  (xi.hasNext || (!iterators.isEmpty && iterators(0).hasNext))
  }


hasNext covers all the cases of empty components.

I should probably mention that you won't always be able to iterate over all the subsets of a countable set. If the set is finite, it's okay, its powerset is finite; but if it is not, here you can find Cantor's proof that the powerset of an infinite countable set is not countable.

Dealing with Infinities in Your Code - 2

In this post I introduced the idea of having countable sets in your code. Next I want to show how we can combine countable sets, producing new countable sets.

We have two operations, union and Cartesian product. Let's start with disjoint union - the union of non-intersecting sets.

If we expect our sets to be both finite, there's nothing special here. The predicate checks whether an element belongs to one set or another; the iterator iterates over the first set, then the second.

What if one or both of our sets are infinite? The predicate seems to be still okay: we just check whether the element belongs to one of the sets. But the iterator... if the first set is infinite, our naive iterator will never reach the second set. I mean, not in this universe; we would need transfinite numbers readily available, which is well beyond the power of modern computers or most modern humans. What can we do then? Well, we can alternate between the two iterables.

Let's look at the code:
def union[X, X1 <: X, X2 <: X](set1: Set[X1], set2: Set[X2]): Set[X] = {
  lazy val parIterable = new ParallelIterable(set1, set2)
  lazy val size = if (set1.size == Integer.MAX_VALUE ||
                      set2.size == Integer.MAX_VALUE) {
    Integer.MAX_VALUE
  } else {
    set1.size + set2.size
  }
  setOf(parIterable,
        size,
        (x: X) => (x.isInstanceOf[X1] && (set1 contains x.asInstanceOf[X1])) ||
                  (x.isInstanceOf[X2] && (set2 contains x.asInstanceOf[X2]))
  )
}


What do we have here? A new iterable, a new size evaluator, a new predicate. We can ignore the size evaluator; nothing is calculated here until someone requests it. The predicate is obvious; the only new thing is ParallelIterable. We need it to iterate over the union of two countable sets (rather, over two iterables) in a smart way:

class ParallelIterator[X, X1 <: X, X2 <: X](
    iterator1: Iterator[X1],
    iterator2: Iterator[X2]) extends Iterator[X] {
  var i2: (Iterator[X], Iterator[X]) = (iterator1, iterator2)
  def hasNext = iterator1.hasNext || iterator2.hasNext
  def next = {
    i2 = (i2._2, i2._1) // flip the pair: the two sources take turns
    if (i2._1.hasNext) i2._1.next else i2._2.next
  }
}


Note that we flip i2, so it changes from (iterator1, iterator2) to (iterator2, iterator1) and back every cycle.

As a result, in the output we just interleave members of the first component with members of the second. What is important here is that we make no assumptions about which iterator is finite and which is infinite; if one component has no more elements, fine, we continue scanning the other one.
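A quick check of the interleaving (my example): one finite side, one infinite side:

val finite   = Iterator("a", "b")
val infinite = Iterator.from(0)
val both = new ParallelIterator[Any, String, Int](finite, infinite)
println(both.take(6).toList) // List(0, a, 1, b, 2, 3)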

This was probably pretty simple; now we have to deal with the Cartesian product, that is, the set of all pairs (x: A, y: B) where x is from the first set and y is from the second, with no assumptions regarding the finiteness of the said sets.

That would be the next part.

Thursday, September 09, 2010

Dealing with Infinities in Your Code - 1

These days it is not unusual to have an infinite sequence in a computer system. The idea is that while the sequence grows, the system upgrades, and there's no limit unless we make certain assumptions, like accepting non-standard arithmetic. Then even sequences that we expect to be finite may turn out to be infinite - like Goodstein's sequence.

Still, it seems like in the foreseeable future we will only encounter countable sets. They are not necessarily infinite; any finite set is traditionally considered countable, that is, equal in size to a subset of N, the set of natural numbers. The set of all real numbers, R, is a typical example of an uncountable set (it is a continuum). (Whether there is a set of intermediate size, between N and R, depends on which axiom we assume. If we assume it exists, it exists; if we assume it does not exist, it does not. Quite an eye-opener for people who are used to treating Mathematics as a provider of absolute truths, right?)

What's good about a countable set is that (assuming set-theoretical axioms) one can list its elements in such a way that any arbitrary element will eventually be reached. (That's due to another axiom, the Axiom of Choice - it says that we can introduce a linear order in any set. If we don't assume this axiom, then we cannot.)

When I worked at Google, one of my favorite interview questions was "given an alphabet, write code that would produce all words that can be built based on this alphabet". Then I had to explain that when words are built out of an alphabet, it does not mean each letter is used at most once. It does not. Then I had to explain that by "producing all words" I mean writing code that could reach any word that can be built out of the alphabet. For instance, given {A,B}, the sequence "A", "AA", "AAA", etc. would be infinite, but it would not constitute a solution: it would never reach the letter 'B'.

In my work I limit myself with sets as they are presented in Java or Scala.

One particular problem is that, usually, an implementation of Set is supposed to have a method that returns the set's size. I am not sure why. Probably it has something to do with array reallocations... some technical stuff. Because, see, if you know that your set is infinite, you are safe: you can always return, by the rules of Java, Integer.MAX_VALUE; but what if you do not know? Say you have a function that takes another function, calls it repeatedly until it halts (e.g. fails), and records the time each call takes. What would be the size of that set? We would have to solve the Halting Problem to know the answer.

On the other hand, isEmpty is a pretty decent method; it could easily belong to Iterable: to check whether an Iterable is empty, one just gets its Iterator and calls its hasNext.

Also, it would be good to supply a set with a predicate, similar to what they do in Zermelo-Fraenkel Set Theory: according to the Extensionality Axiom, two sets are equal iff they consist of the same elements. Not having to count to infinity (or further) to check whether an element belongs would be a good advantage. E.g. for the set of all odd numbers, the predicate would not have to scan through all odd numbers from 1 to the given one; it would just check the last bit.

So that's why I came up with a set that is defined by its

  • iterable that enumerates the elements;

  • predicate that "efficiently" checks if a value belongs to the set;

  • size evaluator that, when called, tries to evaluate the size of the set.

I use Scala for coding these days; I hope it is reasonably pretty:

private def setForIterator[X](sourceIterator: => Iterator[X],
                              sizeEvaluator: => Int,
                              predicate: X => Boolean): Set[X] = {
  new Set[X] {
    override def contains(x: X) = predicate(x)
    override def size = sizeEvaluator
    override def iterator = sourceIterator filter predicate
    override def isEmpty = !iterator.hasNext
    override def -(x: X) = requireImmutability
    override def +(x: X) = requireImmutability
    def map[Y](f: Functions.Injection[X, Y]): Set[Y] = f.applyTo(this)
    def map[Y](f: Functions.Bijection[X, Y]): Set[Y] = f.applyTo(this)
    override def filter(p: X => Boolean) =
      filteredSet(sourceIterator, (x: X) => predicate(x) && p(x))
  }
}

def setOf[X](source: Iterable[X], sizeEvaluator: => Int, predicate: X => Boolean) =
  setForIterator(source.iterator, sizeEvaluator, predicate)


I had to override a bunch of methods declared in trait Set, to avoid any immediate calls to the size evaluator, or any assumption that the size is known.

Here's how, using this class, we can define the set of natural numbers:
val N = setOf(
  new Iterable[Int] {
    def iterator = new Iterator[Int] {
      private var i: Int = -1
      def hasNext = true
      def next = { i += 1; i }
    }
  },
  Integer.MAX_VALUE,
  (x: Any) => x.isInstanceOf[Int] && x.asInstanceOf[Int] >= 0)
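A quick sanity check (my example): membership is answered by the predicate, without any iteration, while the iterator enumerates from 0 up:

println(N contains 42)             // true, via the predicate
println(N.iterator.take(5).toList) // List(0, 1, 2, 3, 4)
println(N.isEmpty)                 // false: asks the iterator for just one step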


That's it for now. In the next part I'll talk about how to combine two countable sets. It is an old fact that the union and the product of two countable sets are countable; we need code that behaves according to this knowledge.

Tuesday, July 20, 2010

iterable, tree, monad, foundation axiom

Everybody knows now that linear data structures are bad for performance. But in Java everything is based on Iterable, with the idea, deeply rooted in math, that we can only visit the contents of X if we iterate over it (enumerable sets etc), that is, have an epimorphism from N onto that X.

Well, that's stupid. All we need is foundation axiom to hold.

In plain words, a tree is just as good, and a tree can be scanned in parallel.

But programmers of the world were trained to write a loop every time they see a plurality of something.

But the loop does not actually mean it should be interpreted sequentially.

If you write

for (x <- container) {
}

it only means that the job is done inside a monad.
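In other words (my gloss), the compiler rewrites the loop into a method call, and the container itself decides how to schedule the body:

container foreach { x =>
  // the loop body; nothing in this syntax promises sequential execution
}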

That's why you should think monad, eh. That's why enumerables are so 20th century. The foundation axiom is way more powerful.

I actually can even prove it, but only for a Boolean base topos.

(one of the sources)

Sunday, April 18, 2010

How I Installed a TV Antenna

I live in South San Jose; recently we discovered that, now that we have fast Internet, a Wii and Netflix, we already have access to hundreds of movies, so what do we need a TV for? Local news. Okay, but local news is broadcast for free, in HDTV; just install an antenna on the roof and go.


I started by looking up our area on antennaweb: entered my address, and it gave me directions and this map.

The site also says that frequency ranges have colors, and that I have to choose an antenna rated for the right colors.







Since it is over 60 miles from my home to San Francisco, the only reasonable choice was the ChannelMaster 4228D - they sell it at Fry's.


It turned out that to install it I needed more than just this box: I needed a pole and the hardware to attach the pole to the roof. Below I explain what I did.

I bought a 5-foot piece of "black pipe" at Lowe's (here on the picture the end of the pipe), a cap for the pipe and the piece which I want to attach to the roof.




You need a cap on top so that rain won't penetrate the pipe.
Before attaching the bottom piece to the roof, I had patched the area below with roof coating, to avoid leaks:


Now we need these things from OSH (the electrical department, and the bolts and nuts department) to attach to the pole the guy wires that will hold it.




This is how the top of the pole looks with all the stuff attached:



I have attached two eye screws to the sides of my roof, and two went straight into the roof:


After I drilled the hole for the screw, I patched it so the rain won't get in:



Now I have to attach the guy wire to the pole and to the eye screws; for this I use clamps:




The guy wire does not go all the way through from the top of the pole to the screws. I cut the wire and attached turnbuckles that allow me to tighten the wires later:


This is how it looks with antenna attached to the pole using brackets, and the guy wires tightened with turnbuckles.



Now it's time to get the cable into the house. I bought a 25-foot white coaxial cable, connected it to the antenna's amplifier, and ran it along the roof and the wall.




I measured the position of an existing TV socket inside the room, and used a long drill bit to get through the wall as close to the socket as possible.




I used this bushing for the cable to get through. Bought it at Lowe's.



Had to cut the bushing, or else how would I get the cable through?




When I got the cable through the wall, I applied a good amount of clear caulk to make sure no water gets through.




I was lucky, the cable got exactly where I wanted it.






So all I had to do was connect my TV, scan the 56 channels it found, and enjoy the show.

And you know what? It sucks. I don't have a DVR on those channels, so there's no way I can pause a show, or get any information about what I'm watching... no recording. And the channels... what nonsense people watch, omg.

So, was it all worth it? Probably. I had fun with my antenna.

Here's a photo I took from a digital channel:
