Life, Programming, Everything: June 2011

Everybody seems to know this "Google Interview Problem": (quoting )

"There is a 100-story building and you are given two eggs. The eggs (and the building) have an interesting property that if you throw the egg from a floor number less than X, it will not break. And it will always break if the floor number is equal or greater than X. Assuming that you can reuse the eggs which didn't break, you need to find X in a minimal number of throws. Give an algorithm to find X in minimal number of throws."

A naive programmer would first apply binary search for the first egg, then linear search for the second egg; a less naive one would try to apply Fibbonacci sequence.

Now what would happen if we just start with floor 50? We will waste the first egg, most probably, then walk up with the second one, trying each floor. What would be the number of throws in this case? We Don't Know. One would say it would be 51, but there's no reason to believe it. The only case you'll need to throw 51 times is if the first egg does not break, but we will start throwing the second egg from floor 51, 52, etc, all the way to floor 100. Would anybody do it? No.
We would divide it into halves again, right?

The other reason is that, hmm, maybe we find the right floor earlier? But what's the probability?

Right. To calculate the number of drops, we need to know the probability distribution. It is not specified in the problem. More, the problem does not even specify what exactly we are optimizing. Suppose we have a million of such buildings (and 2 million eggs, 2 per each building); what would be the good strategy? Of course we would like to minimize the average search time, not the maximum possible. If we want to apply this kind of solution to real-life problems, we need to minimize the average.

But we don't know the probability distribution! One possible (real-life) distribution is that the egg breaks at the height of an inch, so the first floor is already out of luck. But then we can modify the problem, using Faberge eggs and a 100-floor doll house.

Well, but an interviewee would not go there right away. Another good choice is to take a square root of 100, and try the first egg with the step of 10 floors, and the second egg with the step of 1 floor. A popular (and wrong) answer would give 18 to 20 as the maximum number of attempts, and 10 as an average number of attempts. Actually, the average number of tries would be 11.

The popular "right" answer that you can find on the web gives this sequence of floors to try:
14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100

I will not repeat the proof here, since the proof assumes we know the problem we are solving, and I state that we don't. I mean, if we are talking about the maximum number of attempts, then 14 might be a good choice, but the problem never says this is about minimizing the maximum number of attempts, right? As somebody noticed, the minimal number of throws is either 0 (for real-life eggs and floors) or 1, for the lucky ones that are not allowed in Vegas casinos.

What I suggest to do is try to minimize the average number of throws, in the assumption of uniform distribution.

So, strictly speaking, we have values from 1 to 101, and the probability of the egg to break at floor n but not at n-1 is 1/101. Why 101? We are still not sure about floor 100, but if we had values 1..100, then it would follow that we'll have a breakage at floor 100, just because the sum of all probabilities should be 1.

Now, with this distribution, we can calculate an average number of attempts for the run of n floors (in the assumption that the egg breaks at floor n): it is (n+1)/2 - 1/n. You can calculate the formula yourself. In Scala, the calculation looks like this:


def linear(n: Int) = (n + 1.) / 2

For the first suggestion, that is, trying floor 51 and then going linear, we would have 27.(2277) attempts on average (you could also try first floor 50 and see that this is a little bit worse).

How did I calculate this? See, this strategy involves:
- throwing 1 egg;
- with the probability p1=51/101 going linear from floor 1 to floor 50;
- with the probability p2=50/101 going linear from floor 52 to floor 100.

The average number of tries is 1 + linear(51) * p1 + linear(50) * p2.

We can extend this formula for a sequence of attempts, using the following Scala code:


def forSequence(s: List[Int]) = ((0., 0) /: s.reverse) ((a,b) => { val c = a._2 + b; (a._1 * a._2 / c + 1 + linear(b) * b / c, c)})

That is, taken a list of steps, we calculate the average using recursive formula, combining the previous average with the formula for linear seach.

If we try it for the "square root search", forSequence(List(10,10,10,10,10,10,10,10,10,11)) gives 11.0; for the scientifically recommended sequence 14,13,..., forSequence(List(14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 1)) gives 10.47 as an average number of attempts. Not bad; and makes sense.

But can we do better? To figure out what's the optimal sequence, we need to apply dynamic programming... unless we know the right formula; I don't. By the way, did you notice this last jump from 4 to 1 in the "scientific" sequence? Looks suspicious; it might be better to spread this jump throughout the sequence.

My solution is this: we have the right answers for the very short sequences, like 1, 2, or 3 floors; now we will need, for each subsequent number, to find the right first step, how many floors to skip; the next step is already optimized. Since this involves a lot of backtracking, I had to introduce caching of the intermediate results, splits and averages. This is Scala, and caching function is very easy to write:


def cached[K,V](f: K => V) : K => V = {
       val cache = new scala.collection.mutable.HashMap[K,V]
       k => cache.getOrElseUpdate(k, f(k))
     }

the f() in the code is evaluated lazily - on the need-to-know basis.

Honestly, one would need to use Y-combinator, but I made a shortcut, and wrote the solution like this:


val best: Int => (Double, Int) = cached((n: Int) =>
                          if (n < 4) (linear(n), 0) else
                                          (for (k <- 2 to (n/2)) yield
                                              (1 // for the first ball
                                               + linear(k) * k / n 
                                               // average number of throws if 
                                               // the first ball breaks, with the probability of this, k/n
                                               + best(n-k)._1 * (n-k) / n, 
                                               // avg number of throws if 
                                               // the first ball does not break, with the probability
                                               k // this is the step at which we throw the first ball
                                            )).min) // minimize using default pair comparator

Here best is a function; it takes building height and returns two numbers: the average number of tries and the floor to try with the first ball. It uses cache. For the case of less than 4 floors the answer is linear. Otherwise, we try steps between 2 and half the height and see which one gives the best result.

To dump all intermediate steps, I had to write this:


def bestSteps(height: Int): (Double, List[Int]) = {
  val first = best(height)
  (first._1, if (height < 4) Nil
             else first._2 :: bestSteps(height - first._2)._2)
}

What we do here is collecting the steps; it returns the price (average number of throws) for a given building height, as well as the list of steps between floors in case the first egg does not break. The result is this:


(10.43,List(14, 12, 11, 10, 9, 9, 7, 7, 6, 5, 4, 3))

See, we beat the "scientific" (but cheap) result, and the steps suggested are more in harmony with the nature of things, are not they?

Q.E.D.

Life, Programming, Everything

Sunday, June 19, 2011

2 eggs and 100 floors

Followers

Subscribe To My Podcast

whos.amung.us

Blog Archive

Links

About Me