Stumbled upon this interview with this year's winner of the TopCoder contest. Fascinating. I admire people who can think under pressure. I never could do that. I looked at the problems and I decided to give the hardest one a try. I gave myself unlimited time and resources (I was allowed to hit the books).
Here's the problem.
Read on to find out about my vagaries and the eventual discovery of the solution.
In "Prime Obsession" John Derbyshire writes (page 52-53):
"The other factor, much more common among mathematicians in all ages, was perfectionism.... Restaurants illustrate the point. Dishes prepared in the clatter, breakage, and yelling of an overheated kitchen appear in the public area as flawless arrangements on spotless plates, delivered by dapper murmuring waiters....Published mathematical papers often have irritating assertions of the type: 'It now follows that...,' or: 'It is obvious that...,' when it doesn't follow, and it isn't obvious at all, unless you put in the six hours the author did to supply the missing steps and checking them. There is a story about the English mathematician G.H. Hardy, whom we shall meet later. In the middle of delivering a lecture, Hardy arrived at a point in his argument where he said, 'It is now obvious that....' Here he stopped, fell silent, and stood motionless with furrowed brow for a few seconds. Then he walked out of the lecture hall. Twenty minutes later he returned, smiling, and began, 'Yes, it is obvious that...,' "
I'm going to lead you down the paths I took, not too far but far enough to see a glimpse of the overheated kitchen with the clatter and breakage. The "flawless dish" (it's actually more a home-cooked meal) will come in a few days in the form of a LaTex paper. It will be a good exercise in LaTex for me, I haven't done it in a long while. I will give a Java implementation and maybe a Haskell implementation. I'm very much still a Haskell beginner so I don't know if I can do it.
Let's start.
I always try small things first and I also try to find a formalism that lets me explore properties of the entities involved with some form of calculus.
Well, here we have BST's as entities (binary search trees). Taking a small case that's still interesting we consider trees with three nodes. I will use an ascii notation for the trees because drawing pictures in this blog entry form is hard.
Notation:
(a LS RS) is a binary search tree with root a, left subtree LS and right subtree RS. Nodes are numbers and because of the BST property nodes in LS are all smaller than a and nodes in RS are all bigger than a. If a node doesn't have a child we put nil in there.
Example:
(3 (2 1 nil) 4) is a BST rooted at 3, with the left subtree rooted at 2 and right subtree the node 4.
So for the case of three nodes y < x < z because of the BST property we can only have these possible trees:
(x y z), (y nil (x nil z)), (y nil (z x nil)), (z (x y nil) nil), (z (y nil x) nil)
Now, which one of these is best for a given assignment of node access probabilities py, px and pz.
We calculate the cost functions and compare.
cost( (x y z) ) = px + 2 py + 2 pz
cost( (y nil (x nil z)) ) = py + 2 px + 3 pz
cost( (x y z) ) < cost( (y nil (x nil z)) ) means we prefer (x y z) over (y nil (x nil z)).
So
px + 2 py + 2 pz < py + 2 px + 3 pz
or
py < px + pz.
Two observations (not proven yet, but plausible to me at that moment):
- if all probabilities are equal then a balanced BST is the best solution
- we could start with the balanced BST and then do node modifications, bottom-up to reflect the non-uniformly distributed probabilities.
If py > px + pz then we transform our BST from (x y z) to (y nil (x nil z)) by pushing node y up and node x down and to the right.
We can always push nodes up or down and preserve the BST property, even for nodes that have non-trivial subtrees.
Example:
Starting with tree (a b (c LS RS)) we find out that having c as root would be better so we push c up and a down and to the left. c cannot have more than two children though so we hang c's old left subtree LS as the right subtree of a. The BST properties stay fullfilled and we get tree (c (a b LS) RS).
I went down that road. I managed to find tree transformations that improve the situation locally. Didn't get me anywhere. It wasn't clear if visiting each node once was enough. Small examples showed that it wasn't. It was hard to prove anything. Not all was lost though. I needed to express the cost of a node as an expression of the cost of its two subnodes and by looking at the small examples I found an expression which I could also prove by induction.
(*) cost(a LS RS) = sum(probablities of all nodes in tree) + cost(LS) + cost(RS)
Now assume we know (a LS RS) has minimal cost. Does LS have minimal cost too in the set of all possible BST of nodes smaller than a ? Well, if it doesn't that means there is another subtree LS' with smaller cost. We can just plug that one into the tree rooted at a and get a new tree (a LS' RS). According to (*) then (a LS' RS) has smaller cost than (a LS RS). This contradicts our assumption that (a LS RS) has minimal cost, so LS has to be minimal too. Analog RS has to be minimal.
This means the minimal solution to our problem is formed out of minimal solutions to the subproblems.
This is a big step forward and it is time for me to stop for now.
As they say in "Matrix Reloaded": to be concluded.
Posted by Uwe Hoffmann at May 25, 2003 08:03 PM