## Monday, November 7, 2011

### Hierarchically well-separated trees, bottom-up

Recently I taught the FRT algorithm to approximate any metric by a hierarchically well separated tree metric (HST) with edge lengths that are powers of 2. The construction is top-down. Initially all vertices are together in one big cluster, and later the clusters are progressively subdivided using smaller and smaller balls. A question came up: is it possible to construct the tree metric bottom-up?

Here is an easy way to simulate FRT going roughly bottom-up, described in the papers "Network Warehouses: Efficient Information Distribution to Mobile Users" (Motskin, Downes, Kusy, Gnawali, Guibas) and "Distributed Resource Management and Matching in Sensor Networks" (Gao, Guibas, Milosavljevic, Zhou).

Pick a random priority ordering over the vertices, and a random scalar r in [1,2]. For each vertex v in parallel and for each distance d which is a power of 2 times r, find the highest priority vertex u such that B(u,d) contains v. As the power of 2 increases, this defines a sequence of balls containing v, of larger and larger radius, and with centers that have higher and higher priority.

To compute the distance between v and w in the tree metric, find the maximum d such that, in their respective sequences, v and w are in the same ball of radius d but in different balls of radius d/2. That determines the place in the tree where v and w have their lowest common ancestor x, corresponding to the ball of radius d that contains both v and w. Then the x-to-v and the x-to-w paths in the tree both have edges of lengths d/2, d/4, d/8, ..., so the distance from v to w in the tree metric is 2d.

Here is a question: is it possible to construct a hierarchical clustering that has the properties of HSTs, but that is truly constructed in a bottom-up manner? More precisely, is it possible to commit to a partition of the points into a hierarchical clustering in clusters of radius < d, (that is, commit to the bottom levels of the HST), without looking at the distances that are greater than, say, 16d?

Here is a possible answer: put v and w in the same cluster of radius d iff their sequences of balls have the same balls for radius d, 2d, 4d, 8d and 16d. From that point on, in later, coarser clustering, v and w are committed to staying together. What happens if the FRT algorithm wanted to separate them? Perhaps there is a high priority vertex x such that B(x,32d) contains v but not w. Well - tough. If v is the highest priority vertex in the cluster, then w yields to v and puts B(x,32d) in its sequence, even though technically it wasn't supposed to do it.

How does that affect the HST properties? It still creates a hierarchically well-separated tree, and the distance in the tree metric is still not much less than the distance in the input metric. But is it still upper bounded by O(log n) times the distance in the input metric, in expectation?