Minimum spanning tree problems often become NP-hard when one introduces side constraints. One popular such constraint is to limit the depth of the tree - the maximum number of "hops" when traversing a path from the root to a leaf in the tree. That's the k-hops minimum spanning tree problem, supposedly motivated by network applications, because delays along a path are roughly proportional to the number of hops.

In this problem, unlike the minimum spanning tree problem, the input edge weights do not necessarily define a metric. Indeed, in the minimum spanning tree problem, we usually start by saying: "Without loss of generality we can assume that the triangular inequality holds; otherwise, replace costly edges by shortest paths." But in the k-hop minimum spanning tree problem, such replacements may be invalid because they increase the number of hops.

When the input weights do not define a metric, known algorithms are quite poor. If there exists a tree with k hops and weight w, then one can produce a tree with O(k log n) hops and weight O(w log n), but doing better is impossible.

However, the situation is much brighter with random inputs drawn from a nice distribution. In a 2008 arxiv paper by Angel, Flaxman and Wilson, one assumes that the input is a complete graph and that the edges are drawn independently from the same distribution, for example the uniform U[0,1] distribution, or an exponential, or some other reasonable distribution (i.e. with similar density function near zero). Then there is a simple algorithm to produce a near-optimal tree.

Here is the algorithm.

First, assume that the bound k on the number of hops is less than log log n. Then the algorithm guesses x1, x2,..., xk, where xi is the number of nodes that are i hops away from the root. Then it performs a greedy Prim-like algorithm, going level by level, where, to build level i, you pick the xi nodes that can most cheaply be connected to the nodes in the part of the tree already built. (The "guess" relies on probabilistic analysis to compute the optimal sequence x(i)).

Second, assume that the bound k on the number of hops is more than log log n. Then the algorithm builds an (unconstrained) minimum spanning tree, then greedily breaks it up into small-depth clusters, then reconnects the clusters using a straightforward extension of the first algorithm, (where the goal is now not to connect the entire graph but to create a small depth subtree intersecting all the clusters).

Although the independent weight assumption is quite restrictive, the algorithm itself is natural. In fact, the unexpected bonus is that when k>log log n, not only is the output tree near-optimal, but its cost is almost the same as the cost of the unconstrained MST: in other words, one get tree of depth O(log log n) at no extra cost. So one is left to wonder: how good is this algorithm in practice?

"Practice" is not well-defined, is it? Depending on the distribution of inputs one may be able to come up with other heuristics that are equally compelling. Hard to understand some of these things without data to play with. And who has time for that :)?

ReplyDeleteAn experimental analysis on Prim and Kruskal heuristics for bounded-hop spanning tree was in my master thesis (in italian) and in a related paper.

ReplyDeletehttp://dx.doi.org/10.1007/11814764_7

We mostly focused on point at random in the Euclidean space. I do not work anymore in that field and it was some years ago, but you may be interested.