**Step one:**model the problem locally. Consider a Steiner tree. Root it at some vertex r and label each vertex i of the tree with its distance d(i) to r on the tree and its parent p(i) in the tree; label each vertex i not in the tree with some convention, for example, d(i) arbitrary and p(i)=0. A configuration consists of a value (d(i),p(i)) for each vertex of the graph, such that d(r)=0 and such that neighboring vertices have consistent values (in particular d(p(i))=d(i)-1 for i in the tree and not equal to r). We now have a banal constraint programming problem.

**Step two:**borrow intuition from statistical physics. Take a parameter b and define a distribution on Steiner trees, such that each tree T has probability proportional to exp(-b cost(T)). When b goes to infinity, the distribution is concentrated on the optimal Steiner tree. The crucial step: we now write the mysterious-looking, mysterious-sounding "cavity equations". (Those have nothing to do with the Brown CAVE project.)

Consider two adjacent vertices j and i. Let P(d(j),p(j), i) be defined as the probability that j is in the state (d(j),p(j)) in a certain graph: in the case where p(j) is different from i, it is the graph where every edge adjacent to i is removed; in the case where p(j)=i, it is the graph where every edge adjacent to i is removed, except {i,j}. Then the tree has to pay the cost of the edge from j to p(j), and, given the state of j, the states of the other neighbors k of j are essentially independent from one another, or at least, we hallucinate that that is the case, as if vertex {j} was a cut:

P(d(j),p(j),i)= exp(-b cost({j,p(j)}))*prod[ Q(k,d(j),p(j))],

where the product is over every neighbor k of j other than i, and Q is the probability that, in a certain graph, k is in a state consistent with (d(j),p(j)):

Q(k,d(j),p(j))=sum P(d(k),p(k),i),

where the sum is over (d(k),p(k)) compatible with (d(j),p(j)).

When the graph is a tree, if we write those equations going bottom up in the tree, we see that they are equivalent to a simple dynamic program to compute the distribution, so these equations exactly capture the distribution. When the graph has only one cycle, this approach has also been shown to be correct. When the graph is sparse with high girth and the influence of values on the objective decays quickly, one intuitively expects this to work reasonably well, but theoretically it's a big mess, both in terms of convergence and in terms of guarantees on the limit solution.

**Step three:**To solve the cavity equations, use an iterative method from some perhaps random starting point, until a fixed point is reached.

P(d(j),p(j),i)[t+1] = exp(-b cost({j,p(j)}))*prod[ Q(k,d(j),p(j))[t]]

Q(k,d(j),p(j))[t+1] = sum P(d(k),p(k),i)[t],

That fixed point can then be reinterpreted as a particular Steiner tree.

**Step four:**to make the method effective, use various tricks, clever ideas and hacks to speed it up and make it more convergent: take logs, simplify the set of variables, get rid of the "guess" of r, add some small random perturbation to eliminate possible cycles.

That's it!

Belief propagation is known to work (that is, coverge to an optimum solution) for bounded treewidth graphs for several settings. It is not clear that it works in sparse large girth graphs because one can capture hard problems on bounded degree expander with large girth. It would be interesting to know the structure of the graphs arising out of the protein application.

ReplyDeleteChandra, do you have the reference for bounded treewidth convergence? (I agree about the expander issue of course.)

ReplyDeleteMartin Wainwright from UC Berkeley has several papers on this topic. I don't know the details well but heard him give a couple of talks. http://www.eecs.berkeley.edu/~wainwrig/Pubs/publist.html#Estimation

ReplyDeleteActually, the SODA 2010 paper by Gamarnik, Shah and Wei has a good introduction with relevant references and a summary of the theoretical results on belief propagation.

ReplyDeleteIn particular: "In the context of maximum weight

independent set problem, a one-sided relation between

LP relaxation and BP is established [19]; if BP converges then it is correct and LP relaxation is tight."