## Wednesday, July 20, 2011

### Lift-and-project and odd cycle constraints

Lift-and-project has been studied during the last 10 years in the theoretical computer science community as a systematic way to strengthen linear programs. As an introduction, let's consider the independent set problem.

Start from the usual LP relaxation for independent set: one variable x(i) for each vertex i, such that
0 ≤x(i) ≤1 for each vertex i
x(i)+x(j)≤ 1 for each edge {i,j}.

Algebraically, one round of lifting leads to the following LP in the Sherali-Adams variant: each initial constraint leads to 2n new constraints. For each k, one new constraint is obtained by multiplying by x(k) and the other by multiplying by 1-x(k), and then substitute y(i,k) for x(i)x(k). We have variables x(i) for each vertex and y(i,j)=y(j,i) for each pair of vertices, with y(i,i)=x(i). The constraints obtained by this automatic process are thus:
(1) 0≤ y(i,k) ≤x(k) for each pair of vertices i and k
(2) 0≤x(i)-y(i,k)≤ 1-x(k) for each pair of vertices i and k
(3) y(i,k)+y(j,k) ≤x(k) for each edge {i,j} and each vertex k
(4) x(i)-y(i,k)+x(j)-y(j,k) ≤ 1-x(k) for each edge {i,j} and each vertex k.
That's the automatic algebraic definition.

To understand this intuitively, note that each variable x(i) of the basic LP can be interpreted as a distribution over vertex i: x(i)=.3 means that with probability .3 vertex i is in the independent set and with probability .7 it's not in the independent set. Similarly, each {x(i),x(j),y(i,j)} of the lifted LP can be interpreted as a distribution over the pair of vertices i,j: x(i)=.35,x(j)=.6,y(i,j)=.2 means that with probability .2, both i and j are in the set, with probability .35-.2=.15, only i is in the set, with probability .6-.2=.4, only j is in the set, and with the remaining probability .25, neither i not j are in the set.

Let us focus on the lifted LP for independent set. First, notice that y(i,j)=0 for every edge {i,j}: to prove this algebraically, apply (3) to k=i, substitute y(i,i)=x(i) and simplify to get y(i,j)≤ 0. Together with (1) that implies y(i,j)=0. To understand this intuitively: the constraint x(i)+x(j)≤ 1 means that i and j cannot simultaneously be in the set, so in a joint distribution over i and j, the probability that both i and j are in the set should be zero.

Now, here's a more interesting fact about the lifted LP for independent set. I want to prove that constraints (1)-(4) imply every odd cycle constraint: for every cycle C of odd length, the sum of the x(i)'s for i in C is at most (length(C)-1)/2. At first sight, that is a bit surprising, because the lifted LP still merely consists of local constraints (with a bunch more variables), whereas the odd cycle constraints are long-range global constraints. I will prove this for a cycle of length 9, consisting of vertices 1,2,...,9.

Note that since 1 and 2 are neighbors, y(1,2)=0
Apply (4) for k=1 and {i,j}={2,3}: x(2)+x(3)-y(1,2)-y(1,3)≤ 1-x(1)
Apply (3) for k=1 and {i,j}={3,4}: y(1,3)+y(1,4)≤ x(1)
Apply (4) for k=1 and {i,j}={4,5}: x(4)+x(5)-y(1,4)-y(1,5)≤1-x(1)
Apply (3) for k=1 and {i,j}={5,6}: y(1,5)+y(1,6)≤ x(1)
Apply (4) for k=1 and {i,j}={6,7}: x(6)+x(7)-y(1,6)-y(1,7)≤1-x(1)
Apply (3) for k=1 and {i,j}={7,8}: y(1,7)+y(1,8)≤ x(1)
Apply (4) for k=1 and {i,j}={8,9}: x(8)+x(9)-y(1,8)-y(1,9)≤1-x(1)
Note that since 9 and 1 are neighbors, y(1,9)=0
Sum everything: after cancellations, we get x(1)+x(2)+...+x(9) ≤4.
That's the odd cycle constraint.

What miracle just happened? Instead of going all the way around the cycle, let's just go through the path from 1 to 9 (ignoring the edge between 9 and 1) and sum those inequalities. We obtain:
x(1)+x(2)+...+x(9)≤ 4+y(1,9).
Normally, we know that the independent set can only have at most 5 vertices of a path consisting of 9 vertices, and that can easily be proved from the basic LP, but this tells us more. The variable y(1,9) captures the joint distribution of x(1) and x(9): it tells us that the sum can only be more than 4 when x(9) and x(1) are simultaneously equal to 1. This additional information is what comes into play when we close the cycle: because there is an edge between 9 and 1, x(1) and x(9) can never be 1 together, so we're stuck at 4.

Stronger versions of lift-and-project add positive semi-definite constraints. Why do we care about lift-and-project? Ten years ago, it seemed like a promising venue to reduce the integrality gap of linear programs. That promise has not yet been fulfilled, and it is not so easy to manipulate those complicated, higher dimensional objects, but we are slowly making progress. For example, in the upcoming FOCS Barak, Raghavendra and Steurer use lift-and-project for an algorithm for unique games.

In a graduate course on approximation algorithms, I would want to spend a lecture or two talking about lift-and-project. The above would be a preliminary introduction, but after that, I would want an example of application of lift-and-project that would be simple enough to present in about one lecture. I am wondering: what is the least technical interesting example of using lift-and-project?

1. Great example! Why is this call "lift-and-project"?

2. Going from the n-dimensional space indexed by x(i) for i in {1,...,N}, to the N=(n choose 2)+n dimensional space indexed by the x(i)'s and the y(i,j)'s is a "lif" operation.

I didn't include the "project" operation: project back into n-dimensional space, to get the polytope with variables x(i)'s, defined implicitly by "(x(1),...,x(n)) is in the polytope iff there exist y(i,j) such that the vector defined by the x(i)'s and the y(i,j)'s beglongs to the previously defined N-dimensional polytope.

3. Thanks for the nice write-up! I cover lift-and-project cuts in my graduate course on integer optimization, although the focus in this course is not on proving approximation bounds. I do use the same example of odd-cycle constraints, and go one step further to illustrate how the semidefinite relaxations yield these inequalities in one step. Cutting planes are bread and butter of integer programming (IP), and you may find some illustrative examples to use from related literature. For instance, Burer and Vandenbussche talk about applying lift-and-project to tackle a problem of Erdos and Turan, the marketshare problem, quadratic assignment problem, etc. - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.6150. But they only provide computational results, and do not seem to derive any bounds on approximation - which are probably what you need. You might still get some ideas, though.

4. Thanks Bala! That paper has nice problems, but as you say, no approximation results on those problems.

5. Sherali-Adams plays nicely with bounded treewidth, and the related approximation schemes:
http://dx.doi.org/10.1016/j.disopt.2004.03.002
http://dx.doi.org/10.1007/978-3-642-03685-9_20

6. Dave: thanks for the two references. I guess that there is also the work on knapsack PTAS by Karlin, Nguyen and myself. Perhaps I should look at all three and decide which one has the best combination of technical elegance and simplicity on the one hand, strength or interest of the result on the other hand.

7. It seems to me that in most analysis of the lift and project schemes one only uses the fact that after r rounds the variables when restricted to those r variables lie in the convex hull of the integer hull for those r variables - for example in the case of independent set, for each set of r vertices the variables for those vertices are constrained to lie in the independent set polytope of the graph induced on those vertices. Isn't this easier to motivate and describe the relaxation? You show above that more general stuff is also implied which is nice but the algebraic machinery is a bit hard to grasp in one lecture, I think.

In my course I talk about the relaxation for multiway cut due to Calinescu-Karloff-Rabani. They show in their paper that the geometric relaxation is equivalent to the strengthening of the standard distance LP via some basic valid inequalities.

Knapsack cover inequalities are also nice to discuss.

Chandra

8. Chandra, yes, I hinted at that when I wrote in the blog post that x(i),x(j),y(i,j) can be interpreted as a joint distribution. That's a basic intuitive property of Sherali-Adams. But the long-range consequences, in my mind, are much more interesting (as well as the consequences of adding SDP constraints).

I agree that the algebraic machinery is a big obstacle to teaching a nice lecture. Somehow we have to make it so that intuition is never lost even as we go through algebraic manipulations.

Relating this to the Calinescu et al results is an interesting idea.

9. Thanks for the great and fascinating post!

Is this related to the techniques of linearization and re-linearization (Kipnis & Shamir, CRYPTO 1999), used for solving a system of quadratic equations over some finite field?

Suppose we have a system of c n^2 quadratic equations in n unknowns, x_1,..,x_n. Linearization is that idea that you can introduce C(n,2) new unknowns y_{i,j}, where y_{i,j} stands for the product x_i x_j. Rewriting the equations, we obtain a system of c n^2 linear equations in the unknowns y_{i,j}. This works if c >= 1/2.

Relinearization generalizes, to solve the system of equations in polynomial time for any fixed constant c>0. The idea is to take the c n^2 linear equations in the y's, and introduce some quadratic equations. In particular, for each i,j,k,l, we introduce the equations y_{i,j} y_{k,l} = y_{i,k} y_{j,l}, to represent the fact that (x_i x_j) (x_k x_l) = (x_i x_k) (x_j x_l). In this way, we obtain about n^4/12 quadratic equations in the n^2/2 y-variables, in addition to the c n^2 linear equations. We can now apply linearization to this system of equations. One round of relinearization lets us solve a system of c n^2 quadratic equations in n unknowns, as long as c >= 0.1 or so. (More rounds can lower the constant further, at the cost of an increase in runtime.)

Is this at all related to lift-and-project, or is it a different idea?

P.S. Your blog doesn't accept comments from people who have their browsers set to block third-party cookies. This is privacy-unfriendly. I encourage you to choose a blogging host that is more friendly to readers' privacy.

10. I am a fresh graduate student and would like to understand these hierarchies. I know basic linear programming, primal dual algorithms in approximation algorithms and SDP Max cut algorithm by Goemans & Williamson.
I need your suggestion on which papers should I start reading and in which sequence?
It will be a great help and I really appreciate it.

Regards

11. Monique Laurent has a good perspective. It's technical, but that's how it is for these things. See for example http://www.win.tue.nl/~nikhil/hierarchies/hierarchy_seminar.pdf