tag:blogger.com,1999:blog-40681836987476231132024-03-13T01:43:25.578-04:00A CS Professor's blogClaire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.comBlogger24125tag:blogger.com,1999:blog-4068183698747623113.post-45787436391519467362012-06-22T11:14:00.000-04:002012-06-22T11:14:00.428-04:00One triangulation = three treesAt AofA I saw a talk by Sarah Miracle who presented joint work with Dana Randall, Amanda Streib and Prasad Tetali, showing rapid mixing, for a planar triangulation with maximum degree 6, of a natural Monte Carlo Markov Chain to uniformly generate a random partition into three trees rooted at the 3 outer vertices. This reminded us of the following elementary but striking structural property of planar triangulations.
<p><b>Theorem</b>
The internal edges of a planar triangulation can be partitioned into three trees that span all inner vertices and are rooted respectively at each of the three vertices incident to the outer face.
<p>
The proof is elementary and algorithmic, by induction on the number of vertices.
<p>Algorithm
<br>If G is a single triangle,
<br>Then we are done.
<br>Else
<br>  Find an edge {u,v} with u on the outer boundary, v internal, and exactly two paths of length 2 from u to v.
<br>  Contract edge {u,v} into a vertex u', creating a contracted graph G'.
<br>  Recursively solve the problem for G', obtaining three trees, blue, white and red, where, say, the blue tree is rooted at u'.
<br>  Uncontract edge {u,v}, color it blue; other edges inherit the color of the corresponding edge of G'.
<br>
<p>Analysis
<br>The only question is: does there always exist an edge {u,v} with the specified properties? The answer is elementary: For u, pick any of the three outer vertices. Let u1,u2,…,uk be the neighbors of u (u2 and uk are the other two outer vertices), and consider the cycle u1-u2-…-uk-u1. Any planar cycle on k vertices has at least two "ears" (vertices from which no chords originate), and only one of them can be u2 or uk, so for v we simply take the other ear.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com0tag:blogger.com,1999:blog-4068183698747623113.post-88593613865322077902012-06-21T09:19:00.000-04:002012-06-22T08:20:30.523-04:00Tom Trotter's view of Combinatorics<i>Other mathematicians compete about the complicated theorems that they are<b> able</b> to prove; combinatorialists like to compete
about about the simplicity or self-evidence of statements they are <b>unable</b> to prove, i.e., ``I can't even prove ...''</i>
<br>   [As recalled and paraphrased by Mike Saks]Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com4tag:blogger.com,1999:blog-4068183698747623113.post-30027876630812600182012-06-20T06:14:00.000-04:002012-06-20T06:30:30.110-04:00Avi at AofA: population recoveryFans of "The Land Before Time" know that there are many kinds of dinosaurs: Apatosaurus, Tyrannosaurus, Triceratops, Saurolophus, Pteranodon, Stegosaurus, to mention a few. But is the movie an accurate representation of reality? Were they really in equal numbers, or were there, perhaps, more Triceratops dinosaurs? Enquiring minds want to know: what was the distribution of the various kinds of dinosaurs?
<p>Here is a way to bring the problem into the more familiar realm of mathematics. Think of each dinosaur as a vector, and of each bone as a coordinate. Paleontologists unearth random dinosaur bones, identifying them as they go (somewhat unreliably: errors are not unheard of), and digs can be viewed as "queries". How can we mine the resulting data to recover the distribution of the dinosaur population? That's what Avi Wigderson talked about on Monday at the Analysis of Algorithms conference (AofA) here in Montreal, presenting joint ork with Amir Yehudayoff. (If you don't care for dinosaurs, think "Netflix" instead.)
<p><b>The population recovery problem</b>: Given k n-dimensional binary vectors, recover a hidden distribution (pv) that draws vector v with
probability pv. The hidden distribution is accessed via lossy, noisy queries. Each query
draws a vector v with probability pv, and reveals a lossy and noisy version
of v: lossy, meaning that the algorithm only sees a random 1 percent of the coordinates; noisy, meaning that each coordinate seen by the
algorithm is flipped with probability 49 percent.
<p>The result: Avi gave an algorithm that estimates (p1,p2,…,pk) so that, (with
high probability) for every v the estimate for pv is at most an additive
error epsilon away from the correct value. The number of queries is
polynomial in n and in 1/epsilon (but slightly super polynomial in k).<br>
<p>Here is the algorithm:
<br>(0) Take a few queries.
<br>(1) Find, for each vector v, a small set Sv of coordinates, such that the following condition holds:
<br>   (*) Sv uniquely distinguishes v from every vertex u such that |Su|≤|Sv|.
<br> Let Ambiguous(v) be the set of vectors u that are indistinguishable from v by looking at just the coordinates in Sv.
<br>(2) Label the vectors 1,2,..,k by order of non-increasing |Sv|
<br>(3) For v = 1 to k
<br>   Let zv = the restriction of v to the coordinates in Sv.
<br>   Compute qv, the fraction of the queries whose Sv coordinates are (roughly) zv.
<br>   Let (**) estimate(pv)= qv- ∑ estimate(pu),
<br>    where the sum is over every u in Ambiguous(v), other than v itself, of course
<br>Output (estimate(p1),estimate(p2),…,estimate(pk)).<br>
<p>The first question that comes to the mind of the eager coder is the risk that the code may generate some "undefined" error: isn't there a risk of circularity? Step (3) computes estimates using estimates, but what if the estimates used have not yet been computed? <br>
It turns out that this is not an issue. Why? Because, in step (3), the sum is over u's that are in Ambiguous(v). But by condition (*) we must have |Su|>|Sv|, and by step (2) they are been placed before v in the ordering, so estimate(pu) has already been computed by the time we need it in the right hand side. Hence, no circularity.
<p>The second question is that estimates have errors, inevitably, and when using the algebraic expression (**), those errors will accumulate. To make up for that, more queries will be needed. Now we are worried: in order to have reliable estimates, isn't this algorithm going to require lots and lots of queries? Well, no. The main idea is that the sets Sv constructed in step (1) are small: they all have cardinality at most log(k). Then the graph that has an arc (u,v) whenever estimate(pu) occurs in the right hand side of (**) is a partial order with depth at most log(k), and so errors don't accumulate.
<p>The main remaining difficulty is how to achieve (*) while maintaining sets Sv of cardinality at most log(k). Here is how to do it. <br>
<p>Algorithm for Step (1):
<br>Start with the empty set for every Sv
<br>Repeat
<br>   While there exists an Sv and a coordinate i such that adding i to Sv decreases |Ambiguous(v)| by a factor of 2
<br>     Add i to Sv
<br>   If condition (*) is violated by some u and v,
<br>   then replace Su by Sv
<br>Until condition (*) holds.
<p>Why does that work? By some elementary magic.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com0tag:blogger.com,1999:blog-4068183698747623113.post-65402108403259101112012-06-18T17:14:00.000-04:002012-06-20T06:14:53.790-04:00An interview of Avi WigdersonQ: Do you consider yourself a Mathematician or a Computer Scientist? <br>
A: Both. It is very clear that Theoretical Computer Science is a
mathematical field. We prove theorems. The reason why we are in Computer
Science is that the main mathematical object that we study is computation.<br>
Q: How did you get interested in this?<br>
A: In college, in computer science, the best teachers I had were in
theoretical computer science, for example Shimon Even. I was also into Math
beforehand.<br>
Q: Why did you decide to do a PhD? <br>
A: I didn't come from an academic background, but all the good students in
my class went on to do a PhD, so I followed their example. Once, as a PhD
student, I realized what research was, I knew that that was what I wanted
to do. Dick Lipton, in Princeton, was my advisor. He's the most amazing
guy. He was changing his field of interest almost every month, and I was
following along and learning many things.<br>
Q: What was your first research project?<br>
A: An efficient algorithm to sqrt{n}-color 3-chromatic graphs.
I also worked on succinct graph representations,
cryptography, VLSI layout and parallel computing
in collaboration with other
PhD students.<br><br>
Q: How do you work with your graduate students?<br>
A:
In the last decade I have been working mostly with postdocs, so I'll talk mainly about my time at the Hebrew University.<br>
I don't let them wander off.
I keep in constant touch with them.
I started off
as a young faculty member at the Hebrew University
with
an excellent
group of 4 graduate
students. We'd meet and talk. I like talking, as you know. They were into
different types of problems,
and we often learned an area together.<br>
Q: Would you have advice for first year graduate students?<br>
A: I would advise them to do things that they really like, things that
they develop a passion for. There are students who try to find "hot"
topics, others who try to find topics on which there won't be the stress of
competition, but those are secondary reasons.<br>
Q: How do you get your beginning students started with research?<br>
A: I give them a bunch of papers in different areas that they might be
interested in. Then I ask them: What did you feel? What are you most
attracted to?
Then we discuss the ones they liked and try to think of related problems
in these areas.<br><br>
Q: What makes a question a "good" research question?<br>
A: I like a technical challenge, when the usual attempts don't work.
More precisely, I like problems for which natural-to-apply techniques
don't provide the answer.
Then there is "taste".
Taste
is hard to define. I am
usually
motivated by some higher goal of
understanding, so that the question is part of a journey towards some
bigger goal to which it is potentially related. Sometimes I also simply
develop trust in the taste of my friends. You know that, of course: if,
say, Alistair Sinclair comes to you with a problem to work on, you'll be
happy to work on it even if you yourself do not quite see
immediately why he likes it or
the larger
context in which it fits.<br>
Q: What is your favorite result in your own work?<br>
A: Zero-knowledge. It's so much fun explaining it to people. It's very
rewarding.<br>
Q: Practical impact. What would you say about that?<br>
A: It's not a strong motivating force for me. Many of the questions I
study are fundamental in nature. The intellectual challenge is my primary
motivation.
Of course, in our field, I believe there is a strong correlation between
fundamental questions and practical impact (eg theoretically defining
and understanding computationally such basic notions of "secret",
"knowledge", "randomness", "learning", "proof" yielded great benefits,
both intellectual and practical)<br><br>
Q: How much time do you spend reading papers?<br>
A: When I was a student, I devoured journal papers. Now, not so much.
After hearing a lecture, I read the paper when I want to learn a technique.
Sometimes I go to a math textbook to learn a
certain topic which may be useful to a research project.<br>
Q: Do you have role models?<br>
A: I had Dick Lipton, when I was a PhD student, of course. I also have
great admiration for some people who have developed
completely
new models
and notions, and thereby new areas of research
that have
changed the field: Valiant or Blum for example.<br><br>
Q: You've been in the field for roughly 30 years. How has it changed?<br>
A: It's much larger in many ways. The number of areas we study has
increased a lot. Learning didn't exist. In crypto, Goldwasser, Micali and
Yao had just started. Distributed computing was very limited.
Few
Probabilistic
algorithms
existed. Connections with
statistical physics ideas,
Monte Carlo methods, all that was not there. Mechanism design
and other connections to economics
did not exist. A
large part of the growth was due to connections with other fields. There
were also developments with subfields of Mathemetics such as analytic
number theory, group theory, pseudorandomness...<br>
A: The other way in which it has changed is the recognition by the
mathematical community of the importance of the field, that just blasted.
We are generating great questions for mathematicians, and they love it.
They recognize that they are fundamental questions. They have to rethink a
lot of what they're doing and use the computational lense.<br>
Q: So, students have to be much more mathematicaly sophisticated?<br>
A:
Yes,
but they can learn what they need. There are fantastic people in the
field. The young people entering the field have phenomenal talent,
both technically and conceptually.
TCS is
a primary field of interest for the best mathematically talented young
people to go in.
It is a great sign for the health of a scientific field when it attracts
such talent.<br>
Q: I actually find it a bit intimidating. How to advise students who are
so talented?<br>
A:
I am not intimidated by talented people, young or old - they are great
to learn from.
When advising young people, age provides
the advantage of experience.
Sometimes
we can predict which directions will be fruitful and which
directions that will turn out to be dead ends (although we're sometimes
wrong),
sometimes
we can tell which questions people will like (although we are
sometimes wrong).
But youth provides energy and fresh thought which can often be more
important than experience and is always wonderful to be around. Our
profession is among a few that allows real constant interaction with
young people, and to me this is a great aspect of academia.<br>
Q: Concretely, how did you go about attracting students to theoretical
computer science?<br>
A: Here's one method. I teach a graduate course in computer science, I
open it to undergraduates, and I advertise it in the Math department.
That's how I got Ran Raz, for example.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com4tag:blogger.com,1999:blog-4068183698747623113.post-72399089285112358412012-05-28T10:17:00.000-04:002012-06-20T06:15:06.442-04:00An interview of Maurice Herlihy<p>Q: You just received the Dijkstra prize for the second time. What did you receive this award for?
<br>A: The Transactional Memory paper. The basic idea is that designing highly concurrent data structures is much too hard and people who work in that area have to focus on too many tricks. Mechanical tricks are too easily confused with essential issues, and that paper was an attempt to make it easier to deal with the mechanics, to allow researchers to focus on the core aspects of synchronization algorithms.
<br>Q: What's the result?
<br>A: Hardware designing.
<br>Q: What do you mean?
<br>A: A way of designing cache coherence protocols to allow the programmer to avoid conventional mechanisms such as locks. You can say what it is that you want to be atomic, without specifying how. It's an implementation of a very simple abstraction. The paper was very concrete, with hardware architecture.
<br>Q: How did it have impact?
<br>A: The paper is from 1993. For the first 10 years, nobody paid any attention to this paper. There was an average of 5 or 6 citations per year. Then in 2004 changes in technology meant that processor clock speeds could no longer improve, so people went to multicore where there is a lot of concurrency, and then they discovered that no one had any idea how to use them.
<br>Q: You make programing multicore machines easier to do and to reason about?
<br>A: It has other software engineering benefits, some performance benefits. The entire story is complicated but big picture is that it makes it much easier to write concurrent programs.
<br>Q: Can you give an example of a concrete impact?
<br>A: The Intel Haswell processor, their next processor, has transactional memory built into it. Intel laptops from now on have a machine that supports transactional memory in a form not much different from the 1993 paper.
<br>Q: Do you have a patent?
<br>A: I do but it's not mine. I gave it to my employer who was DEC. The patent may be expired.
<p>Q: What did you think of it when you were doing it?
<br>A: A very nice idea solving an important problem, but I knew no one would care about it. I was convinced that it could become a very important idea, and at the same time I was very surprised when it did.
<br>Q: Do you have any other papers waiting to be discovered?
<br>A: [...]
<br>Q: How did you think about it?
<br>A: It came up in a conversation with Eliot Moss, coauthor on the paper. I was interested in lock-free data structures and he was interested in architecture and we were talking about how difficult it was to do lock free data structures on modern architecture and Eliot asked whether one could use cache coherence protocols to address the problem.
<br>Q: This piece of research was instantaneous?
<br>A: We had been office mates as grad students at MIT before, so we would meet from time to time casually for conversation, so this is an example of why it's a good idea to talk to people from other areas. What made it so unusual at that time was that it was a linkage between concurrent data structures (very high level) and hardware design (very low level). It required a broad CS culture.
<br>Q: What does that say about how to do research?
<br>A: You need to have conversations with people who are not goal directed. Neither one of us working individually would have done this. We knew something about one another's areas but would not have concentrated enough to see this connection.
<p>Q: Do you have any opinion of the streaming computational model?
<br>A: It's something that has gone, in a very short time, from ad hoc to methods that are much better founded in theory.
<br>Q: What about MapReduce?
<br>A: It's been enormously successful, but it's very limited in what it can do, and everybody is trying to invent or discover the successor to MapReduce.
<br>Q: Does it mean it's not worth investing a lot of time into it, say, designing algorithms for it?
<br>A: Because MapReduce is so important, any improvement will have immediate impact, but it's something done better as a large company, in industry rather than in the academic world. There 's also a lot of engineering that goes on in this area.
<p>Q: How do you go about doing research?
<br>A: I am opportunistic. I have a general area of interest, namely, concurrency, but I worry about becoming too specialized, so I make an effort to balance my attention between more applied and more theoretical areas, because, often, that's where you have the opportunity to write the first "easy" paper and let the ones who write the difficult papers cite your paper.
<br>Q: So you're a pioneer?
<br>A: Maybe more of a vagabond.
<br>Q: Are you ever tired of research?
<br>A: No.
<br>Q: Do you do research on vacation?
<br>A: Yes, in my head.
<br>Q: Do you always have some problems you're working on?
<br>A: Yes, usually something that's good for in the shower, or for riding on the metro.
<br>Q: Usually, do you work on several problems at once, or focus on a single one?
<br>A: Usually several small problems at once, because when you get tired of one, then you can turn to the others.
<br>Q: What are your current interests?
<br>A: Right now, I'm writing a book on combinatorial topology and distributed computing, and I'm also interested in systems aspects of multicore. Since almost nobody is interested in both of these topics, it means that I can talk to twice as many people!
<br>Q: Where are your collaborators?
<br>A: Almost everywhere except at Brown. I'm writing the book with one coauthor in Mexico and the other in Germany. I am visiting several collaborators here in Paris. I work a day a week at Oracle labs in Boston.
<p>Q: Distributed computing is not taught much in the undergrad curriculum. What do you think of that state of affairs?
<br>A: I think that that will inevitably change. What will be taught will be multiprocess computing, because soon every computer will be multicore. Today I think that it is a little bit of a scandal that we don't teach parallel computing to our undergrads until late.
<br>Q: Are you familiar with the CLRS textbook on Algorithms?
<br>A: There is a chapter on parallel computing but it's an area that changes rapidly.
<br>Q: So, what about distributed computing?
<br>A: The problem with distributed computing is that there are many models of computation. We're trying to capture many different kinds of phenomena. The culture isn't one where people think standardization is important. Every time someone writes a paper, there is a tendency to use a slightly different model and we're not organized enough as a community to present a coherent story to the world. We're a little like European parliaments, we split into many small groups and argue with each other instead of presenting a unified front.
<br>Q: I take it that that's a criticism?
<br>A: It's an observation. This situation is exactly the reason why I'm writing the book. At present, when you work in this area, you have to read a dozen badly written papers. That presents a large barrier to students and outsiders, and the hope of the book is to present a single coherent story, at a level that's accessible to advanced undergraduates.
<br>Q: When is it coming out?
<br>A: End of this year?
<br>Q: Have people been teaching from drafts of this book?
<br>A: No, people teach from drafts of my other book, a much more applied textbook.
<br>Q: What's your community, the conference where you feel most at home?
<br>A: PODC and DISC, which are basically the same conference but on different continents.
<p>Q: What got you interested in distributed computing?
<br>A: As a grad student at MIT, Barbara Liskow had an opening (physical opening: a chair) in one of her offices, so I drifted into her group. Who you sit next to is perhaps the most important influence in your life. I was floating around looking for a group and the other groups on that floor weren't very friendly, but Barbara was happy to talk to a new student.
<br>Q: Why did you go to MIT?
<br>A: Because I was already living in Cambridge and my girlfriend didn't want to move.
<br>Q: Why did you go to grad school?
<br>A: I got a degree in pure math and I wasn't sure what I wanted to do and at the end of summer after I graduated I needed to start paying rent so I applied at random for jobs. I had one job offer to teach math at a boys school in Michigan, one to write Fortran programs for the DOT in Cambridge, then I went from there to working at a research group at the MIT business school, but one day they lost their grant money. They didn't fire me but they fired the guy who did the tape backups and told me I now had to do tape backups, and so I applied to grad school.
<p>Q: What's your biggest regret in your career?
<br>A: I think it's being slow on the intake. There are many ideas that turned out to be successful that should have been obvious much sooner. I could have gone to grad school right away instead of working as a Fortran programmer for 3 years.
<br>Q: That was a waste of time?
<br>A: Not really because, whenever I get frustrated, I remember that time and realize how fortunate I now am.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com5tag:blogger.com,1999:blog-4068183698747623113.post-61932012930893523792012-02-15T12:16:00.005-05:002012-02-15T22:21:19.421-05:00The year of the TSPWhen I was at ITCS, last month, I talked to Huyng-Chan An who told me about his recent work with David Shmoys and Bobby Kleinberg (to appear in STOC 2012).<br /> <br />They study the s-t path variant of TSP, that asks for the shortest path going from a pre-specified start vertex s to a pre-specified end vertex t while visiting all other vertices along the way. What is the motivation for this problem? Well, imagine that you are a postman delivering letters. You pick up letters at the post office, deliver them, and when you're done, you go home: you have just built an s-t path, with the post office as s and your home as t. Or, imagine that your building is on fire, it is early in the morning, and the fire alarm is not working. You want to leave your office, visit all offices in the building as quickly as possible to tell anyone who might be there already that they must leave, and then you want to get out. No way you want to go back to your starting point in the end!<br /><br />Christofides' algorithm, adapted to that variant, is not a 3/2 but a 5/3 approximation in the metric setting. An, Kleinberg and Shmoys have a very simple algorithm that improves the 5/3=1.66… to 1.61… The funny thing is that I did not know about the 5/3 approximation until An told me that they had improved upon that twenty years old bound! For that reason, I am hesitant to call it a breakthrough, and also because it comes along with several recent papers making progress on special cases of TSP. As a friend recently suggested, this is shaping up as "the year of the TSP"! But I looked at their result in more detail, and it is a clean, more or less self-contained, clever application of modern techniques. That's what makes it appealing. <br /><br />What's the algorithm? <br /><br />Nothing special, really:<br />(1). Solve the natural variant of the Held-Karp linear programming lower bound for TSP<br />(2). Express the solution as a convex combination of spanning trees (that's known to be possible by polyhedral combinatorics)<br />(3). Pick a random such tree T* according to the distribution given in (2), and <br />(4). use it in an execution of the natural variant of the Christofides heuristic.<br />The fact that it is "nothing special", in my view, is a good thing. It means that it can be implemented at little additional cost. (The difficulty is in the analysis.) <br /><br />What's the "natural variant" of the Held-Karp constraints? Some constraints are different: every s-t cut must be crossed by a minimum of just one edge (instead of a minimum of two edges, in the usual version). <br /><br />What's the "natural variant" of the Christofides heuristic? Instead of finding a perfect matching of the odd degree vertices of T*, it finds a perfect matching (T-join) of the "off-degree" vertices of T*, that is, the vertices whose degree has the wrong parity: the difference with the usual setup is that now, vertex s has the wrong parity if its degree in T* is even, and similarly vertex t has the wrong parity if its degree in T* is even. <br /><br />So, that's the algorithm.<br /><br />What's the analysis? <br /><br />First, they present an elegant proof of the 5/3 approximation result, then modify it. Because of that, we first have to understand why the algorithm is a 5/3 approximation. The tree T* from step 3 has expected cost bounded by OPT, so the sole focus is on bounding the cost of the T-join. That is achieved by bounding the value of a particular fractional T-join. What's a fractional T-join? It's a feasible solution to the following LP:<br />* for every edge e, variable y(e) must be between 0 and 1<br />* for every proper cut (S,V-S), if S contains an odd number of "off-degree" vertices, then at least one edge crosses the cut - or in other words, fractionally, the sum, over every edge e crossing the cut, of y(e), must be at least 1.<br />As it happens, that polytope has integer vertices, so the optimal (integer) T-join is also an optimal fractional T-join, and so the cost of the T-join built by the algorithm in step (4) can be bounded by the cost of any fractional T-join. The proof exploits this freedom to carefully choose a particular fractional T-join whose expected cost is less than 2/3 OPT. How can it prove that the cost is less than 2/3 OPT? By proving that it's less than the 2/3 times the Held-Karp lower bound. So, the proof has now neatly moved to the realm of polyhedral analysis, and the rest is now a matter of manipulating vectors y=(y(e)).<br /><br />The 5/3 result is proved by choosing, for our fractional T-join, a scaled version of an average of two solutions that each achieve the Held-Karp lower bound, namely: the fractional solution x* obtained in step (1), and the tree T* obtained in step (3). Indeed, x* satisfies all the fractional T-join constraints, but in addition, observe that there are some constraints where it has slack: for some cuts, the T-join constraint requires a weight of 1 across the cut, but x* guarantees a weight of 2 across the cut. The tree T* also satisfies all the fractional T-join constraints, but in addition, observe that there are also some constraints where it has some slack: for some cuts, the T-join constraint requires a weight of 1 across the cut, but T* guarantees a weight of 2 across the cut. Really? Well, yes: if S separates s from t and contains an odd number of "off-degree" vertices, then a straightforward counting argument shows that at last 2 edges of the tree must come out of S. Now, the funny thing is that T* has slack precisely for every constraint where x* does <i>not</i> have slack! So if we take the average of x* and of T*, for each constraint of the fractional T-join, we'll have (1+2)/2=3/2 edges crossing at least, and so, if we scale by 2/3, it's still feasible! That shows that y=(2/3)(x*/2+ T*/2) is a feasible fractional T-join. To recap, its cost is at most 2/3 times the average cost of x* and of T*, which is at most 2/3 OPT. The T-join built in step (4) has cost less than that, and that's how they get a 1+2/3=5/3 approximation.<br /><br />How do they improve on this analysis? By fiddling with the factors cleverly. To get a fractional T-join with cost slightly better than 2/3, instead of x*/3+T*/3, they pick y=(1/3-2c)x* + (1/3+c)T*+(correction vector for the fractional T-join constraints that are now violated), and hope that the cost of the correction vector is less than c OPT. Realizing that hope requires a careful analysis of the violated constraints. They observe that<br />* even if constraints are violated, they're <i>not violated by much</i>: the sum across the cut is still at least 1-O( c). The deficiency is O( c).<br />* given that T* is drawn at random in step (2), those cuts are <i>rare</i>: an elementary argument uses Markov's inequality to show that their probability is O( c).<br />So the correction vector z will simply have z(e)=deficiency times Pr(e in cut where the constraint is violated) times x*(e)=O(c^2)x*(e).<br />Then the cost of the fractional T-join y is (2/3 - c + O(c^2))OPT, which is less than 2/3 for c small enough constant.<br /><br />There's an additional technicality: the above reasoning is buggy when the violated constraints correspond to non-disjoint cuts, but they show that they can make the cuts disjoint up to introducing another small factor 1-O( c). And with that, they beat the 5/3 approximation. (There's more to the analysis to refine it and get further improvements, but that's beyond my attention span...)<br /><br />What's the take-home message from that proof? Perhaps, it is a hymn to the power and flexibility of LP feasibility. The technique seems general. They pushed it through for the s-t path TSP problem, but why not use it for other problems, whenever there's an integer polytope behind the scenes?Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com2tag:blogger.com,1999:blog-4068183698747623113.post-67257939735658075902012-01-06T05:12:00.000-05:002012-01-06T05:12:03.092-05:00Push-pullI just saw a talk by Thomas Sauerwald analyzing the push-pull protocol for broadcasting in graph-theoretical models for social networks. If you ask your search engine for push-pull, many answers come up, related to aviation, physics, flirting, and marketing. But this is different.<br /><br />To broadcast information, in the <i>push</i> protocol, at every round every node that has the information rudely shares it with a random neighbor, whether that neighbor wants it or not. In the <i>pull</i> protocol, at every round every node that does not yet have the information extracts it from a random neighbor, if that neighbor has the information. In the <i>push-pull</i> protocol, both behaviors occur. <br /><br />Q: How many rounds before 99% of the nodes are informed? <br />A: log log n if the network is a Chung-Lu random graph model for social networks (parameterized such that the average distance between two nodes is log log n.) That's the result.<br /><br />Q: What's the Chung-Lu model?<br />A: each node i has a weight w(i), and edge {i,j} is in the network independently with probability min(1,w(i)w(j)/W), where W is the sum of all weights. w(i) is such that the distribution of degrees follows a power law (with parameter beta).<br /><br />Q: How about informing not just 99% but 100% of the nodes?<br />A: the diameter is roughly logarithmic, so, there is no way that you can inform 100% of the nodes in time log log n. In a social network, when something goes viral, 1% of the crowd will remain clueless in spite of your efforts. The solution: just forget abut them. We're the 99%!<br /><br />Q: What's the rough idea of the proof?<br />A: Partition nodes into classes according to (the log log of) their degree, (rounded to the nearest integer.)<br />0. The information initially belongs to a random node.<br />1. After log log n rounds it reaches a node in the highest class.<br />2. After a few more rounds it reaches (50% of) the nodes in the highest class.<br />3. After another log log n rounds it goes from there to almost all network nodes.<br /><br />Step 2 contains some intuition: In the Chung-Lu model, two nodes x and y of high class (i.e. high degree) are probably connected by a path of length 2 via an intermediate node z of degree 2. If x has the information, then z will pull it from x after a couple of steps, and then z will pass it on to y after a couple more steps, so nodes of degree 2 transmit information in a constant number of rounds. In other words, small degree nodes are good for passing on information quickly to everyone they know. So what matters in the large degree nodes is really their degree-2 neighbors: those will immediately pull the information and share it across to their other neighbor, thus roughly reducing the GOSSIP mode of communication to a LOCAL mode of communication.<br /><br />Q: Is the push-pull protocol realistic?<br />A: It's not inconceivable. Pushing corresponds to, say, sending an email to a single (or a few) recipient(s) to share information: "Have you heard? Abe and Amy broke up!". Pulling corresponds to asking your friend: "How are Abe and Amy doing? I haven't seen them lately" or checking their Facebook page, and then, if you find the piece of information which you were waiting for "Status: single", adding a note about it on your own Facebook wall: "Hey everyone, look, Abe is no longer in a relationship. He's single! How exciting!". (I am having trouble imagining why someone seemingly so interested in that piece of information would want to broadcast it to the rest of the world, but if that's the only reason why the protocol is not perfectly realistic, I'll give it a pass.)<br /><br />Q: Is the Chung-Lu model realistic?<br />A: Maybe since it's a reasonably good fit with reality according to several statistics. But there is no way to know for sure, is there?<br /><br />Q: Is the above log log n result robust? Does it still hold if the push-pull protocol or the Chung-Lu network is replaced by something similar but different?<br />A: The proof presumably breaks down. The result might still be true if the graph has small average distance and if it has some small degree nodes adjacent to large degree nodes.<br /><br />Q: Is the synchronous assumption realistic?<br />A: In the asynchronous model in which each node has a Poisson clock, the log log n becomes a 1. That's because x and y don't just have a path of length 2 going through node z: they have many such paths going through z1, z2, …, and one of them goes through a node whose Poisson clock rings twice in rapid succession: "Zap! Get that info from x! Zap! Tell y about it, quick, to be the first one!" and transmits information almost instantaneously. So we don't mind if synchronicity does not hold, quite the opposite!<br /><br />Q: What about other graphs?<br />A: In general graphs, the idea of reducing GOSSIP to LOCAL has been explored by Kelner, Censor-Hillel, Haeupler and Maymounkov. The two modes differ by at most an additive polylog(n).Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com0tag:blogger.com,1999:blog-4068183698747623113.post-43932816505822974692011-12-11T07:13:00.020-05:002011-12-12T09:02:43.466-05:00Outdegree, contractions, planar graphsYou can always direct the edges of a planar graph so that every vertex has outdegree at most 5. That fact is clear to anyone who has taken a discrete math course at one point in their life.<br /><br />My question is:<br />Q: What happens if the edges of the graph get contracted one by one? <br />A: Then the outdegree risks growing. For example, if one contracts two adjacent vertices that each have outdegree 5, the resulting vertex will then have outdegree 5+5-1=9. The next contraction may bring its degree up to 9+5-1=13, then 17, etc. <br /><br />My question then is:<br />Q: Is it possible to efficiently modify the orientation of some edges along the way and prevent the outdegree from blowing up? <br />A: Yup. Here's the algorithm, appealingly simple and natural. <b>After each contraction, do the following: while there exists a vertex v of outdegree greater than 10, reverse the orientation of every outgoing edge of v.</b> <br /><br />At this point your question surely is or should be:<br />Q (Toto): Cute, maybe, if that's your taste, but why do you care?<br />A: Because it is used in an upcoming SODA paper by David Eisenstat, Philip Klein and myself, to reduce the runtime of a primal-dual region-growing phase. <br /><br />Now that the motivation is out of the way, we can think about it together.<br />Q (Toto): Ok, I'm willing to temporarily accept that this might be interesting. Then, why does this algorithm work and how efficient is it? <br />A: By design, it obviously maintains an orientation in which each vertex has outdegree no more than 10. The neat observation is that it is an efficient algorithm: the total number of reversals, and hence the runtime (assuming you don't mess up the low-level implementation details), is O(n log n).<br /><br />Q (Toto): Why is the total number of reversals O(n log n)? I don't see it.<br />A: The proof is magic. <br />Q (Toto): I don't like magic.<br />A: The magic would disappear if you did a few examples of amortized analysis, but for a single analysis of a single problem, you have to accept the magic. Let G(i) be the graph after i contractions. We consider two orientations of the edges of G(i): Alg(G(i)), the orientation obtained by executing the algorithm, and a certain orientation Ideal(G(i)) such that the maximum outdegree in Ideal(G(i)) is at most 5. To start the proof, magically define the potential of G(i) to be<br /><center><b>Potential(G(i))= distance(Alg(G(i),Ideal(G(i))),</b></center> that is, the number of edges that are oriented differently in the two orientations. <br /><br />Now, here's my question.<br />Q: What happens to the potential when the algorithm reverses all the outgoing edges of a vertex v?<br />A: That's not hard to see. Let S denote that group of edges. Immediately before the reversal of S, in Alg the outdegree of v is |S|>=11, and in Ideal the outdegree is <=5, so S contributes at least |S|-5 to the distance; immediately after the reversal of S, S contributes at most 5 to the distance. So the reversal of S has caused the distance to decrease by at least (|S|-5-5)>=1. Overall the total number of edge reversals is within a constant factor of (sum of decreases in the potential function): <b>instead of analyzing the number of edge reversals, we can just analyze the number of decreases of the potential function</b>. <br /><br />Q (Toto): Big deal. It's not clear to me that you have made any progress: how do you analyze the number of decreases of the potential function?<br />A: Let's see. Since the potential function is always non-negative, each decrease must be compensated by some previous increase (or by the initial value). So, the total number of decreases is at most the initial value of the potential (which is O(n)), plus the increases that may happen when going from Ideal(G(i)) to Ideal(G(i+1)).<br /><br />Q (Toto): But you haven't even told me what that "certain orientation Ideal(G(i))" was! I am frustrated.<br />A: Patience! It is now time to define Ideal(G(i)). Let's do it in backwards order.<br />Q (Toto): Why? I am frustrated.<br />A: Because that's what works. Be quiet! Start from the final graph, give it an orientation such that the maximum outdegree is 5, and "uncontract" the edges one by one. To define Ideal(G(i)) from Ideal(G(i+1)), do the following. After the uncontraction of edge uv, in the orientation of G(i) that is inherited from G(i+1), vertex u and vertex v each have outdegree at most 5, and edge uv needs to be given an orientation: arbitrarily direct it from u to v; if u now has degree 6, find a shortest path p (in the directed graph) from u to a vertex with outdegree less than or equal to 4, and reverse the orientation of every edge on p. Now the outdegree of u is reduced to 5. That defines Ideal(G(i)).<br /><br />Q (Toto): I'm tired. Are we there yet?<br />A: To finish the proof, it only remains to observe that p has length O(log n). <br /><br />Q (Toto): Why does p have length O(log n)? Oh, whatever.<br />A: Easy. Do breadth first search from u. How can it be that all vertices in the first i levels have degree >=5? From Euler's formula for planar graphs, it is easy to see (it's an exercise) that it can only happen if the growth rate is exponential, and such a growth rate can only be maintained for O(log n) levels. <br /><br />Q (Toto): If you say so. Whatever.<br />A: Hang on! We're done, just about: So p has length O(log n), so Ideal(G(i)) differs from Ideal(G(i+1)) in O(log n) edges, so the potential can only increase by O(log n) at each contraction, so it can only increase by O(n log n) overall, so it can only decrease by O(initial value + n log n), so the total number of edges reversals is O(n + n log n)=O(n log n) overall, so the algorithm is efficient. I'm done. <br /><br />My last question:<br />Q: Isn't it elegant, simple, and clever? <br />A (Toto): I guess so. You certainly seem excited about it.<br /><br />Your final question is:<br />Q (Toto): How do you know all that stuff? <br />A: The basic technique is from Brodal and Fagerberg, "Dynamic Representations of Sparse Graphs." The application to contractions is from Kowalik, "Fast 3-coloring triangle-free planar graphs". It's also explained in an upcoming <a href=" http://flatworlds.org/book/">textbook</a> by Philip Klein.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com5tag:blogger.com,1999:blog-4068183698747623113.post-32322021498194569662011-11-29T10:42:00.009-05:002011-11-29T20:10:01.916-05:00STACS 2012 list of accepted papersThe <a href="http://stacs2012.lip6.fr/">STACS 2012</a> accepted papers are out. STACS is in Paris and starts on February 29.<br /><br />The most noteworthy article for my taste, at first glance, is the <i>13/9-approximation for Graphic TSP</i> by Marcin Mucha, available <a href="http://arxiv.org/abs/1108.1130">here</a>. (In some sense this is old news: the paper will be presented at STACS, but was published on Arxiv in August.) They improve the analysis of the Moemke-Svensson <a href="http://arxiv.org/pdf/1104.3090">paper</a> from last Spring (presented last month at FOCS; was published on Arxiv in April). <br /><br />Here's a quote: <i>We present an improved analysis of the cost of the circulation used in the construction of the TSP tour/path. [...] The circulation used [by Moemke and Svensson] consists of two parts: the ”core” part based on an extreme optimal solution to the Held-Karp relaxation of TSP, and the ”correction” part that adds enough flow to the core part to make it feasible. We improve bounds on costs of both part; in particular we show that the second part is in a sense free. As for the first part, similarly to the original proof of Moemke and Svensson, our proof exploits its knapsack-like structure. However, we use the 2-dimensional knapsack problem in our analysis, instead of the standard knapsack problem.</i>Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com6tag:blogger.com,1999:blog-4068183698747623113.post-88703897053098069332011-11-18T07:17:00.001-05:002011-11-18T07:17:00.185-05:00How to solve a system of linear equationsWhat could be more boring than solving a system of linear equations, Ax=b? We've all learnt in school how to do it: Gaussian elimination; and we've practiced it until we got sick of it. End of story!<br /><br />But there are people who are never happy with what they have. Gaussian elimination is too slow, they whine. Polynomial time is not good enough for them. They are afraid that it will fail on massive data, and want something that is fast "<i>for real</i>". Near-linear time is the goal these days.<br /><br />They currently focus on a special case: "SDD solvers", meaning that the matrix A is symmetric and diagonally dominant, meaning that the value on the diagonal is larger than the sum of absolute values of all off-diagonal values on its row. In fact, let us simplify things further and assume that A is the "Laplacian matrix" of a graph - the matrix where rows and columns are indexed by V, where the diagonal entry is the vertex degree, and where the off-diagonal entry is -1 if and only if there is an edge between the two vertices.<br /><br />So, for that special case, in their anxious search for <i>faster</i> algorithms, people have started playing a game that is the opposite of what we traditionally do in approximation algorithms: instead of giving highest priority to the accuracy of the output, and treating runtime as a second thought, they sacrifice some accuracy in the hope if getting a <i>faster</i> algorithm. Indeed, if we don't insist on finding the exact solution x* of Ax=b, then we can try some iterative method that constructs a sequence x1,x2,x3,., where each xi is an improvement over the previous one, and pray that it gets <i>quickly</i> to a value xT close to x*.<br /><br />But scientists don't trust the power of prayer. Instead, we analyze the <i>runtime</i>, which is basically the cost of each iteration times the number of iterations. Unfortunately, if the matrix A is dense (many non-zero entries), each iteration is costly; and if it's not well-rounded, one needs a large number of iterations to get close to x*.<br /><br />To <i>speedup</i> the algorithm, instead of solving the system Ax=b, let us solve the equivalent system<br /> <center> B^{-1}Ax=B^{-1}b</center><br />for a well-chosen matrix B. Namely, B should be sparse, so that each iteration is <i>cheap</i>, and B should be similar to A, so that few iterations suffice to converge. (What's the measure of similarity? It's k such that for any vector x, x^TBx is between x^TAx and k times that quantity.)<br /><br />Fortunately, there is an algorithm to find a matrix B that is very sparse- only a linear number of non-zero entries - and very well conditioned - k is constant. Unfortunately, that algorithm is <i>slow</i>, which defeats the whole purpose. So, how can one go about <i>quickly</i> finding a good matrix B?<br /><br />Here's an algorithm for that subproblem.<br /><br />1. First, find a low-stretch spanning tree T of G, the graph corresponding to A: it's a spanning tree such that the sum of all interpoint distances in T is not much more than the sum of all interpoint distances in G. That's been studied before in a sequence of papers, so it's just a matter of recycling old work.<br /><br />2. Second, for every edge uv of G, define P(uv) as the distance from u to v in T. Let t be the sum of P(e) over all edges e of G.<br /><br />3. Third, build a multigraph H by repeating t log(n) times: <br />pick an edge e with probability P(e)/t, and put t/P(e) copies of e in H.<br /><br />4. The desired matrix B is (1/t log(n)) times the Laplacian matrix of H.<br /><br /><br />And that is what I learned because of a seminar given at Brown by Richard Peng, who presented some more advanced stuff in the same area (some work he's done with Ioannis Koutis and Gary Miller).<br /><br />Why should we care about this line of research? Getting an improved polynomial-time algorithm for a special case of linear system solvers, is that really such a big deal? Why is there so much interest for those papers in the Theory community? Pick your answer:<br /><br />( a ) . we should care, because Dan Spielman (or insert the name of your favorite researcher in that area) cares, and he has good taste. [Trust of authorities]<br />( b ) . we should care, because the algorithmic perspective on those problems is pretty new. It's the development of a new technique. [Algorithmic design]<br />( c ) . we should care, because this has been implemented and tested and it is claimed to be competitive with existing solvers, so it leads, maybe, to actual real life programs that run faster. This could be a big success story of Theory! [Applications to Applications]<br />( d ) . we should care, because it has implications for Theory: it leads to faster algorithms for graph problems such as generating a random spanning tree or maximum flow. [Applications to Theory]<br />( e ) . we should care, because it's sunny out, I am in a good mood, and today I am inclined to see things as exciting in general. [Random]<br /><br />( a' ) . we should ignore this, because it has too much heavy math, it's not fun, and it's not my taste. [Trust of own taste]<br />( b' ) . we should ignore this, because the core of it is just low-stretch algorithms, sampling, etc: it's just putting together basic algorithmic tools that have nothing new about them, and the claim of novelty are pure marketing [Disgruntled]<br />( c' ) . we should ignore this, because the claims of applications are merely what people always say in their introductions, and those claims never mean anything. [Skepticism]<br />( d' ) . we should ignore this, because it's not going to help understand the P versus NP problem, even in the long term. [Lack of applications to Theory]<br />( e' ) . we should ignore this, because I had too much to eat and now I have a stomachache. [Random]Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com5tag:blogger.com,1999:blog-4068183698747623113.post-3679511463059429222011-11-09T08:47:00.002-05:002011-11-09T12:22:51.893-05:00Spanning trees with few hopsMinimum spanning tree problems often become NP-hard when one introduces side constraints. One popular such constraint is to limit the depth of the tree - the maximum number of "hops" when traversing a path from the root to a leaf in the tree. That's the k-hops minimum spanning tree problem, supposedly motivated by network applications, because delays along a path are roughly proportional to the number of hops. <br /><br />In this problem, unlike the minimum spanning tree problem, the input edge weights do not necessarily define a metric. Indeed, in the minimum spanning tree problem, we usually start by saying: "Without loss of generality we can assume that the triangular inequality holds; otherwise, replace costly edges by shortest paths." But in the k-hop minimum spanning tree problem, such replacements may be invalid because they increase the number of hops. <br /><br />When the input weights do not define a metric, known algorithms are quite poor. If there exists a tree with k hops and weight w, then one can produce a tree with O(k log n) hops and weight O(w log n), but doing better is impossible. <br /><br />However, the situation is much brighter with random inputs drawn from a nice distribution. In a 2008 arxiv paper by Angel, Flaxman and Wilson, one assumes that the input is a complete graph and that the edges are drawn independently from the same distribution, for example the uniform U[0,1] distribution, or an exponential, or some other reasonable distribution (i.e. with similar density function near zero). Then there is a simple algorithm to produce a near-optimal tree. <br /><br />Here is the algorithm.<br /><br />First, assume that the bound k on the number of hops is less than log log n. Then the algorithm guesses x1, x2,..., xk, where xi is the number of nodes that are i hops away from the root. Then it performs a greedy Prim-like algorithm, going level by level, where, to build level i, you pick the xi nodes that can most cheaply be connected to the nodes in the part of the tree already built. (The "guess" relies on probabilistic analysis to compute the optimal sequence x(i)). <br /><br />Second, assume that the bound k on the number of hops is more than log log n. Then the algorithm builds an (unconstrained) minimum spanning tree, then greedily breaks it up into small-depth clusters, then reconnects the clusters using a straightforward extension of the first algorithm, (where the goal is now not to connect the entire graph but to create a small depth subtree intersecting all the clusters). <br /><br />Although the independent weight assumption is quite restrictive, the algorithm itself is natural. In fact, the unexpected bonus is that when k>log log n, not only is the output tree near-optimal, but its cost is almost the same as the cost of the unconstrained MST: in other words, one get tree of depth O(log log n) at no extra cost. So one is left to wonder: how good is this algorithm in practice?Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com2tag:blogger.com,1999:blog-4068183698747623113.post-22792901172597579022011-11-08T07:58:00.006-05:002011-11-08T10:08:25.390-05:00Models for discrete metric spacesIt is good to have random models to test algorithms. <br /><br />For graph algorithms, the G(n,p) model is like an old friend. For network algorithms, the analog is the complete graph with i.i.d. exponentially distributed edge weights. For algorithms on discrete metric spaces, what's a good model for a random discrete metric space?<br /><br />Is there a model such that the marginal distribution of the distance between any pair of vertices has an exponential distribution? If so, then what would be a maximum entropy distribution that would have those marginals?Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com8tag:blogger.com,1999:blog-4068183698747623113.post-34721509848826585092011-11-07T19:15:00.004-05:002011-11-07T20:43:58.977-05:00Hierarchically well-separated trees, bottom-upRecently I taught the FRT algorithm to approximate any metric by a hierarchically well separated tree metric (HST) with edge lengths that are powers of 2. The construction is top-down. Initially all vertices are together in one big cluster, and later the clusters are progressively subdivided using smaller and smaller balls. A question came up: is it possible to construct the tree metric bottom-up?<br /><br />Here is an easy way to simulate FRT going roughly bottom-up, described in the papers "Network Warehouses: Efficient Information Distribution to Mobile Users" (Motskin, Downes, Kusy, Gnawali, Guibas) and "Distributed Resource Management and Matching in Sensor Networks" (Gao, Guibas, Milosavljevic, Zhou). <br /><br />Pick a random priority ordering over the vertices, and a random scalar r in [1,2]. For each vertex v in parallel and for each distance d which is a power of 2 times r, find the highest priority vertex u such that B(u,d) contains v. As the power of 2 increases, this defines a sequence of balls containing v, of larger and larger radius, and with centers that have higher and higher priority.<br /><br />To compute the distance between v and w in the tree metric, find the maximum d such that, in their respective sequences, v and w are in the same ball of radius d but in different balls of radius d/2. That determines the place in the tree where v and w have their lowest common ancestor x, corresponding to the ball of radius d that contains both v and w. Then the x-to-v and the x-to-w paths in the tree both have edges of lengths d/2, d/4, d/8, ..., so the distance from v to w in the tree metric is 2d. <br /><br />Here is a question: is it possible to construct a hierarchical clustering that has the properties of HSTs, but that is truly constructed in a bottom-up manner? More precisely, is it possible to commit to a partition of the points into a hierarchical clustering in clusters of radius < d, (that is, commit to the bottom levels of the HST), without looking at the distances that are greater than, say, 16d? <br /><br />Here is a possible answer: put v and w in the same cluster of radius d iff their sequences of balls have the same balls for radius d, 2d, 4d, 8d and 16d. From that point on, in later, coarser clustering, v and w are committed to staying together. What happens if the FRT algorithm wanted to separate them? Perhaps there is a high priority vertex x such that B(x,32d) contains v but not w. Well - tough. If v is the highest priority vertex in the cluster, then w yields to v and puts B(x,32d) in its sequence, even though technically it wasn't supposed to do it. <br /><br />How does that affect the HST properties? It still creates a hierarchically well-separated tree, and the distance in the tree metric is still not much less than the distance in the input metric. But is it still upper bounded by O(log n) times the distance in the input metric, in expectation?Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com0tag:blogger.com,1999:blog-4068183698747623113.post-50316857036645308752011-10-19T08:11:00.003-04:002011-10-20T17:12:24.798-04:00Testing SortednessSofya Raskhodnikova, in a talk, presented a problem that could be taught at the end of the semester of my undergraduate Algorithms course. It can be solved by elementary means using randomization. (It can also be tied to the problem of finding the longest increasing subsequence of an input sequence, a classic illustration of algorithmic techniques.) This builds on two papers, "Improved testing algorithms for monotonicity" by Dodis-Goldreich-Lehman-Raskhodnikov-Ron-Samorodnitsky, and "Transitive-Closure Spanners" by Bhattacharyya-Grigorescu-Jung-Raskhodnikova-Woodruff.<br /><br />Given an array A of n items (numbers), you wish to check, quickly, whether the items are sorted or nearly sorted. By "check", I mean, with reasonable confidence (say, you're correct with probability 99%). By "quickly", I mean in logarithmic time. By "nearly sorted", I mean that only a few of the numbers are out of place: if we ignore 4% of the numbers (outliers), the rest is sorted. <br /><br />Why would you wish to check that? Maybe, if you have some data that used to be sorted but that, after some lazy updates, is now slightly out of order, to test whether you should bother sorting your data again; or, if you're about to resort, to decide which sorting algorithm to use. <br /><br />Here is the algorithm. <br /><br />- Define a set S of n log n "edges" (pairs of items). Assuming A is in sorted order, those are the pairs that would have been compared if you had sorted using Quicksort and if your pivot was always the median of the subset of numbers being sorted in the current recursive call. Thus there are edges between A[n/2] and everybody else, edges between A[n/4] and everyone in the range [1...n/2], edges between A[3n/4] and everyone in the range (n/2..n], etc. <br />- Repeat log n times: pick an edge {i,j} from S uniformly at random, and check whether A[i] and A[j] are in the correct order relative to each other.<br />- Output "The array is nearly sorted" if and only if all the checks are successful.<br /><br />Here is the analysis.<br /><br />- If A is sorted then obviously all checks are successful and the output is correct<br />- If A is not sorted but is nearly sorted, with fewer than 4% of the items out of order, then the output could go either way. <br />- If A is far from being sorted, the central claim is that at least 0.01 n edges of S would fail the check. Then one of those edges would (probably) be found after a logarithmic number of tries.<br /><br />Why does the claim hold? <br /><br />Create a graph with vertex set A and with an edge between every pair {i,j} such that {A[i],A[j]} are in the wrong order. Note that the outliers are a minimum cardinality vertex cover of that graph. It is known (basic approximation algorithms fact) that the maximum matching M has size at least 1/2 times the minimum vertex cover, |M| >= |S|/2. <br /><br />Mark the two endpoints of every edge of S that would fail the check. The number of edges of S that would fail the check is at least half of the number of marked vertices. For each {i,j} in the matching M defined above, by definition of Quicksort there is a k between i and j such that {i,k} and {k,j} are both in S. The central observation is that, since {A[i],A[j]} are in the wrong order, it must be that {A[i],A[k]} or {A[k],A[j]} are in the wrong order, so either i or j or both are marked. So the number of marked vertices is greater than or equal to |M|. <br /><br />Putting those observations together, the number of edges of S that would fail the check is at least half of the number of marked vertices, so it is at least half the size of M, so it is at least a quarter of the number of outliers. So, if there are at least .04 n outliers, then there are at least .01 n edges of S that would fail their check. This proves the claim.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com10tag:blogger.com,1999:blog-4068183698747623113.post-2246363547592222592011-10-09T07:38:00.001-04:002011-10-09T17:12:08.480-04:00Algorithms: a success story (Part 3/2)To recap:<br /><br />1- an applied problem was modeled as a standard combinatorial optimization graph problem (prize-collecting Steiner tree).<br />2- an algorithm was designed for it, and the authors write that this was critical to the project's success. It is based on a general-purpose algorithmic design paradigm that is just now starting to receive a little bit of attention in TCS. <br />3 - the algorithm was implemented.<br />4- the program was used for the input arising from the applied problem. That input was neither random nor grid-like nor anything so well structured (in which cases some ad hoc heuristic might have been better suited than general-purpose algorithmic techniques), but seemed like a pretty arbitrary graph. <br /><br />Is this about algorithms? Yes. <br />Did this work have impact? Yes. <br />Is this good work? I believe so. <br />Would it have found its place at SODA? No. <br /><br />I never see any paper of that type at SODA. It would not fit in, because it does not have a theorem. When I go to Operations Research conferences, I see some presentations of the type: "Here is a problem that came up in my company (for example, a scheduling problem). Here is how I modeled it, and the heuristic I used to solve it. Here are my experimental results after implementation. We beat the previous results by x%." But I never see that kind of presentations at SODA. Isn't that regrettable? Even if such presentations are sometimes not so exciting theoretically, wouldn't they serve an important role in keeping us grounded, connected to the real world, and in helping guide our taste? <br /><br />The ALENEX workshop might serve that purpose, but in reality I don't think it does. As far as I can tell, it's mostly about implementing algorithms - taking the proof-of-concept that a high-level algorithm really is, and translating it into a real program, dealing along the way with a host of issues that come up. A necessary step for sure, to prevent sophisticated algorithms from becoming disconnected from programs, but it's only about linking steps 2 and 3 in the above list, not about the context given by steps 1 and 4 in the above list.<br /><br />The thing that I find really challenging is step 4. Step 1 is easier. We do, once in a while, read papers from other fields and try to make up a theoretical model to capture what seems to us to be the interesting features. But then we go off and merrily design and analyze algorithms for that model, without ever going to step 4 (actually, we usually don't even get to step 3). All it gives us is a good motivational story for the introduction, and a way to feel good, reassuring ourselves that what we do potentially matters and is hypothetically relevant to something. But that hypothesis is never put to the test. <br /><br />If that was just my research <i>modus operandi</i> for me personally, it would not be a big deal: I concentrate on the section of the work where my skills lie, I do my part there and others will do their part as well, and altogether the field can advance smoothly. But when it is the entire field that seems to share my lack of concern for demonstrated ultimate practical impact, then I think that we might have a problem.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com17tag:blogger.com,1999:blog-4068183698747623113.post-6436426569197675702011-10-07T08:23:00.004-04:002011-10-09T17:12:21.234-04:00Algorithms: a success story (Part 2/2)The paper is called: "Finding undetected protein associations in cell signaling by belief propagation", the method is belief propagation, and the authors are Bailly-Bechet, Borgs, Braunstein, Chayes, Dagkessamanskaia, Francois and Zecchina. On benchmarks, their algorithm, compared to some CPLEX-based algorithm, finds Steiner tree solutions that are marginally better (by about 1%), but more importantly, the runtime is also better, by almost two orders of magnitude. So, how do they "solve" prize-collecting Steiner tree with belief propagation? Here is the idea, as far as I understand it.<br /><br /><b>Step one:</b> model the problem locally. Consider a Steiner tree. Root it at some vertex r and label each vertex i of the tree with its distance d(i) to r on the tree and its parent p(i) in the tree; label each vertex i not in the tree with some convention, for example, d(i) arbitrary and p(i)=0. A configuration consists of a value (d(i),p(i)) for each vertex of the graph, such that d(r)=0 and such that neighboring vertices have consistent values (in particular d(p(i))=d(i)-1 for i in the tree and not equal to r). We now have a banal constraint programming problem.<br /><br /><b>Step two:</b> borrow intuition from statistical physics. Take a parameter b and define a distribution on Steiner trees, such that each tree T has probability proportional to exp(-b cost(T)). When b goes to infinity, the distribution is concentrated on the optimal Steiner tree. The crucial step: we now write the mysterious-looking, mysterious-sounding "cavity equations". (Those have nothing to do with the Brown CAVE project.)<br /><br />Consider two adjacent vertices j and i. Let P(d(j),p(j), i) be defined as the probability that j is in the state (d(j),p(j)) in a certain graph: in the case where p(j) is different from i, it is the graph where every edge adjacent to i is removed; in the case where p(j)=i, it is the graph where every edge adjacent to i is removed, except {i,j}. Then the tree has to pay the cost of the edge from j to p(j), and, given the state of j, the states of the other neighbors k of j are essentially independent from one another, or at least, we hallucinate that that is the case, as if vertex {j} was a cut:<br /><br />P(d(j),p(j),i)= exp(-b cost({j,p(j)}))*prod[ Q(k,d(j),p(j))],<br /><br />where the product is over every neighbor k of j other than i, and Q is the probability that, in a certain graph, k is in a state consistent with (d(j),p(j)): <br /><br />Q(k,d(j),p(j))=sum P(d(k),p(k),i),<br /><br />where the sum is over (d(k),p(k)) compatible with (d(j),p(j)).<br /><br />When the graph is a tree, if we write those equations going bottom up in the tree, we see that they are equivalent to a simple dynamic program to compute the distribution, so these equations exactly capture the distribution. When the graph has only one cycle, this approach has also been shown to be correct. When the graph is sparse with high girth and the influence of values on the objective decays quickly, one intuitively expects this to work reasonably well, but theoretically it's a big mess, both in terms of convergence and in terms of guarantees on the limit solution.<br /><br /><b>Step three:</b> To solve the cavity equations, use an iterative method from some perhaps random starting point, until a fixed point is reached. <br />P(d(j),p(j),i)[t+1] = exp(-b cost({j,p(j)}))*prod[ Q(k,d(j),p(j))[t]]<br />Q(k,d(j),p(j))[t+1] = sum P(d(k),p(k),i)[t],<br />That fixed point can then be reinterpreted as a particular Steiner tree. <br /><br /><b>Step four:</b> to make the method effective, use various tricks, clever ideas and hacks to speed it up and make it more convergent: take logs, simplify the set of variables, get rid of the "guess" of r, add some small random perturbation to eliminate possible cycles.<br /><br />That's it!Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com4tag:blogger.com,1999:blog-4068183698747623113.post-760026962228634772011-10-05T08:31:00.004-04:002011-10-09T17:12:33.484-04:00Algorithms: a success story (Part 1 /2)Protein interactions follow signaling pathways whose cascading effects cause some diseases. Past experiments have harvested much information, not always very reliable, on which pairs of proteins interact with each other. From that information, how can one infer entire subgraphs that, together, determine the function?<br /><br />Define a graph where the vertices are proteins and the edges are interactions between proteins.<br />Give a cost to each edge, depending on how confident you are that the experimental data shows interaction -- cost zero if you know with certainty that there is interaction. Give a weight to each vertex, depending on the differential expression of that protein -- high weight if the protein is significant, weight zero if you are not interested in that protein.<br /><br />Then the problem can be modeled as a prize-collecting Steiner tree problem. The objective is to find a min cost tree connecting the vertices of positive weight, where vertices can be omitted by paying a penalty equal to their weight:<br /> Min (cost (tree) + weight(vertices not in tree)).<br />Since there is no natural absolute scale common to edge costs and to vertex weights, it makes sense to use a scaling parameter lambda and find:<br /> Min (cost (tree) + lambda*weight(vertices not in tree)).<br />Then the human user can look at the outputs produced for a range of values of lambda (as lambda grows, fewer and fewer vertices are connected and the tree becomes smaller and smaller) and eyeball the most interesting one. <br /><br />In a paper I looked at, the authors do just that. They ran a prize-collecting Steiner tree algorithm on a yeast protein network. They were able to find pathways that had already previously been found by other means. The Steiner tree, predictably, contains vertices of high weight ("proteins differentially expressed"), but also Steiner vertices, possibly of low weight but serving to bridge subparts of the tree. One of the Steiner vertices they found, COS8, was "a gene of unknown function" that people had not previously paid much attention to. <br /><br />They then proceeded to run biological experiments in the lab to test COS8, and, lo and behold! They found that keeping or deleting COS8 from the strains had a critical effect.<br /><br />What gets much of the credit for this success story? Algorithmic design. The reason why the authors were able to run their program on that data in such an effective manner was that their algorithm was faster and better than previous methods. In fact, this is an ideal setting for algorithmics: one does not need an exact answer, since a good enough approximation algorithm suffices; and the problem is a well-studied type of variant of a classic combinatorial optimization problem.<br /><br />The questions that the interested reader will ask, I am sure, is: what was the algorithmic technique used to get this breakthrough? And who are the researchers, in the approximation algorithms community, who did this masterful piece of algorithmic design?<br /><br />The answer will wait for another post.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com2tag:blogger.com,1999:blog-4068183698747623113.post-1058977422680173292011-09-17T14:32:00.013-04:002011-09-21T18:06:57.218-04:00SODA 2012 papers on ArxivHere are the Soda papers that have an Arxiv version - not quite one third of them. I probably missed a few (searched for the exact title, and got a few syntax errors in my search); will complete the list if people send me corrections. This is part of my desire to make conferences less important in our field. One of their purposes used to be fast dissemination of new research, but now that purpose may be served better, faster, and more cheaply by Arxiv.<br /><br /><br /><br /><a href="http://arxiv.org/abs/1012.1886">Sublinear Time, Measurement-Optimal, Sparse Recovery For All </a><br /><br />Ely Porat and Martin Strauss <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.4466">Counting Perfect Matchings as Fast as Ryser </a><br /><br />Andreas Björklund <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.1628">A Proof of the Boyd-Carr Conjecture </a><br /><br />Frans Schalekamp, David Williamson and Anke Van Zuylen <br /><br /><br /><br /><a href="http://arxiv.org/abs/1109.0604"><br />Approximate Counting via Correlation Decay in Spin Systems </a><br /><br />Liang Li, Pinyan Lu and Yitong Yin <br /><br /><br /><br /><a href="http://arxiv.org/abs/1106.0423">Physarum Can Compute Shortest Paths </a><br /><br />Vincenzo Bonifaci, Kurt Mehlhorn and Girish Varma <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.4537">Metastability of Logit Dynamics for Coordination Games </a><br /><br />Vincenzo Auletta, Diodato Ferraioli, Francesco Pasquale and Giuseppe Persiano <br /><br /><br /><br /><a href="http://arxiv.org/abs/1010.4280"><br />The Notion of a Rational Convex Program, and an Algorithm for the Arrow-Debreu Nash <br />Bargaining Game </a><br /><br />Vijay Vazirani <br /><br /><br /><br /><a href="http://arxiv.org/abs/1106.0518"><br />Submodular Functions are Noise Stable </a><br /><br />Homin Lee, Pravesh Kothari, Adam Klivans and Mahdi Cheraghchi <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.2422">A Linear Time Algorithm for Seeds Computation </a><br /><br />Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter and Tomasz Walen <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.2341">The condensation transition in random hypergraph 2-coloring </a><br /><br />Amin Coja-Oghlan and Lenka Zdeborova <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.1720">Jaywalking your Dog - Computing the Fréchet Distance with Shortcuts </a><br /><br />Anne Driemel and Sariel Har-Peled <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.2230">Subexponential Parameterized Algorithm for Minimum Fill-in </a><br /><br />Fedor V. Fomin and Yngve Villanger <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.5377">The Set of Solutions of Random XORSAT Formulae </a><br /><br />Yashodhan Kanoria, Andrea Montanari, Morteza Ibrahimi and Matt Kraning <br /><br /><br /><br /><a href="http://arxiv.org/abs/1012.1577"><br />Sparser Johnson-Lindenstrauss Transforms </a><br /><br />Daniel Kane and Jelani Nelson <br /><br /><br /><br /><a href="http://arxiv.org/abs/1105.3368">Random Walks, Electric Networks and The Transience Class problem of Sandpiles </a><br /><br />Ayush Choure and Sundar Vishwanathan <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.3068">Compression via Matroids: A Randomized Polynomial Kernel for Odd Cycle Transversal </a><br /><br />Stefan Kratsch and Magnus Wahlström <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.2275">Approximate Tree Decompositions of Planar Graphs in Linear Time </a><br /><br />Frank Kammer and Torsten Tholey <br /><br /><br /><br /><a href="http://arxiv.org/abs/1105.6257">Computing all maps into a sphere </a><br /><br />Martin Čadek, Marek Krčál, Jiří Matoušek, Francis Sergeraert, Lukáš Vokřínek and Uli Wagner <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.5007">Tight bounds on the maximum size of a set of permutations with bounded VC-dimension </a><br /><br />Josef Cibulka and Jan Kyncl <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.2231">Partial match queries in random quadtrees </a><br /><br />Nicolas Broutin, Ralph Neininger and Henning Sulzbach <br /><br /><br /><br /><a href="http://arxiv.org/abs/1108.4744">Mechanism Designs via Consensus Estimate and Cross-Check </a><br /><br />Bach Ha and Jason Hartline <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.5102">Packing anchored rectangles </a><br /><br />Adrian Dumitrescu and Csaba Toth <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.3808">Directed Nowhere Dense Classes of Graphs </a><br /><br />Stephan Kreutzer and Siamak Tazari <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.4597">The Entropy Rounding Method in Approximation Algorithms </a><br /><br />Thomas Rothvoss <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.3045">Confluent Persistence Revisited </a><br /><br />Sebastien Collette, John Iacono and Stefan Langerman <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.2799">Using Hashing to Solve the Dictionary Problem \\(In External Memory) </a><br /><br />John Iacono and Mihai Patrascu <br /><br /><br /><br /><a href="http://arxiv.org/abs/1108.0809">Towards Robust and Efficient Computation in Dynamic Peer-to-Peer Networks </a><br /><br />John Augustine, Gopal Pandurangan, Peter Robinson and Eli Upfal <br /><br /><br /><br /><a href="http://arxiv.org/abs/1008.1555">The complexity of conservative valued CSPs </a><br /><br />Vladimir Kolmogorov and Stanislav Zivny <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.1958">Linear Index Coding via Semidefinite Programming </a><br /><br />Eden Chlamtac and Ishay Haviv <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.0871">A simple algorithm for random colouring G(n, d/n) using (2+\epsilon)d colours </a><br /><br />Charilaos Efthymiou <br /><br /><br /><br /><a href="http://arxiv.org/abs/1104.1732">Optimal Column-Based Low-Rank Matrix Reconstruction </a><br /><br />Venkatesan Guruswami and Ali Sinop <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.2444">Private Data Release Via Learning Thresholds </a><br /><br />Moritz Hardt, Guy Rothblum and Rocco Servedio <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.2312">Computing distance between piecewise linear bivariate functions </a><br /><br />Boris Aronov and Guillaume Moroz <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.5329">Matroidal Degree-Bounded Minimum Spanning Trees </a><br /><br />Rico Zenklusen <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.5478">Deterministic Construction of l-type Ellipsoids and its Application to Derandomizing Lattice </a><br /><br />Algorithms <br /><br />Daniel Dadush and Santosh Vempala <br /><br /><br /><br /><a href="http://arxiv.org/abs/1105.4125">Privacy-Preserving Group Data Access via Stateless Oblivious RAM Simulation </a><br /><br />Michael Goodrich, Michael Mitzenmacher, Olga Ohrimenko and Roberto Tamassia <br /><br /><br /><br /><a href=""http://arxiv.org/abs/1108.2290">Dimension reduction for finite trees in l_1 </a><br /><br />James Lee, Arnaud De Mesmay and Mohammad Moharrami <br /><br /><br /><br /><a href="http://arxiv.org/abs/1108.4647">Expanders are Universal for the Class of all Spanning Trees </a><br /><br />Daniel Johannsen, Michael Krivelevich and Wojciech Samotij <br /><br /><br /><br /><a href="http://arxiv.org/abs/1101.3110">Stochastic coalescence in logarithmic time </a><br /><br />Po-Shen Loh and Eyal Lubetzky <br /><br /><br /><br /><a href="http://arxiv.org/abs/1107.2221">Bidimensionality and Geometric Graphs </a><br /><br />Fedor V. Fomin, Daniel Lokshtanov and Saket Saurabh.<br /><br /><br /><a href="http://arxiv.org/abs/1105.1325">Testing Odd-Cycle-Freeness in Boolean Functions </a><br /><br />Arnab Bhattacharyya, Elena Grigorescu, Prasad Raghavendra, and Asaf Shapira.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com1tag:blogger.com,1999:blog-4068183698747623113.post-85685318022157531622011-09-14T07:23:00.004-04:002011-09-14T12:14:36.352-04:00Dan Spielman on research<i>Claire:</i> Why are your papers so long? They're sophisticated and ground<br />breaking, but I don't have the energy or time to read such long,<br />complex papers.<br /><i>Dan:</i> I don't want them to be that way. It's accidental that they are so<br />long. I just don't know when to give up. I keep going further and further<br />into the problem until some solution comes, and sometimes it takes a<br />hundred pages and years of work.<br /><br /><i>Claire:</i> But if many people are like me, then, having papers that are much<br />longer than the norm, do you really think that that's a good marketing<br />strategy?<br /><i>Dan:</i> Probably not. But enough people read them to satisfy me. Anyway, the<br />way to go if you want to learn about the work I have done is not to read my<br />papers. I find that the lecture notes format is much more<br />satisfying. When I write a paper, I feel that somehow I have to state and<br />prove the strongest result I have. When I teach, I give a weaker version<br />that has a much simpler, shorter, more understandable proof, and that's<br />what I put in my lecture notes, so that's what you should read. Also, later<br />versions are usually quite a bit simpler.<br /><br /><i>Claire:</i> How do you choose research problems?<br /><i>Dan:</i> I find that I am attracted to problems that are hard. It's difficult<br />for me to get motivated unless I have a sense that the problem is<br />important; that's what keeps me going.<br /><br /><i>Claire:</i> But among all hard and important open problems, how do you know<br />which one is a "good" problem?<br /><i>Dan:</i> It has to be a problem that gives me a lot of ideas, where I see a lot<br />of possible connections with other areas, other subjects, where there are<br />many angles that I want to explore, many papers to read and new things to<br />learn. Then, along the way, I find that I am proving results on the side<br />that are of interest in themselves. That's how I get a strong sense that<br />it's a good problem to be working on.<br /><br /><i>Claire:</i> That seems to require a lot of experience. Is that how you picked<br />problems when you were a graduate student?<br /><i>Dan:</i> No, you're right, of course that's not how I worked when I was a<br />student. Then, I did the usual thing, you know, look around for hot topics<br />on which there had recently been partial progress and new results had come<br />out that were begging for improvements, and then I was able to find those<br />improvements before professors just because I had a lot more time to devote<br />to research than those professors.<br /><br /><i>Claire:</i> Do you think that you are a better researcher now than then?<br /><i>Dan:</i> yes, I think I am better at research. Although, I must admit that I<br />have a lot less time for research, and, in particular, I no longer have<br />time to keep up and read an insane number of papers, as I did when I was a<br />student.<br /><br /><i>Claire:</i> At this point you're about mid-career. What do you want to do for<br />the next half of your career?<br /><i>Dan:</i> That's a scary thought. I'm not really thinking about it. Right now I<br />have projects that will keep me quite busy for the next few years, and by<br />then, presumably I will have learned new things that will drive my<br />research for the following few years, and so on. I do predict that I will<br />be doing more and more programming.<br /><br /><i>Claire:</i> Do you see yourself as a mathematician or as a computer scientist?<br /><i>Dan:</i> I must say that there are many things mathematicians do that I do not<br />get, in the sense that I have no clue <i>why</i> they work on those<br />particular questions. I can read a Math paper and like its content very<br />much, yet not have any idea why the author worked on that question. Maybe<br />because someone else asked it before? It's a mystery to me. As for myself,<br />I come to problems very much from the perspective of a computer science<br />motivation.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com4tag:blogger.com,1999:blog-4068183698747623113.post-14679092677410308982011-08-10T11:40:00.003-04:002011-08-10T11:40:00.973-04:00Another good open problem?A criticism often heard coming from scientists in other disciplines is that we put too much emphasis on worst-case analysis. But where else would we go? I have an interest in reconstruction problems, when the input is a noisy version of some hidden ground truth.
<br />
<br />Recently I was looking at a paper by Magen and Moharrami about independent set in near-planar graphs, that is, planar graphs with o(n) extraneous edges.
<br />
<br />Now, how about reconstructing a planar graph?
<br />
<br />The problem would take as input a graph and remove the fewest edges to make it planar (minimum non-planar deletion). Think about Kleinberg's original small world networks: they consisted of a planar graph plus a moderate number of random "long" edges. So, such graphs have been considered. The problem is APX-hard. The complementary problem has a constant factor approximation (Calinescu, Fernandes, Finkler and Karloff). Here's the question. Consider a graph built as follows: first, pick an arbitrary planar graph -- the ground truth. Then, take a random graph G(n,p) where p=o(1)/n -- the noise. The superposition of those two graphs is the input. Given the input, is it possible to reconstruct the ground truth or at least a close approximation to it?
<br />
<br />Why this may be a good open question:
<br />- The broad area of reconstruction is exciting
<br />- It is not inconceivable that a good algorithm might interest people outside Theory.
<br />- The problem has not been studied that much, so there are not yet walls of impossibly hard questions in every direction.
<br />- The probabilistic question feels doable. It has never been considered before.
<br />- Little background is needed
<br />- It is a chance to learn about: elementary random graph techniques and perhaps basic FPT techniquesClaire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com4tag:blogger.com,1999:blog-4068183698747623113.post-31423391332406807112011-08-08T11:17:00.001-04:002011-08-08T11:17:01.112-04:00L'union fait la forceThat's the moral of a fable by La Fontaine, inspired by Aesop: <br /><i>An old man on the point of death summoned his sons around him to give them some parting advice. He ordered his servants to bring in a faggot of sticks, and said to his eldest son: "Break it." The son strained and strained, but with all his efforts was unable to break the Bundle. The other sons also tried, but none of them was successful. "Untie the faggots," said the father, "and each of you take a stick." When they had done so, he called out to them: "Now, break," and each stick was easily broken. "You see my meaning," said their father. Union gives strength.</i><br /><br />Oded Regev recently gave me a mathematical illustration of that principle .<br /><br />Given n independent random variables x(1), x(2), ..., x(n), the mutual information between the collection of all x(i)'s taken together and some other variable m is greater than the sum of the mutual informations between each variable x(i) and m: <br /><br />I(x(1)x(2)...x(n):m) ≥ Σ I(x(i):m)<br /><br />For example, imagine x(1) and x(2) are independent random bits, and m=x(1) XOR x(2). Then x(1) alone gives no information about m, so I(x(1):m)=0. Similarly, x(2) alone gives no information about m, so I(x(2):m)=0. However, x(1) and x(2) together determine m, so they give full information about it: I(x(1)x(2):m)=1. It is not hard to check that 1 is indeed greater than 0+0.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com0tag:blogger.com,1999:blog-4068183698747623113.post-35588351789852931932011-08-06T11:32:00.001-04:002011-08-06T11:32:00.469-04:00A good open problem?What makes a open problem a <i>good</i> problem to work on? What are the criteria? Let me take an example and try to argue for it. I think that the 2010 paper by Chlamtac, Krauthgamer and Raghavendra is a good research direction. That paper is only a first step, with much room left for improvements. Here are some questions raised by that paper.<br /><br />- <b>(Improve)</b> Is there hope that by vastly increasing the number of levels of lifting in the LP, the approximation ratio is reduced to a constant independent of the treewidth, so that the treewidth appears in the runtime only?<br />- <b>(Simplify)</b> Now that there is one proof, extracting its key ideas, is it possible to design a direct, possibly more efficient argument that would not rely on lift-and-project? <br />- Is there a version that would work in the framework of branch decomposition instead of treewidth? <br />- <b>(Generalize)</b>The authors present a table of related results and techniques. The diversity of algorithmic techniques is surprising. Is there a way to unify them and provide a single general-purpose algorithm based on Sherali-Adams lift-and-project, that would simultaneously solve many of cases listed in the table? As a warmup question, how does the algorithm perform in the (previously solved) case of uniform demands?<br />- <b>(Same inputs, other problems)</b> Can a similar approach be tried for other non-local non-trivial problems on graphs of bounded treewidth, for example Steiner forest (recently Ptas'ed in bounded treewidth graphs by Bateni Hajiaghayi and Marx) or bandwidth problems?<br />- <b>(Same problem, other inputs)</b> Many planar graph algorithms work by a reduction to graphs of bounded tree width. Can the result be extended to planar graphs?<br /><br />Here is why I think it's a good research topic. First, network cuts and connectivity problems are the bread and butter of approximation algorithms. That's where much of the action is, where a lot of the significant progress happens, and seems to be an inspiration for exciting research. Second, related to the first, it's an area where many people are interested in whatever new result comes up, and where it's relatively easier than other subfields to get papers accepted. Third, this particular problem has not been looked at by that many people, so it is possible that improving on the Chlamtac et al result is within reach. Fourth, there's a range of questions that come to mind when looking at that paper, some hard, some easier, so it ought to be possible to work on this in a mixed breadth-first-search/depth-first-search manner so as to avoid being fruitlessly stuck with a smooth wall blocking all progress and all ideas: go in depth in the most promising direction, until one gets stuck, then branch off to another less hopeless direction, etc. Fifth, for people doing algorithms, it requires only a modest investment: read in depth a mere couple of background papers in addition to the Chlamtac et al paper. Sixth, if worst comes to worst and you do not get any new result out of your endeavour, at least you will have learnt about lift-and-project and about treewidth. Learning new tools and techniques is a net plus even if no research result comes out of that failed project. (http://terrytao.wordpress.com/career-advice/learn-the-power-of-other-mathematicians-tools/).Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com0tag:blogger.com,1999:blog-4068183698747623113.post-22365043241230824952011-08-03T16:17:00.004-04:002011-08-03T16:31:16.818-04:00Algorithms using lift-and-projectI was happy that my post on lift-and-project seemed to arouse interest. Since then, I browsed through a few papers to see how teaching lift-and-project might continue from there. I now have a plan:<br /><br /><b>Lecture 1</b>: Introduction to lift-and-project, see previous post<br /><br /><b>Lecture 2</b>: (Bienstock and Ozbay 2004) Consider independent set. If the underlying input graph has tree width bounded by a constant, then the Sherali-Adams lifted linear program is <i>exact</i> after only a constant number of lifts, meaning that its integrality gap equals 1 and that every solution of the lifted linear program is a convex combination of integer solutions. <br /><br />Remember that the lifted variables can be interpreted as defining a distribution of integer solutions over the local set of vertices under consideration.<br /><br />Then rounding a solution is easy: first <i>pick</i> the values v1,v2,...,vc (zero or one) of c variables in a first bag (x1,x2,...,xc) so that the "probability" y(x1=v1,x2=v2,...,xc=vc) is positive. Then repeatedy extend the values of a bag to a neighboring bag: if the neighboring bag has some variables (x1,x2,...,xd) in common with the old bag and some new variables (x'(d+1),...,x'c), <i>pick</i> the values for the new variables in a consistent manner, that is, so that <br />y(x1=v1,x2=v2,..xc=vc,x'(d+1)=v'(d+1),...,x'c=v'c) is positive.<br /><br /><b>Lecture 3</b>: (Chlamtac Krauthgamer and Raghavendra 2010) Consider general demand sparsest cut. If the underlying input graph has tree width bounded by a constant r, then the Sherali-Adams lifted linear program has integrality gap O(1) after only a constant number of lifts. (The O(1) depends on r).<br /><br />Then rounding a solution is easy: first <i>sample</i> the values v1,v2,...,vc (zero or one) of c variables in a first bag (x1,x2,...,xc) according to the distribution defined by the lifted variables y(x1=v1,x2=v2,...,xc=vc). Then repeatedy extend the values of a bag to a neighboring bag: if the neighboring bag has some variables (x1,x2,...,xd) in common with the old bag and some new variables (x'(d+1),...,x'c), <i>sample</i> the values for the new variables in a consistent manner, that is, according to the conditional distribution of the lifted LP variables for (x1,x2,...,xd,x'(d+1),...,x'c) given the values already chosen for variables (x1,x2,...,xd). <br /><br />I believe that this sequence is a relatively simple presentation of lift-and-project for algorithms: the part about independent set is exact and, if reduced to its core and looked at from the right perspective, it can be made straightforward (maybe that will be material for another post). The part about sparsest cut is a new algorithmic result, the algorithm is a natural extension and the analysis, seen from a distance, feels right although it is a little bit technical - it uses a Markov flow graph, and going through that proof in class might be painful, but it's the best compromise I have found between the need for simplicity and elegance, and the desire to bring the class right at the threshold of open problems.Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com2tag:blogger.com,1999:blog-4068183698747623113.post-41461033684958218852011-07-20T11:17:00.002-04:002011-07-28T12:31:42.267-04:00Lift-and-project and odd cycle constraintsLift-and-project has been studied during the last 10 years in the theoretical computer science community as a systematic way to strengthen linear programs. As an introduction, let's consider the independent set problem.<br /><br />Start from the usual LP relaxation for independent set: one variable x(i) for each vertex i, such that<br />0 ≤x(i) ≤1 for each vertex i<br />x(i)+x(j)≤ 1 for each edge {i,j}.<br /><br />Algebraically, one round of lifting leads to the following LP in the Sherali-Adams variant: each initial constraint leads to 2n new constraints. For each k, one new constraint is obtained by multiplying by x(k) and the other by multiplying by 1-x(k), and then substitute y(i,k) for x(i)x(k). We have variables x(i) for each vertex and y(i,j)=y(j,i) for each pair of vertices, with y(i,i)=x(i). The constraints obtained by this automatic process are thus:<br />(1) 0≤ y(i,k) ≤x(k) for each pair of vertices i and k<br />(2) 0≤x(i)-y(i,k)≤ 1-x(k) for each pair of vertices i and k<br />(3) y(i,k)+y(j,k) ≤x(k) for each edge {i,j} and each vertex k<br />(4) x(i)-y(i,k)+x(j)-y(j,k) ≤ 1-x(k) for each edge {i,j} and each vertex k.<br />That's the automatic algebraic definition. <br /><br />To understand this intuitively, note that each variable x(i) of the basic LP can be interpreted as a distribution over vertex i: x(i)=.3 means that with probability .3 vertex i is in the independent set and with probability .7 it's not in the independent set. Similarly, each {x(i),x(j),y(i,j)} of the lifted LP can be interpreted as a distribution over the pair of vertices i,j: x(i)=.35,x(j)=.6,y(i,j)=.2 means that with probability .2, both i and j are in the set, with probability .35-.2=.15, only i is in the set, with probability .6-.2=.4, only j is in the set, and with the remaining probability .25, neither i not j are in the set.<br /><br />Let us focus on the lifted LP for independent set. First, notice that y(i,j)=0 for every edge {i,j}: to prove this algebraically, apply (3) to k=i, substitute y(i,i)=x(i) and simplify to get y(i,j)≤ 0. Together with (1) that implies y(i,j)=0. To understand this intuitively: the constraint x(i)+x(j)≤ 1 means that i and j cannot simultaneously be in the set, so in a joint distribution over i and j, the probability that both i and j are in the set should be zero.<br /><br />Now, here's a more interesting fact about the lifted LP for independent set. I want to prove that constraints (1)-(4) imply every <i>odd cycle constraint</i>: for every cycle C of odd length, the sum of the x(i)'s for i in C is at most (length(C)-1)/2. At first sight, that is a bit surprising, because the lifted LP still merely consists of local constraints (with a bunch more variables), whereas the odd cycle constraints are long-range global constraints. I will prove this for a cycle of length 9, consisting of vertices 1,2,...,9.<br /><br />Note that since 1 and 2 are neighbors, y(1,2)=0<br />Apply (4) for k=1 and {i,j}={2,3}: x(2)+x(3)-y(1,2)-y(1,3)≤ 1-x(1)<br />Apply (3) for k=1 and {i,j}={3,4}: y(1,3)+y(1,4)≤ x(1)<br />Apply (4) for k=1 and {i,j}={4,5}: x(4)+x(5)-y(1,4)-y(1,5)≤1-x(1)<br />Apply (3) for k=1 and {i,j}={5,6}: y(1,5)+y(1,6)≤ x(1)<br />Apply (4) for k=1 and {i,j}={6,7}: x(6)+x(7)-y(1,6)-y(1,7)≤1-x(1)<br />Apply (3) for k=1 and {i,j}={7,8}: y(1,7)+y(1,8)≤ x(1)<br />Apply (4) for k=1 and {i,j}={8,9}: x(8)+x(9)-y(1,8)-y(1,9)≤1-x(1)<br />Note that since 9 and 1 are neighbors, y(1,9)=0<br />Sum everything: after cancellations, we get x(1)+x(2)+...+x(9) ≤4. <br />That's the odd cycle constraint.<br /><br />What miracle just happened? Instead of going all the way around the cycle, let's just go through the path from 1 to 9 (ignoring the edge between 9 and 1) and sum those inequalities. We obtain:<br />x(1)+x(2)+...+x(9)≤ 4+y(1,9).<br />Normally, we know that the independent set can only have at most 5 vertices of a path consisting of 9 vertices, and that can easily be proved from the basic LP, but this tells us more. The variable y(1,9) captures the joint distribution of x(1) and x(9): it tells us that the sum can only be more than 4 when x(9) and x(1) are simultaneously equal to 1. This additional information is what comes into play when we close the cycle: because there is an edge between 9 and 1, x(1) and x(9) can never be 1 together, so we're stuck at 4. <br /><br />Stronger versions of lift-and-project add positive semi-definite constraints. Why do we care about lift-and-project? Ten years ago, it seemed like a promising venue to reduce the integrality gap of linear programs. That promise has not yet been fulfilled, and it is not so easy to manipulate those complicated, higher dimensional objects, but we are slowly making progress. For example, in the upcoming FOCS Barak, Raghavendra and Steurer use lift-and-project for an algorithm for unique games. <br /><br />In a graduate course on approximation algorithms, I would want to spend a lecture or two talking about lift-and-project. The above would be a preliminary introduction, but after that, I would want an example of application of lift-and-project that would be simple enough to present in about one lecture. I am wondering: what is the least technical interesting example of using lift-and-project?Claire Mathieuhttp://www.blogger.com/profile/10957755706440077623noreply@blogger.com11