A CS Professor's blog: December 2010

Thursday, December 30, 2010

What people search for

Without the internet, how would I have learnt how to (last week) patch a pair of jeans, or how to (this week) open fresh oysters? With the internet, all knowledge is at my fingertips. I never cease to be amazed.

Here in France, when I start entering the phrase "How to" in Google, the first suggestions that are coming up today are:

- how to make it in America

- how to train your dragon

Normally, if I remember correctly, in the US the first suggestions are:

-how to make love

-how to tie a tie

Curious, I entered the French analog, "Comment", in Google. First suggestions:

-how to make love,

-how does it work,

-how to write a vita,

-how to become rich.

I guess that the French know how to tie a tie.

Sunday, December 26, 2010

Business with unreliable partners

How can we communicate with a business partner at an unequal footing to ensure that they are keeping their side of the contract?

For example, say that you are flying with and airline company that claims that they closed the door on you for safety or other reasons. How can we verify that our seat on the flight has not been given to a standby passenger, and that this is not a lie? When they say that they tried to reschedule on another flight on the same day but failed, how can we verify that that is not a ure lie? When they say thay also looked at departures from other companies but that everything was full, how do we know it is the truth? When they claim that we cannot get our luggage back, or that the seats they are finally giving us are the last ones available on that flight, how do we know it is the truth?

When trust is gone, we need a way to make sure that the contract between company and customer is not breached by the side that has more power and a monopoly on information. How? Should air passengers unionize? Or is there some cryptographic procedure that would help verify the company's claims?

Friday, December 24, 2010

How United Airlines stole our Christmas

My teenage children and myself, living in the US, were going to spend Christmas with our family in France. We got tickets to fly from Boston to Paris via Washington Dulles, leaving on December 23, the last day of school, and arriving the next day.

On December 23, we arrived at the airport four hours early, ready to start our holiday. Our problems started when our airplane left the gate. It was bumped off its spot in the queue for taking off, and spent time off to the side. When its turn came back again, a mechanical problem, caused by a flap that wouldn't close, caused it to return to the gate. After mechanics had closed it manually while we waited anxiously, we finally took off, a little after 4pm (our scheduled departure time was 2:16pm). During the flight, one steward gathered information, told each person their connection gate, suggested we study the map of the airport in the flight magazine, and, upon arrival, pleaded for other passengers to let the ones in a rush go ahead and make a run for it.

"As soon as we become aware of a service interruption, we will relay information to affected customers, and we will work proactively to help them reach their intended destinations."

It was 5:09pm when we landed. Eight passengers were going on to Paris, with a connection at 5:25pm. After taxiiing, the door opened, we rushed out of the airplane and ran madly to our departure gate. Fortunately all of us arrived before 5:25. But to our great dismay, the door was closed! In fact the first two of the eight had arrived early enough to see the door closing in front of them! We stared at one another in disbelief. How could that be? Impossible! We begged the staff to reopen to door, but to no avail. They said that they had received a "direct order from Chicago" to close the door. One passenger, flushed by her run, got out her asthma inhaler and took deep breaths. A child was bravely putting on a good face, silently holding back tears. As our little group was standing angrily in front of the closed door, we were advised to go to customer service. We were told that we were just wasting time by staying there, diminishing our chances for a good alternative flight plan.

At United, the only acceptable customer experience is one in which you arrive at your intended destination safely, comfortably and on time. When events and obstacles beyond our control, such as severe weather or Air Traffic Control delays, prohibit us from achieving this standard, we strive to minimize customer inconvenience however possible.

At customer service, we joined the long line. The representatives were busy rescheduling people on flights for the following day. We would leave on Christmas Eve, on the exact same flight but 24 hours later than originally planned. We would spend the evening and night of Christmas on the flight, and arrive in Paris on the 25th.

The service representative gave us a hotel reservation and a $15 meal voucher for dinner. "What about breakfast and lunch?", I asked. "I'm sorry, but we can only give you one meal voucher", answered the United airlines representative. "What about our bags?", I asked. "I have already rescheduled them and you will get them when you get to Paris." Apparently other people were given the choice to wait for their bags and pick them up, but not us.

"In the air and on the ground, online and on the telephone, our customers have the right to expect — to demand — respect, courtesy, fairness and honesty from the airline they have selected for travel."

I left the customer service counter seething with anger. We waited in the cold for a shuttle, that took us to an upscale golf resort twenty miles away. Meanwhile I decided to calm down and try to make the best of it. "Life is a journey, travel it well", says United in their slogan. At the hotel, we spent a small fortune on ill-fitting swimming suits and relaxed at the pool for a while. After a dinner at the one place at the hotel that was still open (and where the cost of three sandwiches easily exceeded our vouchers), we paid another $15 to watch a movie before calling it a day. Since we had no toiletries with us, I asked the front desk for help, but theirs were only available for purchase. I noted that it violated United's written policy to provide stranded passengers with complimentary toiletries, but to no avail.

This year, for us, there will be no decorating the tree, no putting together the creche with our cousins, no singing carols, no midnight Mass. Instead, there is waiting in line in the hellish atmosphere of a crowded airport.

At Christmas, we celebrate the birth of Christ. In the mayhem of holiday airport travel, when I think of United slamming its door to our face, it calls to mind the innkeeper in Bethlehem who refused to make room for a 9-month old pregnant woman.

And that is United Airlines's way: "Fly the friendly skies of United". Merry Christmas!

Tuesday, December 21, 2010

Examples of well-written papers?

Yesterday I found this comment on this blog: "As a young researcher trying hard to write nice papers, I don't find many role models."

Forget the introduction, and look at the technical content: what are examples of particularly well-written papers?

I personally like the papers (and lecture notes) by Michel Goemans. I find them clean, rigorous, concise and crisp, elegant in a minimalistic way. For example, his 2006 paper on minimum bounded degree spanning tree (http://math.mit.edu/~goemans/PAPERS/bmst-focs06.pdf). But there may be a cultural bias. Maupassant would be my ideal role model. In his short stories, he always uses precisely the right word to convey the correct meaning. He reputedly spent much time trimming his texts and eliminating all unnecessary adjectives. Why not aim to do the same for technical writing? Write exactly what is necessary for the readers' understanding, no less but no more. Each word should have a purpose, each notation a justification, each sentence a reason for being there. For example each mathematical concept, in my ideal world, is defined exactly twice, once with words and once with symbols, because some readers are more comfortable with words and some with Mathematical notation, and because the two presentations are mutually reinforcing, and help remove possible ambiguities. More than twice is a waste of words, in my ideal world.

Monday, December 20, 2010

Writing dynamic programs

I think of dynamic programming as a bottom-up computation, but that view can be slightly misleading. For example, consider keeping track of the connected components of a forest F in a graph of bounded treewidth.

For a bag B, let G(subtree(B)) denote the subgraph induced by the nodes appearing in bag B (let's call those portal nodes) and in all of its descendant bags as well. Let G(B) denote the constant size graph induced by just the portals.

Consider the connected components of (F intersect G(subtree(B))). They induce a partition pi(B) of the portals.
- for a leaf bag, pi(B) = the connected components of (F intersect G(B))
- for a bag B with two children B1 and B2, pi(B) is obtained by "combining" pi(B1), pi(B2), and (F intersect G(B)).
It's a bottom-up recurrence.

Consider instead the connected component of F, the global forest. They also induce a partition sigma(B) of the portals.
- for the root bag, sigma(B)=pi(B)
- for a bag B with parent bag B', sigma(B) is obtained by "combining" sigma(B'), pi(B), and (F intersect G(B union B')).
It's a top-down recurrence.

Since all recurrence formulas are local, a dynamic program can do both computations at the same time, and thus a DP algorithm can be assumed, while it is working on a bag B, to have access to the partition of the portals in the global forest: working bottom up, it makes guesses that will be validated later at a higher level of the tree. Thus, one recurrence is of the form: compute, then memorize a summary of the computation done in G(subtree(B)). The other recurrence is of the form: guess a summary of the future computation, then verify. In the DP the two computations are intertwined.

Borradaile, Klein and myself are using something much like this in a paper on Euclidean Steiner forest. Bateni, Hajiaghayi and Marx are using essentially this in a paper on Steiner forest in planar graphs and graphs of bounded treewidth. Writing up the details of such dynamic programs is a bit messy, yet it is conceptually simple. I think that the difficulty in presentation comes from this mix of bottom-up and top-down.

Are there other examples of such dynamic programs? Are there models of good writing for that?

Friday, December 17, 2010

Writing proofs: avoiding case analysis

As a student I learned a number of ways to prove lemmas: contradiction, induction, case analysis, etc.

Case analysis, when there are more than two cases, tends to make me feel like a mindless computer. It is more like checking than like understanding, and for me it is not a good way to get insight.

The direct proof is my favorite proof organization for short pieces of argument: assumption A implies B, B implies C, C implies D, D implies the final statement E. The proof should give a brief justification for each implication. With this kind of proof it is easier for me to get insight.

For example, in Trevisan's 2005 SDP algorithm for unique games, one small piece of the analysis is proved by doing a separation into three cases.

The claim: if three vectors r,u,v are such that u.v=0, (r-u)2>(r-v)2, and the vectors satisfy a "triangle inequality" constraint, then (r-u)2 > (1/2) (r)2.

The proof has three cases:
- either (Case 1) (u)2 < (1/2) (r)2,
- or (Case 2) (v)2<(1/2)(r)2,
- or (Case 3) (u)2 and (v)2 are both > (1/2) (r)2.
Each case is short, but it feels a little bit redundant and dissatisfying.

Instead, here is a direct proof (how does one do math in html, I wonder.):
- By the triangle inequality constraints,
(A1) (r)2 < (u-v)2+(v)2 and (A2) (r)2<(r-u)2+(u)2.
-Summing, (B) 2 (r)2 < (r-u)2+(r-v)2+(u)2+(v)2.
-Since u.v=0, we have (C) (u)2+(v)2=(u-v)2.
-By the triangle inequality constraints again, (D) (u-v)2<(u-w)2+(w-v)2.
-Substituting that in (B) gives (E) 2 (r)2 < 2(r-u)2 + 2(r-v)2.
-By assumption (r-u)2>(r-v)2, so (F) 2(r)2<4(r-u)2, hence the result.

The proof is essentially the same, but it is organized differently, and I like this presentation better.

Thursday, December 16, 2010

On which topics is there progress in Theoretical Computer Science?

Here are the best papers at STOC and FOCS in the last few years, as best as I could gather. What picture does that paint of research activity in core theoretical computer science? Crypto, approximation, and hardness of approximation are progressing swiftly, but also error-correcting codes. Structural space complexity continues strong. (Some outliers: discrete math, game theory, and the paper on faster integer multiplication.)

Are those the areas where new ideas are emerging, where things are happening, and therefore where one might consider hiring? I'm just considering the core here, not the exploding areas of interdisciplinary research.

ACM STOC 2010
"QIP=PSPACE," and "An Improved LP-based Approximation for Steiner Tree."
ACM STOC 2009
"A Constructive Proof of the Lovasz Local Lemma", and "Public-Key Cryptosystems from the Worst-Case Shortest Vector Problem".
ACM STOC 2008
"Optimal Hierarchical Decompositions for Congestion Minimization in Networks", and "Algorithms and Inapproximability Results for Every CSP?".
ACM STOC 2007
"Faster Integer Multiplication", and "Towards 3-Query Locally Decodable Codes of Subexponential Length".
ACM STOC 2006
"The PCP Theorem via Gap Amplification".
ACM STOC 2005
"Undirected ST-connectivity in log-space".
ACM STOC 2004
"Expander Flows, Geometric Embeddings, and Graph Partitionings", and "Multi-linear formulas for permanent and determinant are of super-polynomial size".

IEEE FOCS 2010
"Subexponential Algorithms for Unique Games and Related problems".
IEEE FOCS 2009
Best Paper Award: none?
IEEE FOCS 2008
"Two Query PCP with Sub-Constant Error"
IEEE FOCS 2007
"Space-Efficient Identity Based Encryption without Pairings"
IEEE FOCS 2006
"Settling the Complexity of 2-Player Nash-Equilibrium"
IEEE FOCS 2005
"The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative Type Metrics into L1", and "Correcting Errors Beyond the Guruswami-Sudan Radius in Polynomial Time".
IEEE FOCS 2004
“Hardness of Approximating the Shortest Vector Problem in Lattices”, and “Cryptography in NC0”.

Wednesday, December 15, 2010

How not to write code

Grading for my freshman programming course, I saw this and similar horrors:

if condition then true else false

instead of

condition

My TAs did not think that should be penalized, "because it's not a big deal". Indeed, I have taught that code ought to be (1) correct, (2) readable, and (3) efficient. The piece of code above is correct; having seen much more mysterious constructions, I cannot argue that it's not readable; and the loss of efficiency is negligible. But it's not elegant! That is just one example of bad style among many. I forgot to stress one important aspect in class: code should always be (4) beautiful!

Perhaps we should have some "beauty points" in our grading scale, and I should have some exercises of the form: take this ugly piece of code, and make it beautiful. Students would complain about possible discrepancies between different TAs' esthetics, of course, but I suspect that there is a hidden yet broad consensus to recognize beauty in code: we know it when we see it... the challenge is to define it!

Similarly, in math, particularly in courses on how to write proofs, perhaps there should be some exercises of the form: "take this ugly proof and make it beautiful".

Tuesday, December 14, 2010

Minnesota

Heard this morning on the bus: "In Minnesota, it gets so cold that when you spit, it is frozen by the time it reaches the ground."

What is an algorithm?

Q: Why does no one implement my algorithms?
A: Because almost invariably there is a step that says: "By exhaustive search over a space S of size some-huge-constant, compute some optimum".

I've been mistaken all along. I should not write "by exhaustive search" but simply "compute some optimum over S". Who am I to tell a programmer that he absolutely must use exhaustive search? If he finds a better way to compute the optimum, so much the better. In the section analyzing running time, I could always state: "Using exhaustive search for example, the optimum over S can trivially be computed in constant time."

The theorem is the interface between me and the complexity theorist (who does not care how the problem is solved algorithmically, but only that the problem is solved and about the related structural insights).

The algorithm is the interface between me and the hypothetical programmer who might some day be interested in implementing my algorithm (and who does not care about why, in the analysis, the algorithm works). That's what should drive what to specify and what not to specify in the description of the algorithm.

If I mix in some critical ideas with some steps where I just threw in the first thing that came to my mind to solve that subproblem, how is he supposed to separate the wheat from the chaff and know which of my instructions should be followed carefully and which ones are open for him to play with and optimize?

Sunday, December 12, 2010

How not to present algorithms

I decided to look at the award-winning Arora-Boaz-Steurer paper giving a subexponential algorithm for unique games. I first wanted to take a look at the algorithm itself, temporarily ignoring the analysis and focusing on the algorithmic techniques only.

According to the plan in the intro, the algorithm is in section 5. There, first, without loss of generality they assume that the graph is regular -- for that, the reader can simply check Appendix A. I go there and review the construction. Going back to section 5, they then make the graph lazy by adding self-loops - I skim the construction.

We are then sent to section 4 and Theorem 4.5. Going there, we are redirected to Theorem 4.1 "using Theorem 3.2 instead of 2.3". Making a mental note of that and going to Theorem 4.1, I see that the proof starts with a Lemma, 4.2, whose proof is a mere call to Theorem 2.3 for some particular value of a parameter. Thus it is time to go to Theorem 3.2. Going there, I am redirected to Theorem 2.3. Going there, I am redirected to Theorem 3.1. Going to its proof, I see that the lemmas are just doing an analysis, without algorithmic contents, until I get to a comment "... moreover, following the proof we see that if the condition is violated we can get a set S ...". Aaargh!

Giving up on that for now and going back to the thread of the proof of Theorem 4.1, I skim the next couple of lemmas, that seem to do some combination of greedy partitioning and of iteratively re-partitioning parts into smaller pieces a few times. Going back to section 5, the next step, in my impression, assigns labels independently for each part of the partition. More precisely, for each label-extended part of the partition, it redirects us to the algorithm of Theorem 2.2. Miraculously, the proof of Theorem 2.2 is basically self-contained: take the subspace spanned by eigenvectors of the large eigenvalues, put an epsilon-net in its unit ball, and, for each vector v in that epsilon-net, use its "large enough" coordinates to define a subset S(v) of vertices of the graph. Going back to section 5, apparently for each set S(v) there will be a mysterious assignment, and the algorithm exhaustively enumerates the v's and picks the assignment satisfying the most constraints.

What have I learned? Basically nothing so far. Frustrating! If the result was not so good, I would give up on that paper. Instead, I may consider setting aside a few days solely for deciphering it...

Monday, December 6, 2010

SDP

Linear programming was in fashion long before approximation algorithms. Linear programming exactly solves all sorts of important problems such as shortest paths, max flow, min cut, various kinds of preemptive scheduling, etc. What about semidefinite-definite programming? What is it good for, other than for approximation algorithms? What problems can be solved exactly by semi-definite programming?

Saturday, December 4, 2010

Email rant

People keep coming up with new, better technology: applications that are smarter at helping you deal with email and make you more efficient. I think that it’s the wrong approach. As soon as that happens, the number of emails immediately increases so that you spend just as much time on it as before, if not more.

I have been thinking about the following anti-technology approaches instead:
- close my mailbox on weekends, so that it won’t accept any email but send an automatic reply “this email is disabled on weekends”.
- automatically dump emails at the end of the day or week. Any email not dealt with on the same day or week that it arrived is destroyed at midnight (or on Friday night).
- have a noise-maker than randomly deletes 20% of the emails before they even get to me.
- put an automatic lag of 24 hours so that emails arrive in my mailbox no sooner than 24 hours after being sent.
- put an automatic cap of, say, 50 emails per day (or however many I can reasonably deal with). Any email beyond that gets automatically rejected with the message “email deleted before reaching the recipient: recipient’s quota reached for the day”.