Wednesday, November 30, 2011

Conferences and Arxiv. Utilitarian values.

Yesterday I took a look at a few abstracts of papers accepted at STACS 2012. I must have been in a contrarian mind, because two remarks came to my mind, both going against my usual sense of what's desirable.

First, I was struck by the existence of prior publications on Arxiv. It is no longer exciting to look at lists of papers accepted at a conference: that's not where we learn about new results, but on Arxiv. I knew that, but this brought it home for me: conferences no longer serve the purpose of quickly communicating the latest research results. That was a major service they were doing, perhaps the main one, and that is no longer the case.

Then, why bother look at such a list at all? Because, in areas outside my own small subarea, I don't follow and can't judge what's appearing on Arxiv, so the conference committees provide a useful filter. Why do I care what happens outside my own small subarea? Truly, I do not care very much, but if a big result comes out, I would like to hear about it. Why? So that I don't sound stupid at parties when people ask me about it; and because of some guilty feeling that I ought to be broader, even if I don't really feel like it. In addition, I am always interested if some research happens that might provide good teaching material. Finally, there is a little bit of curiosity: a good talk given by a good speaker on a good topic can be enjoyable even if it's not my area. (Then, it does not even have to be Theory, and in fact, not even Computer Science.)

Those reasons suggest that I do not need to look at lists of accepted papers other than STOC and FOCS, and that I would benefit from those lists more if they were even more selective. I do not want STOC and FOCS to be more selective, but I personally might appreciate it if they were a two-tier conference.

Second, I have recently criticized Oded Goldreich's latest essay on Luca Trevisan's blog, but looking at the abstracts of papers accepted at STACS, I could not but recognize the turns of phrase that he lamented. Consider, for example, the paper I mentioned yesterday. Here is the full abstract: The Travelling Salesman Problem is one the most fundamental and most studied problems in approximation algorithms. For more than 30 years, the best algorithm known for general metrics has been Christofides's algorithm with approximation factor of 3/2, even though the so-called Held-Karp LP relaxation of the problem is conjectured to have the integrality gap of only 4/3. Very recently, significant progress has been made for the important special case of graphic metrics, first by Oveis Gharan et al., and then by Momke and Svensson. In this paper, we provide an improved analysis for the approach introduced by Momke and Svensson yielding a bound of 13/9 on the approximation factor, as well as a bound of 19/12+epsilon for any epsilon>0 for a more general Travelling Salesman Path Problem in graphic metrics.

Before I read his essay, I would not have noticed anything special about that abstract. But now, I can see the choice of arguments. Except for the first sentence of the abstract, everything else reads like a list of Olympic race athletic records - an empty race for ever-better approximation factors, with no concern for the underlying structure. The approximation factor has become, not just a (key) indication of the understanding of the problem, but a goal in itself. As I read that, I was a little bit frustrated: I wanted a hint of how the author got that improvement to an important result! What did they learn about the problem, that Moemke and Svensson did not know?

Then I remembered Goldreich's essay.

Strike one for utilitarian instrumental values.

Update: Just to make it clearer. I like the result, I think it's worthwhile, I am interested, I would have been happy to do that research and be an author of that paper. My only quibble is with the style of the abstract.

Tuesday, November 29, 2011

STACS 2012 list of accepted papers

The STACS 2012 accepted papers are out. STACS is in Paris and starts on February 29.

The most noteworthy article for my taste, at first glance, is the 13/9-approximation for Graphic TSP by Marcin Mucha, available here. (In some sense this is old news: the paper will be presented at STACS, but was published on Arxiv in August.) They improve the analysis of the Moemke-Svensson paper from last Spring (presented last month at FOCS; was published on Arxiv in April).

Here's a quote: We present an improved analysis of the cost of the circulation used in the construction of the TSP tour/path. [...] The circulation used [by Moemke and Svensson] consists of two parts: the ”core” part based on an extreme optimal solution to the Held-Karp relaxation of TSP, and the ”correction” part that adds enough flow to the core part to make it feasible. We improve bounds on costs of both part; in particular we show that the second part is in a sense free. As for the first part, similarly to the original proof of Moemke and Svensson, our proof exploits its knapsack-like structure. However, we use the 2-dimensional knapsack problem in our analysis, instead of the standard knapsack problem.

Monday, November 28, 2011

Who is in the computer science building at 11pm?

Today I came by the computer science building at Brown around 11pm. There were a dozen undergraduate students in the atrium, taking a short break from work or eating a late dinner, and each of the computer labs had about 10 undergraduate students working.

That's a lot of students.

They work there. They study there. They debug there. They go to class there. The eat there. Sometimes they take a nap on a couch there. That's the computer science subculture at its most extreme!

Maybe one of the classrooms should be converted into a dorm with bunk beds, Appalachian-hut style. That would save the students some money.

Sunday, November 27, 2011

The Khan academy on the foundations of computational complexity

The other day i watched some Khan academy videos about substraction and about multiplication.

The ones about subtraction said something along the following lines:
"How do you compute 5-3? Here is one way. Say that you have 5 lemons. Here I am drawing 5 lemons. Minus 3 means that you remove 3 lemons. Here I cross them out. How many are left? Let me count: 1,2. There are two lemons left, so 5-3=2." After presenting some other methods based on the number line, he then moves on to the next step: "How do you compute 67 - 34? You could do the same method as before: draw 67 lemons, cross out 34 lemons, and count how many lemons are left. That would work. It would give you the correct answer, but it would take a really long time. So, now I am going to give you a way to do it faster." He then moves on to explaining how to substract the ones digits and the tens digits separately.

The one about multiplications went along the following lines:
"How do you compute 4*3? Here is one way. 3 is 3 cherries. Let me draw 3 cherries on a row. I do this 4 times, so I draw 4 rows that each have 3 cherries. Then I can count: 3+3+3+3 equals 12, so 4*3=12." After presenting equivalent methods, he then moves on: "But what happens if your numbers are really big? For example, if you have 100 lemons, drawing 100 lemons on a row would take a really long time. And if you wanted to compute 100*100 with this method, you'd have to draw 100 rows that each have 100 lemons, and then add them all up: that would take forever. Here, I am going to show you a way to do it fast, even with really big numbers."

One might say: what's the big deal? Of course, when I teach the multiplication algorithm in class, I remark that doing the computation in unary is inefficient, but I always thought: "Why bother mentioning that? Who in their right mind would work in unary?" it hadn't crossed my mind before that there was a natural reason to bring that up, and so my remark goes by unnoticed.

The narrative in those videos is amazing. I am so impressed with this man! He is laying the foundations of asymptotic analysis of algorithms, explained to first graders.

Friday, November 25, 2011

Teaching deterministic linear time median

When I teach Algorithms, one challenging lecture is the one where I present the deterministic linear-time algorithm for median-finding. The big challenge is: How can we present the algorithmic design so that it is natural? I had an idea: I could simply watch online videos and see how others do it! So, I found and watched videos of lectures at Stanford (by Tim Roughgarden) and at MIT (by Erik Demaine) on that topic. For a teacher, it is fascinating to watch how others go about teaching the same material as you!

I have picked up a couple of ideas from Erik's lecture.

First, the runtime analysis involves some quantities with floors and ceilings. I always sweep the floors and ceilings under the carpet (with a vacuum cleaner). Erik, on the other hand, gave a short, relatively polished analysis, that managed to take those into account rigorously without making a mess of the analysis (well, except that he is assuming without saying it that T(n) is a monotone function of n, but none of the students noticed that gap in the proof). I may try to copy his approach when I get to that point.

Second, like me, in the design phase Erik highlights the use of transitivity of ordering: if a < b and b < c then a < c. I realized while watching him that I could highlight that already when I teach Quicksort: why does Quicksort give a correct output? Because of transitivity: if a < pivot and pivot < b, then a < b. That explains why we are doing the correct thing when in Quicksort we commit to placing a to the left of b. For Quicksort, that part of correctness is obvious and usually goes without saying. For linear-time medians, it comes up much less naturally and is actually a "creative" step. But if I have taken a minute to bring it out while teaching Quicksort, then it will seem more natural when I bring it out while teaching linear-time median.

So, watching those videos was mildly useful. Unfortunately I did not find what I was looking for: the challenge of presenting the algorithm in a natural manner is still open, as far as I know.

Thursday, November 24, 2011

The Khan academy on insertion sort

The Khan academy puts 8-minute youtube video lectures (and a few exercises) on the internet to teach all sorts of topics in an online fashion. I just realized that there are some lectures on computer science, and I have watched a few with great interest. I was mesmerized by the lectures on insertion sort - I taught that in class just a few weeks ago. My thinking was: could I borrow some ideas from that renowned teaching style?

The instructor is careful, thorough, and incredibly patient. He follows a well-trodden path to communicate, but does not announce signposts explicitly as a general paradigm for problem-solving.

Signpost 1. Define the problem: he does that in a very concrete way, using an example. For sorting, there is only one possible ambiguity in the problem definition: what about duplicates? He deals with that without explicitly drawing the student's attention to it, just by checking his program with an example that has duplicates.

Although I would also use an example, I am more explicit and state the problem definition in terms of a pair (input, output). Why? Because I don't want students to only learn about sorting: I also want to inculcate the notion of computational problem, that they will see later in many more advanced Theory courses.

Signpost 2. Explain the algorithm by executing it slowly on an example.

I would also do that if it was the beginning of the semester, but in addition to what he wrote on his electronic blackboard, I would write down comparisons as I perform them: instead of just saying "we compare 3 to 7", I would write down "3<7" on the side. Why? Because I will at some later point want to analyze the runtime, and for that, I might want to focus on the number of comparisons: highlighting the comparisons is a good preliminary perspective that will come in handy later.

One minor thing I really did not like was that he kept calling elements "the zeroth element" for the first element, "the first element" for the second element, and so on. I am resolutely opposed to molding the English language after conventions used in programming languages.

One major thing is that the instructor explains how the algorithm works, but not how one would come up with that algorithm in the first place. In his place, I might try to ask a leading question to get my unknown audience to come up with the algorithm. Why? Because I try to teach, not so much specific algorithms, but how to design algorithms. For example: "One way to go about it could be called "sort as you go": scan the entire list, and, at every step of the scan, try to make sure that everything you have scanned so far is sorted." Then I can show intermediate steps in the execution - the desired state of the list after processing the first item, the first two items, the first three items, etc. And only then would I discuss how to fill in by processing each additional item scanned. That's a modular approach, and hopefully after I have stated the "sort as you go" idea, many students can figure out the rest of the algorithm. It would, maybe, stimulate the students' creativity a little bit more.

Signpost 3. Write code for insertion sort in Python.

Before that, in his place I would first have written some high-level pseudo-code. Why? As a preparation, I suppose, for more difficult problems, settings when it is imperative to lay out the algorithm before starting to code.

The way in which he explained his code was slow and very intuitive. However, I noticed that he wrote and explained the code in a very linear fashion, line by line. That's not the way we do it: we always go for a modular presentation, starting with incomplete code that contains some "..." in places, and filling in the details during a second pass. For insertion sort, the difference is not huge, but that bothered me slightly.

He then ran it on an example in the end, as a way to implicitly suggest the need for testing. I would have been more explicit about that need.

Signpost 4.Execute the code step-by-step on a small example.

I skipped much of that presentation, but he did exactly what I do in class after I have presented an algorithm, when I am not 100 percent sure that the students understood it. Fortunately it's automated in our scheme (Racket) environment: one of the options when running a procedure in our programming environment is to do it not all at once but step by step, with highlights showing what parts of the code are being executed and what the current values of interest are.

Signpost 5. Do some small code optimization.

I really liked that short explanation. He replaced "While A: if B: ... else break" by "While (A and not B) ...". That was quite natural, and his last few words indicated that it was a general pattern. I had never thought it through quite so explicitly, so I was happy to learn that.

No video about runtime, but maybe that'll come some day.

My main reservation was that he taught one particular algorithm, but did not take that opportunity to teach how to design algorithms. One could present insertion sort as an example of problem-solving by reducing to an easier subproblem (namely, inserting a single element in an already sorted list). Making that explicit introduces the idea of reductions.

Overall, if it is a way to teach insertion sort for the sake of teaching insertion sort, it is great. If it is a way to introduce students to algorithmic design through the example of insertion sort as a stepping stone, it is less impressive. Obviously, he is targeting a lower level than my students, but I would still have liked him to embed some deeper level of understanding in his presentation (even if it's done in an implicit manner), like sowing seeds of knowledge, just to prepare students, so that, when they meet more advanced material, that material will seem natural to them. They would be ready without even knowing that they have been prepared for it!

In short: very pedagogical, very engaging, superbly clear, but there is a little bit of a lack of vision.

Wednesday, November 23, 2011

Which textbook to use when teaching Algorithms?

Which textbook should one use when teaching Algorithms?

Let me start with the winner. For the past couple of years I have been using the textbook written by Dasgupta, Papadimitriou and Vazirani. Here is what I think of it.

-Its first quality is that it is softcover and therefore relatively inexpensive. Some preliminary version is available for free on the authors' web pages.
-Its second quality is that it is relatively small and therefore light to carry around.
-Its third quality is that it is written in good English. The style is crisp, and the authors occasionally inject emotions in their descriptions, giving more texture to the book. "Now the time has come to meet the quest’s most embarrassing and persistent failures. "
-Its fourth quality is that for each problem, the textbook goes straight to the heart of the difficulty. It does not waste space on secondary notes, avoids redundancy, and gives the gist of an argument in just a few lines. The proofs are often beautiful because they are direct and minimal, with no unnecessary detours. They are a pleasure to read.
-Its fifth quality is that it is short enough that a student could aim to read it in its entirety.

However, the book is not perfect.
-Its first fault is that it contains mistakes. The authors obviously did not implement their algorithms. Special cases may be overlooked in the problem specifications. Base cases may be done in a hurry, or barely mentioned when they contain nothing of interest. Calculations can be slightly off where the constants are concerned. The authors are keen on communicating the central ideas, but there is a certain carelessness about details. In that sense, the presentation is not up to the standards that we expect from our own students.
-Its second fault is a lack of examples. For me, designing algorithms is first done by working through examples, and a student who wishes to study the course material has to supplement the book with more examples. In that case, the tradeoff between size of book and content comes out in favor of examples, in my opinion.
-Its third fault is the choice of topics with which to start the book: the first chapter talks about arithmetic operations. That's natural because students learned algorithms for those at a young age, but that immediately raises problems that should be swept under the carpet until much later: when we talk about an input of size n, what is n? Starting by numbers creates problems that spoil the beginning of the semester: when we sort by comparisons, why do we assume that comparing two numbers can be done in constant time? What if the numbers are really large? Those concerns are valid, but they should come up at the end of the semester, when the student is comfortable with runtime analysis and ready to revisit his assumptions. This textbook's approach naturally leads students to raise those questions early on, and at that stage they only create confusion.
-Its fourth fault is a lack of exercises. There are not enough exercises for students to practice what they are learning.
-Its fifth fault is the choice of topics to be covered, that does not correspond one hundred per cent to my own taste; it is more of a reflection of the authors' tastes; however that is natural for any book.

Overall, I like this book very much. It is the one that comes closest to my taste. A cleaned up and slightly expanded version of that book would be ideal for me.

Tuesday, November 22, 2011

Workers' rights, students' training

Thanksgiving is the most respected holiday in the US. It is the one day on which every non-vital service is closed in the afternoon. It is the main family holiday in this country. For many people it is part of a 4-day weekend from Thursday to Sunday. The next day is traditionally devoted to shopping at heavily discounted prices. Because of the crowds in the stores, it is called Black Friday.

Some retail store employees have complained that for Black Friday (the day of frenetic shopping that follows Thanksgiving Thursday), they are required to start a full day's work beginning at 11pm on Thursday evening. Indeed, stores open as early as 1am on Friday morning.

I have been shocked to see the public reaction to employees' complaints. A typical reaction is: "If they don't like it, they can quit, and there will be plenty of people ready to take their job!". In other words, the market economy rules. It dictates what's right. At a time of high unemployment, workers have no rights. This is one of the situations when, in spite of having lived here for many years, I still feel that there is something alien about this country.

Sometimes I wonder if the students' training is a little bit similar. They cram a perhaps excessive workload, then they become highly marketable and get good jobs: not just because they know a lot and are well trained, but also because they have demonstrated their compliance with the demands made of them, even if those demands are unreasonable. They'll be good software engineers, always willing to put in long hours without counting, and the occasional all-nighter when a deadline is coming up. They are being trained for a way of life in which, in exchange for a good salary, they will basically belong to their employer.

Monday, November 21, 2011

How to make students fail

Here is a simple recipe to make students fail: create assignments densely packed throughout the quarter, and where each assignment relies on the previous one completely, so that it requires prior completion of the previous assignments before it can be started.

Then, consider the case of a student who has a bit of trouble with the course and who can only manage to do 90% of the work assignment in a given week.
On the first week, he will turn in 90% of the work.
On the second week, he will first need to complete the remaining 10% of the first assignment (for no credit), so he will only be able to turn in 80% of the work.
On the third week, he will first need to complete the remaining 20% of the second assignment (for no credit), so he will only be able to turn in 70% of the work.
and so on until the 10th week, when he will first need to complete the remaining 90% of the previous assignment (for no credit) and will then have no time left to work on the 10th assignment.
On average, for a 10-week quarter, his grade will be 50%, or a bit less if end-of-quarter assignments are weighed more heavily; that, in spite of having done 90% of the work overall.
Note that giving the option of turning in assignments a couple of days late does not help. The student is doomed.

Now, consider a student who only just manages to do the entire assignment in the time allocated, and imagine that after three or four weeks, there is one week when he is sick, or tired, or something, and only does 50% of the work.
Then, from that point on, each week he'll spend half the time catching up on the previous assignment and half the time on the current assignment, so all of his grades will be 50% from then on, and at the end of the semester his average grade will be 60% or so.

Saturday, November 19, 2011

Two goals of teaching

There are two goals accomplished by teachings university level students.

One is to educate the people who will, in a few years, be professors in academia, or entrepreneurs in innovative businesses, or CEOs of large companies whose decisions have significant impact on society. The idea of shaping the knowledge of tomorrow's leaders was very present when I taught at Ecole Polytechnique in France.

The other one is to educate people in general, forming the minds of our the new adult citizens who will be voting and holding jobs everywhere in society. That idea was very present when I taught at Universite Paris-Sud in France.

For the first goal, the focus of attention are the best students in the class. The instructor tries to constantly challenge them, stretching their mental agility and pushing them to learn as much, as deeply, and as fast as possible. The students who are in difficulty are not on the radar. For the second goal, the focus of attention is the majority. The instructor tries to reach as many students as possible and is focused on the fringe students who are struggling to learn the course material. The students who are effortlessly cruising through the course are mostly ignored and are sometimes even considered to have an unfair advantage. So, to some extent the two goals are in tension.

It's one of the difficulties of teaching at a large public university. For example, if you teach a class of 100 undergraduates at UC Berkeley: are you going to spend effort on the two or three students who are shining in class and who might, with encouragement, continue on to a PhD, or on the thirty students who seem to have overreached slightly by taking the course?

At a selective institution such as Ecole Polytechnique or Stanford, the problem is largely solved by the filtering that happens beforehand.

Teaching with both goals in mind is taken to its limit this semester Stanford, with an online course being taught on introduction to AI. The course material is taught to the Stanford students, a selective group if there ever was one, but also, in an online manner (including automatic online grading of homeworks and exams), to the entire world, with no filtering whatsoever. How can that be?

There must some self-selection, so that the unknown crowd of students taking the course online are each, individually, a student who is highly motivated, smart, hard-working, and who, in different circumstances, might have ended up in Stanford anyway. It's an idea very much aligned with the American Dream: offering an opportunity for all to better their education, regardless of social class or circumstances of birth. That makes the project an idealistic and, if even half-way successful, exciting endeavor.

Friday, November 18, 2011

How to solve a system of linear equations

What could be more boring than solving a system of linear equations, Ax=b? We've all learnt in school how to do it: Gaussian elimination; and we've practiced it until we got sick of it. End of story!

But there are people who are never happy with what they have. Gaussian elimination is too slow, they whine. Polynomial time is not good enough for them. They are afraid that it will fail on massive data, and want something that is fast "for real". Near-linear time is the goal these days.

They currently focus on a special case: "SDD solvers", meaning that the matrix A is symmetric and diagonally dominant, meaning that the value on the diagonal is larger than the sum of absolute values of all off-diagonal values on its row. In fact, let us simplify things further and assume that A is the "Laplacian matrix" of a graph - the matrix where rows and columns are indexed by V, where the diagonal entry is the vertex degree, and where the off-diagonal entry is -1 if and only if there is an edge between the two vertices.

So, for that special case, in their anxious search for faster algorithms, people have started playing a game that is the opposite of what we traditionally do in approximation algorithms: instead of giving highest priority to the accuracy of the output, and treating runtime as a second thought, they sacrifice some accuracy in the hope if getting a faster algorithm. Indeed, if we don't insist on finding the exact solution x* of Ax=b, then we can try some iterative method that constructs a sequence x1,x2,x3,., where each xi is an improvement over the previous one, and pray that it gets quickly to a value xT close to x*.

But scientists don't trust the power of prayer. Instead, we analyze the runtime, which is basically the cost of each iteration times the number of iterations. Unfortunately, if the matrix A is dense (many non-zero entries), each iteration is costly; and if it's not well-rounded, one needs a large number of iterations to get close to x*.

To speedup the algorithm, instead of solving the system Ax=b, let us solve the equivalent system

for a well-chosen matrix B. Namely, B should be sparse, so that each iteration is cheap, and B should be similar to A, so that few iterations suffice to converge. (What's the measure of similarity? It's k such that for any vector x, x^TBx is between x^TAx and k times that quantity.)

Fortunately, there is an algorithm to find a matrix B that is very sparse- only a linear number of non-zero entries - and very well conditioned - k is constant. Unfortunately, that algorithm is slow, which defeats the whole purpose. So, how can one go about quickly finding a good matrix B?

Here's an algorithm for that subproblem.

1. First, find a low-stretch spanning tree T of G, the graph corresponding to A: it's a spanning tree such that the sum of all interpoint distances in T is not much more than the sum of all interpoint distances in G. That's been studied before in a sequence of papers, so it's just a matter of recycling old work.

2. Second, for every edge uv of G, define P(uv) as the distance from u to v in T. Let t be the sum of P(e) over all edges e of G.

3. Third, build a multigraph H by repeating t log(n) times:
pick an edge e with probability P(e)/t, and put t/P(e) copies of e in H.

4. The desired matrix B is (1/t log(n)) times the Laplacian matrix of H.

And that is what I learned because of a seminar given at Brown by Richard Peng, who presented some more advanced stuff in the same area (some work he's done with Ioannis Koutis and Gary Miller).

Why should we care about this line of research? Getting an improved polynomial-time algorithm for a special case of linear system solvers, is that really such a big deal? Why is there so much interest for those papers in the Theory community? Pick your answer:

( a ) . we should care, because Dan Spielman (or insert the name of your favorite researcher in that area) cares, and he has good taste. [Trust of authorities]
( b ) . we should care, because the algorithmic perspective on those problems is pretty new. It's the development of a new technique. [Algorithmic design]
( c ) . we should care, because this has been implemented and tested and it is claimed to be competitive with existing solvers, so it leads, maybe, to actual real life programs that run faster. This could be a big success story of Theory! [Applications to Applications]
( d ) . we should care, because it has implications for Theory: it leads to faster algorithms for graph problems such as generating a random spanning tree or maximum flow. [Applications to Theory]
( e ) . we should care, because it's sunny out, I am in a good mood, and today I am inclined to see things as exciting in general. [Random]

( a' ) . we should ignore this, because it has too much heavy math, it's not fun, and it's not my taste. [Trust of own taste]
( b' ) . we should ignore this, because the core of it is just low-stretch algorithms, sampling, etc: it's just putting together basic algorithmic tools that have nothing new about them, and the claim of novelty are pure marketing [Disgruntled]
( c' ) . we should ignore this, because the claims of applications are merely what people always say in their introductions, and those claims never mean anything. [Skepticism]
( d' ) . we should ignore this, because it's not going to help understand the P versus NP problem, even in the long term. [Lack of applications to Theory]
( e' ) . we should ignore this, because I had too much to eat and now I have a stomachache. [Random]

Thursday, November 17, 2011

Confusing maps

Yesterday I went to the MFA in Boston, and found my way around the museum with the map. The architecture is a little bit complicated: for example, because of some atriums (atria?) that span several levels, sometimes the best way to go from one room to another room on the same floor is to go down several levels, across the atrium, and back up several levels. That makes the map design challenging. (In particular is it not clear when what appears to be a "room" on the map actually corresponds to some empty air floating above a floor at a lower level.)

As discussed in a previous thread, to help the visitors orient themselves quickly, they partition each floor into sections (clusters of rooms) that have different colors, depending on the theme of the art displayed. But to my dismay, the correspondence is not a bijection: although the colors correspond exactly to types of art, art of the same type may be displayed on several floors.

As a result, before I started my visit, I had to spend a good while pouring over the description of the rooms on one page, and the map on another page. Very inefficient! And it is not until after I had left that I noticed that they had highlighted a few particular pieces of art as particularly noteworthy: I was so busy trying to understand their visualization system that I did not even notice those marks.

All in all, a less than optimal example of data visualization.

Wednesday, November 16, 2011


The other day a friend was asking: "which carrot should I take, this one or that one?", showing us two carrots of different shapes.

The other person present and myself simultaneously answered: "This one!", pointing to the same carrot. So my friend picked that carrot, commenting: "Wow, you guys made exactly the same choice. Amazing!", to which I replied ironically: "Yes, amazing! What are the odds?"

Compared to most people, it's hard to get me excited about chance events. In particular a 1-in-2 chance event doesn't get me to even raise an eyebrow. Here's my scale:

1 in 10: raised an eyebrow
1 in 100: quite surprised
1 in 1000: skeptical, then amazed
1 in 10000: skeptical, then still skeptical (in spite of evidence).
1 in 100000: impossible. Not even worth refuting.

Tuesday, November 15, 2011

Chance encounter: what are the odds?

Through Amazon, I bought a used copy of the book "Fear and Trembling" by Kierkegaard, in an edition that had been recommended to me by a friend (Translator of that edition: Walter Lowrie). When we met, her eyes fell on the name on the first page of my copy of the book: "Lois!" She exclaimed. "Where do you know her from?" - but I had no clue who that Lois person was. I said: "I don't know. It's just a name. Some previous owner of my book, obviously." As it turned out, she was my friend's fellow student in college many years ago. They were both in a course in which they studied Kierkegaard.

My friend was amazed at the coincidence. But how amazing was it really? What are the odds?

One might imagine that reading Kierkegaard is so rare that all Kierkegaard readers are at distance 2 from one another, in a tiny-world phenomenon. But that's not true. A few weeks ago a random colleague of mine came to my house, saw that book and started discussing it: that suggests an abundance of Kierkegaard readers.

Imagine that there are 6000 copies of that edition floating around the US. About 30 students took that course with my friend. The odds that I would end up with that book would still be 30/6000 = 1/200. That seems about right: low enough to be surprised by the coincidence, but not so low as to be in disbelief. Now the only information I need, to check my calculation, the the number of print copies of that book.

And that's where search engines failed me, to my surprise. How many copies of a book have been printed or have sold? That information is apparently not freely accessible. So I am left to my guesses about how amazed I should be.

Monday, November 14, 2011


Greetings are one situation where cultural differences are most visible. Once I introduced myself to an Asian student, and he surprised me by bowing to me as we were shaking hands. Bowing! To me!! I was... rather flattered.

To learn proper greetings, I take my cue from my environment, learning the proper protocol by following the example of others. I try to do it right, but it's not always easy. Once in the jungle in the Golden Triangle of Thailand, some children in a ditch saw me walking by and did the namaste gesture, accompanied by a slight bow. Wanting to fit in, I reciprocated the gesture and slight bow. But, to my confusion, the kids then burst out laughing!

Having learned from that incident, when that Asian student greeted me with a bow I was careful not to bow back.

Now, next month, Andy Yao is coming to give a talk at Brown. Should I bow when I greet him? Would that be nice, or would it be misplaced? I have no idea.

Sunday, November 13, 2011

Journals, Conferences, Arxiv: my solution.

There has recently been a flurry of posts in the blog world discussing the publication system.

Journals: articles take a long time to publish. Reviewers take forever to review. They don't do a good job. They're picky, and they lack good taste. Not that it matters, because people don't read journal articles anyway. Selective journals are biased for or against one fashionable area or another. Less selective journals drown us with too many articles to choose from. They're also expensive. It's a rip off.

Conferences: articles are half-baked. Reviewers don't read past the introduction. Committees make choices based on author reputation and fashionable keywords. The system is rigged. They accept too few papers, so researchers can't have access to that means of making their work known. They accept too many papers, so researchers drown in an excessive quantity of results being output by the community. The quality is low. They lack depth, accepting flashy results with big words and no substance. They lack breadth, accepting results in areas that are currently the fad and ignoring everything else. They're also expensive. It's a rip off.

Arxiv: an Arxiv paper is not a publication. It's not properly vetted, it's not refereed, we don't know whether they'll be around in one hundred years, there are way too many reports being published there, it doesn't cover everything and some researchers are still resisting to put their results on the arxiv, there is no guarantee that the results are correct or interesting, it's a way for people to claim a result with a poorly written manuscript, it impedes research by preventing more serious researchers from doing a more thorough job.

So many problems! So many complaints! What a dark age we live in! Now, it is my turn to put in my two cents.

My solution: I propose a simple and effective solution, namely: do not publish your work.

What could be simpler to implement? There are many advantages to not publishing. Just consider: If you do not publish, you do not waste other people's valuable time by forcing them to referee your papers, you do not add to the information overload by adding your own stuff on arxiv. You are performing a public service by not publishing. It also frees your own time by giving you the liberty to turn down refereeing requests, so you'll have more time to do better research.

But then, one might ask: how are your results going to get disseminated? First, most results don't matter, so no one will notice the difference. Second, if a result does matter, the odds are that several other people will also find it at the same time, so the result emerges naturally from the research community, without you having to go through the pain of publishing. Third, if by some extraordinary chance a result matters and you're the only one who truly understands it, then, as you give talks about it, it will catch on. Eventually someone else will be interested enough to think and write about it - here I'm thinking, for example, of part of Thurston's paper entitled "Conway's tiling groups".

That leads me to the second part of my proposed solution: publish other people's work. By that, of course, I do not mean stealing other people's ideas or results. Rather, what I mean is: when you hear about something that gets you excited, something that you want to understand badly enough that you are willing to spend many hours studying it, something that makes you so happy that you want to share the joy by communicating what you learned to others around you, then it is time to write an article about it.

As to your own research, leave it for others to write about if anyone cares to do it.

Saturday, November 12, 2011

What is Theoretical Computer Science?

Some people study Number Theory. Others study the Geometry of Manifolds. In TCS, we study the Power of Computation. We prove theorems, just like in any other pure science. What can be computed, how well, how fast, with what techniques, and what are the limitations? In the same way that number theorists were long fixated on Fermat's conjecture, our eyes are fixed on the distant P versus NP problem. Instances of applied problems are a chance for us to test our understanding of the hidden structure. "Reality check" means an example against which to confront our ideas on computation. Whenever a problem is hard in theory but easy in practice, or vice-versa, it's an invitation to revisit our assumptions and refine our models.

The real world is useful, as a tool towards developing a better theory.

Friday, November 11, 2011

Automatic spelling correctors

My MacBook Air has an automatic spelling corrector. I haven't yet looked at how to turn it off, but what I can say is, how incredibly annoying!

At first I thought those typos were my fault. I started worrying about my mental state, watching more carefully as I typed, and realized what was going on. Now I keep wanting to proof-read my texts, just to check one more time that they didn't get modified behind my back - a mild case of paranoia. A minor typo on my part, "fixed" by the automatic corrector, can lead to something unrecognizable. For example, the word "poem", that I accidentally typed as "poen" - a word for which the reader can easily substitute what I originally intended - found itself transformed into "open", and the reader will now be scratching their head (at least it didn't become "porn"!).

How anyone would be willing to suffer such intrusion on our writing, I can't imagine. I will have to poke around to find how to fix this intolerable feature of my new laptop.

Wednesday, November 9, 2011

On writing recommendation letters

We are entering the season of recommendation letters. If the applicant makes it to the final stage, the letters are scrutinized and every word said or unsaid gets interpreted. How does one go about writing a recommendation letter? I am not very good at it, but here is my usual template:

Initial draft

In the first sentence, I say where I know the person from and how long I have known them. Then I summarize our joint work, if there is any.

If it is about a student who took my course, in the second paragraph I recall his or her academic performance, highlighting if there was anything special, any interesting remark that came up in class for example.

If it is about someone who I did research with, in the second and third paragraphs I talk about the technical content of our work, giving my own assessment of how deep or interesting the results are, and mentioning any particular role that might have been played by the collaborator. I make sure to insert some reservation of some kind, because no one is perfect, and a letter is not believable if it does not contain a negative. The compliments are more compelling if there is something to balance them. If necessary, I add a comparison to specific people who have applied for similar positions.

If it is an application for a position that has some teaching, I add a sentence or two about teaching.

Then I insert a few words about the person's personality.

In my final sentence, I give my overall opinion, which I try to make as multidimensional as I can, so that each person is best possible in his or her own dimension and will be sought after by the places that are looking for exactly this kind of person.


After that initial draft, depending on how much time I have, I polish the letter: I reconsider the choice of words in the final sentence. I look up some research papers or documents to refresh my mind on the most interesting features of the work, and I develop the technical paragraphs a little bit. I search for adjectives that are an evaluation, and look for errors in emphasis. I remove expressions such as "it seems that", "I believe that", etc. Writing "this work is deep" is much more forceful than "I believe that this work is deep". Obviously, if I write it, it's because I believe it, so why add "I believe"? Pointless. I look at the relative balance between the various parts. For example, it's not good to be too effusive about somebody's personality, because then in comparison it makes the rest of the letter look bad. It's better to put the reservation, whatever it may be, in the middle of the technical evaluation rather than first or last. I may ask a colleague to take a look and tell me the impression conveyed by my letter. Sometimes it is quite different from what I had intended! Sometimes I may add some comment about the place where the applicant is applying, and the suitability of the place for the person or vice versa (but that has to be tailored to each place, so mostly I do not bother). The one thing I never do is polish the style of the English to remove ambiguous antecedents, to correct mismatches in number, or to fix faulty co-references.

Good letters

The most important thing about writing a good recommendation letter has nothing to do with the applicant: it is that I have to be in a good mood. If I am in a good mood, good words will come to my mind, upbeat expressions, accurate but positive turns of phrase.

The next most important thing also has nothing to do with the applicant: it is that good letters take time to write. (At least an hour, but easily half a day for more senior applicants.) Usually, the first time I write a letter for someone, I am pressed for time and it is not that great; by the second or third iteration I have had some time to polish it, and it is a much better description of the person's strengths.

Spanning trees with few hops

Minimum spanning tree problems often become NP-hard when one introduces side constraints. One popular such constraint is to limit the depth of the tree - the maximum number of "hops" when traversing a path from the root to a leaf in the tree. That's the k-hops minimum spanning tree problem, supposedly motivated by network applications, because delays along a path are roughly proportional to the number of hops.

In this problem, unlike the minimum spanning tree problem, the input edge weights do not necessarily define a metric. Indeed, in the minimum spanning tree problem, we usually start by saying: "Without loss of generality we can assume that the triangular inequality holds; otherwise, replace costly edges by shortest paths." But in the k-hop minimum spanning tree problem, such replacements may be invalid because they increase the number of hops.

When the input weights do not define a metric, known algorithms are quite poor. If there exists a tree with k hops and weight w, then one can produce a tree with O(k log n) hops and weight O(w log n), but doing better is impossible.

However, the situation is much brighter with random inputs drawn from a nice distribution. In a 2008 arxiv paper by Angel, Flaxman and Wilson, one assumes that the input is a complete graph and that the edges are drawn independently from the same distribution, for example the uniform U[0,1] distribution, or an exponential, or some other reasonable distribution (i.e. with similar density function near zero). Then there is a simple algorithm to produce a near-optimal tree.

Here is the algorithm.

First, assume that the bound k on the number of hops is less than log log n. Then the algorithm guesses x1, x2,..., xk, where xi is the number of nodes that are i hops away from the root. Then it performs a greedy Prim-like algorithm, going level by level, where, to build level i, you pick the xi nodes that can most cheaply be connected to the nodes in the part of the tree already built. (The "guess" relies on probabilistic analysis to compute the optimal sequence x(i)).

Second, assume that the bound k on the number of hops is more than log log n. Then the algorithm builds an (unconstrained) minimum spanning tree, then greedily breaks it up into small-depth clusters, then reconnects the clusters using a straightforward extension of the first algorithm, (where the goal is now not to connect the entire graph but to create a small depth subtree intersecting all the clusters).

Although the independent weight assumption is quite restrictive, the algorithm itself is natural. In fact, the unexpected bonus is that when k>log log n, not only is the output tree near-optimal, but its cost is almost the same as the cost of the unconstrained MST: in other words, one get tree of depth O(log log n) at no extra cost. So one is left to wonder: how good is this algorithm in practice?

Tuesday, November 8, 2011

Models for discrete metric spaces

It is good to have random models to test algorithms.

For graph algorithms, the G(n,p) model is like an old friend. For network algorithms, the analog is the complete graph with i.i.d. exponentially distributed edge weights. For algorithms on discrete metric spaces, what's a good model for a random discrete metric space?

Is there a model such that the marginal distribution of the distance between any pair of vertices has an exponential distribution? If so, then what would be a maximum entropy distribution that would have those marginals?

Monday, November 7, 2011

Hierarchically well-separated trees, bottom-up

Recently I taught the FRT algorithm to approximate any metric by a hierarchically well separated tree metric (HST) with edge lengths that are powers of 2. The construction is top-down. Initially all vertices are together in one big cluster, and later the clusters are progressively subdivided using smaller and smaller balls. A question came up: is it possible to construct the tree metric bottom-up?

Here is an easy way to simulate FRT going roughly bottom-up, described in the papers "Network Warehouses: Efficient Information Distribution to Mobile Users" (Motskin, Downes, Kusy, Gnawali, Guibas) and "Distributed Resource Management and Matching in Sensor Networks" (Gao, Guibas, Milosavljevic, Zhou).

Pick a random priority ordering over the vertices, and a random scalar r in [1,2]. For each vertex v in parallel and for each distance d which is a power of 2 times r, find the highest priority vertex u such that B(u,d) contains v. As the power of 2 increases, this defines a sequence of balls containing v, of larger and larger radius, and with centers that have higher and higher priority.

To compute the distance between v and w in the tree metric, find the maximum d such that, in their respective sequences, v and w are in the same ball of radius d but in different balls of radius d/2. That determines the place in the tree where v and w have their lowest common ancestor x, corresponding to the ball of radius d that contains both v and w. Then the x-to-v and the x-to-w paths in the tree both have edges of lengths d/2, d/4, d/8, ..., so the distance from v to w in the tree metric is 2d.

Here is a question: is it possible to construct a hierarchical clustering that has the properties of HSTs, but that is truly constructed in a bottom-up manner? More precisely, is it possible to commit to a partition of the points into a hierarchical clustering in clusters of radius < d, (that is, commit to the bottom levels of the HST), without looking at the distances that are greater than, say, 16d?

Here is a possible answer: put v and w in the same cluster of radius d iff their sequences of balls have the same balls for radius d, 2d, 4d, 8d and 16d. From that point on, in later, coarser clustering, v and w are committed to staying together. What happens if the FRT algorithm wanted to separate them? Perhaps there is a high priority vertex x such that B(x,32d) contains v but not w. Well - tough. If v is the highest priority vertex in the cluster, then w yields to v and puts B(x,32d) in its sequence, even though technically it wasn't supposed to do it.

How does that affect the HST properties? It still creates a hierarchically well-separated tree, and the distance in the tree metric is still not much less than the distance in the input metric. But is it still upper bounded by O(log n) times the distance in the input metric, in expectation?

Sunday, November 6, 2011

Students sleeping in class

As the semester goes on, good resolutions fade. Among the students who still make the effort to come to class, I see some who desperately try to stay awake, but their eyes close in spite of themselves. (That happens to me too, especially if I listen to a talk right after lunch.) But what I teach is sometimes demanding. It requires an effort of the intellect.

When students do sports, at practice they have to put in the effort, day in and day out, even if sometimes it feels tiresome. But in class, even if they diligently put in the required effort to come to class, their brain might sometimes be reluctant to engage. They could just be too tired.

To awaken the brain from its slumber, the instructor can try to entertain. What if we had drills, like in sports? What if I asked questions and the students had to push some button to answer? Would that wake them up?

When I was in elementary school the teacher started the morning with operations. He or she would say: "3 plus 1" or "6 times 9", we would each write the answer on our small individual chalkboard, and raise the board high above our heads, for the teacher to see all the answers simultaneously. I always thought the purpose of that exercise was to memorize elementary arithmetic operations, but maybe there was something more to it: it woke up our brain, like drills waking up our body in sports. After a few minutes of such exercises, we, perhaps, felt more alert, more ready to listen to and learn new material.

Friday, November 4, 2011

On applying for jobs

Here is how I applied for jobs the first couple of times: I put together my vita, rushed to finish papers as soon as possible, tried to come up with the best research proposal I could think of, and sent copies of the whole application to universities and colleges, hoping some of them would show interest. That was unsuccessful.

Then Lynne Butler looked at my application materials and pointed out:
- that I ought to get the name of the university right. Sometimes I addressed my application to, say, "Williams University" instead of "Williams College". I answered that it was a detail, but she said that such details show indifference on my part;
- that my cover letter, identical for all places where I applied, would be a turnoff and that in many universities they would not even turn the page and look beyond it.
That year, in the early 1990s, the job market in TCS was very tight. In Math Lynne told me that it was even worse, with 900 applications received for one temporary 1-year instructor position!

My discussion with her at that time shifted my perspective entirely.

Here is how I apply for jobs now: I look at the web page of the place I am considering. I try to picture myself being there. What would I like about it? Could I imagine being there in the long term? How could my being there help develop my research or my teaching? How could that place make a difference for me, and how could I make a difference for that place? What would be its most attractive features for me? After I've spent some time making up a story of my hypothetical future there, after I have, perhaps, visited, or at least explored connections, once the picture is a bit more clear, I can easily write an interesting cover letter, inserting a couple of personal specific sentences.

Thursday, November 3, 2011

Walking in a city: the shortest expected path

Say that you are in a US city with streets arranged in a grid, and your destination is i blocks to the North and j blocks to the East of your current location: which path will you pick to go there? There are (i+j choose i) different routes that all have the same length, but you may or may not have to wait at intersections before crossing. The shortest path is, among the paths with minimum total length, the one that minimizes waiting time.

Whenever you reach an intersection, you are able to cross without waiting, in at least one of the two directions North and East, as long as i and j are both positive. But if i is zero, there is only one shortest path and you are stuck just going East for j blocks, forced to wait at intersections whenever the light is red for pedestrians. So, the best way is to avoid reaching the point where i or j is zero.

This means that, if you have just finished crossing at one intersection, finding yourself at the North-East corner, and now have a choice between walking East for one block or North for one block, you should always choose the direction that decreases max(i,j).

Say that you are at the North-West corner of an intersection, and that the light is red for pedestrians to go East. Should you wait for the light to change so that you can cross to the East side, or walk North for one block, hoping to be luckier at the next intersection? It depends on whether you got there by walking East for one block, or by walking North for one block and crossing the intersection from South to North: in the first case your waiting time is completely random and there is no point in waiting at this spot rather than walking North for one block and hoping to get lucky at the next intersection. In the second case, the answer is not completely clear. Since it has already been a while since the light last changed, on average you will wait less at the current intersection that at the next intersection. However you should also take into account how close min(i,j) is to zero, or perhaps how small |i-j| is: maybe you are in no hurry to go East, because i is large compared to j and you will have many other opportunities to cross going East.

I realized at my last trip to the Boston financial district that I use that algorithm without even thinking about it, relying on some gut feeling for the last case, and automatically applying the above rules for the other cases. I wonder: does everybody do that? Or at least, every computer scientist? Do people do it consciously or deliberately? Is this actually the best path, or is there a better way?

Wednesday, November 2, 2011

Lynne Butler on academic life

Lynne Butler is a Math professor at Haverford College and a long-time friend. Here is what she has to say about the academic life.

Claire: How would your career have been different if you had been a man instead of a woman?
Lynne: My field would have been algebraic topology. As a graduate student at MIT, that's what I wanted to do. My advisor was skeptical of the role of women in Mathematics, and thought advising them was a wasted investment because they would not be truly committed to their career. I agreed with him that marriage for a female mathematician was not a good idea, and promised to prove my commitment to Mathematics, and he finally agreed to give it a try. However after some time the situation deteriorated. Once at a meeting I saw a stack of identical papers on his desk, asked if they were copies of his latest work and if I could borrow one to read it; as a result, he complained to the chair that I was intruding on his private space. Things got gradually worse, until one day he told me that my lack of command of a paper he had given me to read confirmed his bad opinion of female mathematicians. The department chair then advised me to find an advisor in a different field, and that's how I ended up getting a PhD in combinatorics with Richard Stanley and - sweet revenge - got a job in Princeton (whereas algebraic topologists had and still have the greatest trouble finding jobs in top places), which later led to my going to Haverford, in a job that I love.

Claire: You were associate provost for a year. Is there anything interesting about that job?
Lynne: yes, although there were tasks that I hated and did very poorly, there were also some things that I really liked and was able to do really well. In particular, negotiating hiring of visiting professors: from the time when a department recommends recruiting a new faculty, to the time when they start, there are many questions that need to be resolved to decide whether the candidate is a good match with Haverford and how to make it work. For example, for faculty with interests that span across several departments, arrangements have to be made, and I loved doing that.

Claire: Do you do a lot of mentoring students?
Lynne: Yes, it's a big part of my job. Not just when they major, but when they start, when they choose their major, when they graduate, and after they graduate. I have continued advising former students through their time in graduate school, when they apply for jobs, and even advised them about the tenure process.

Claire: I remember applying for jobs from France. I was clueless, you took a look at my failed applications and explained to me how I was going about it the wrong way, I followed your advice, and it worked.

Tuesday, November 1, 2011

Fukushima 8 months later

News from NHK, Japan's government broadcaster:

“Xenon has been detected in the Containment Vessel of Reactor 2 at Fukushima I Nuclear Power Plant. Xenon is created when there is nuclear fission.” [...]

“The analysis done on November 1 found xenon-133 and xenon-135, which are created when uranium-235 undergoes nuclear fission.”

“Xenon-133′s half life is 5 days. TEPCO says it could not rule out the possibility of nuclear chain reaction happening again, so the company poured water with boric acid into the reactor to suppress the nuclear chain reaction for one hour starting 3AM [on November 2].” [...]

“[TEPCO says that] if the nuclear chain reaction is happening, it is small-scale.”

Reminder: news, rumors, scientific analysis and fear-mongering on Fukushima can be found at and at while raw measurements are at (and the other graphs reachable from there by clicking on the links)

In the present case, the first US media to report this, amazingly, is Fox business news.

Path to Professorship at MIT: with or without alcohol?

Recently I sat on a panel at the "Path to Professorship" event at MIT, for graduate students and post-docs. After the afternoon panels, it was announced that there would be a reception with wine. However, during the reception, a photographer who came to take a picture of the group with whom I was talking asked the woman holding a glass of wine to put it down while he took the picture: apparently, it's ok to drink wine, but not to post pictures of people drinking wine!