Friday, January 6, 2012

Push-pull

I just saw a talk by Thomas Sauerwald analyzing the push-pull protocol for broadcasting in graph-theoretical models for social networks. If you ask your search engine for push-pull, many answers come up, related to aviation, physics, flirting, and marketing. But this is different.

To broadcast information, in the push protocol, at every round every node that has the information rudely shares it with a random neighbor, whether that neighbor wants it or not. In the pull protocol, at every round every node that does not yet have the information extracts it from a random neighbor, if that neighbor has the information. In the push-pull protocol, both behaviors occur.

Q: How many rounds before 99% of the nodes are informed?
A: log log n if the network is a Chung-Lu random graph model for social networks (parameterized such that the average distance between two nodes is log log n.) That's the result.

Q: What's the Chung-Lu model?
A: each node i has a weight w(i), and edge {i,j} is in the network independently with probability min(1,w(i)w(j)/W), where W is the sum of all weights. w(i) is such that the distribution of degrees follows a power law (with parameter beta).

Q: How about informing not just 99% but 100% of the nodes?
A: the diameter is roughly logarithmic, so, there is no way that you can inform 100% of the nodes in time log log n. In a social network, when something goes viral, 1% of the crowd will remain clueless in spite of your efforts. The solution: just forget abut them. We're the 99%!

Q: What's the rough idea of the proof?
A: Partition nodes into classes according to (the log log of) their degree, (rounded to the nearest integer.)
0. The information initially belongs to a random node.
1. After log log n rounds it reaches a node in the highest class.
2. After a few more rounds it reaches (50% of) the nodes in the highest class.
3. After another log log n rounds it goes from there to almost all network nodes.

Step 2 contains some intuition: In the Chung-Lu model, two nodes x and y of high class (i.e. high degree) are probably connected by a path of length 2 via an intermediate node z of degree 2. If x has the information, then z will pull it from x after a couple of steps, and then z will pass it on to y after a couple more steps, so nodes of degree 2 transmit information in a constant number of rounds. In other words, small degree nodes are good for passing on information quickly to everyone they know. So what matters in the large degree nodes is really their degree-2 neighbors: those will immediately pull the information and share it across to their other neighbor, thus roughly reducing the GOSSIP mode of communication to a LOCAL mode of communication.

Q: Is the push-pull protocol realistic?
A: It's not inconceivable. Pushing corresponds to, say, sending an email to a single (or a few) recipient(s) to share information: "Have you heard? Abe and Amy broke up!". Pulling corresponds to asking your friend: "How are Abe and Amy doing? I haven't seen them lately" or checking their Facebook page, and then, if you find the piece of information which you were waiting for "Status: single", adding a note about it on your own Facebook wall: "Hey everyone, look, Abe is no longer in a relationship. He's single! How exciting!". (I am having trouble imagining why someone seemingly so interested in that piece of information would want to broadcast it to the rest of the world, but if that's the only reason why the protocol is not perfectly realistic, I'll give it a pass.)

Q: Is the Chung-Lu model realistic?
A: Maybe since it's a reasonably good fit with reality according to several statistics. But there is no way to know for sure, is there?

Q: Is the above log log n result robust? Does it still hold if the push-pull protocol or the Chung-Lu network is replaced by something similar but different?
A: The proof presumably breaks down. The result might still be true if the graph has small average distance and if it has some small degree nodes adjacent to large degree nodes.

Q: Is the synchronous assumption realistic?
A: In the asynchronous model in which each node has a Poisson clock, the log log n becomes a 1. That's because x and y don't just have a path of length 2 going through node z: they have many such paths going through z1, z2, …, and one of them goes through a node whose Poisson clock rings twice in rapid succession: "Zap! Get that info from x! Zap! Tell y about it, quick, to be the first one!" and transmits information almost instantaneously. So we don't mind if synchronicity does not hold, quite the opposite!