Protein interactions follow signaling pathways whose cascading effects cause some diseases. Past experiments have harvested much information, not always very reliable, on which pairs of proteins interact with each other. From that information, how can one infer entire subgraphs that, together, determine the function?
Define a graph where the vertices are proteins and the edges are interactions between proteins.
Give a cost to each edge, depending on how confident you are that the experimental data shows interaction -- cost zero if you know with certainty that there is interaction. Give a weight to each vertex, depending on the differential expression of that protein -- high weight if the protein is significant, weight zero if you are not interested in that protein.
Then the problem can be modeled as a prize-collecting Steiner tree problem. The objective is to find a min cost tree connecting the vertices of positive weight, where vertices can be omitted by paying a penalty equal to their weight:
Min (cost (tree) + weight(vertices not in tree)).
Since there is no natural absolute scale common to edge costs and to vertex weights, it makes sense to use a scaling parameter lambda and find:
Min (cost (tree) + lambda*weight(vertices not in tree)).
Then the human user can look at the outputs produced for a range of values of lambda (as lambda grows, fewer and fewer vertices are connected and the tree becomes smaller and smaller) and eyeball the most interesting one.
In a paper I looked at, the authors do just that. They ran a prize-collecting Steiner tree algorithm on a yeast protein network. They were able to find pathways that had already previously been found by other means. The Steiner tree, predictably, contains vertices of high weight ("proteins differentially expressed"), but also Steiner vertices, possibly of low weight but serving to bridge subparts of the tree. One of the Steiner vertices they found, COS8, was "a gene of unknown function" that people had not previously paid much attention to.
They then proceeded to run biological experiments in the lab to test COS8, and, lo and behold! They found that keeping or deleting COS8 from the strains had a critical effect.
What gets much of the credit for this success story? Algorithmic design. The reason why the authors were able to run their program on that data in such an effective manner was that their algorithm was faster and better than previous methods. In fact, this is an ideal setting for algorithmics: one does not need an exact answer, since a good enough approximation algorithm suffices; and the problem is a well-studied type of variant of a classic combinatorial optimization problem.
The questions that the interested reader will ask, I am sure, is: what was the algorithmic technique used to get this breakthrough? And who are the researchers, in the approximation algorithms community, who did this masterful piece of algorithmic design?
The answer will wait for another post.