Teacher: Consider the following situation. Driving to work takes you about 20 minutes on average. Is it possible that, with probability 90%, the commute takes you more than 4 hours?
Note: Ask the question twice, once with "90%" and once with "9/10th", so that each student hears the wording with which he or she is most comfortable. Possibly add specifics about rush hour, downtown, etc., if you need a story to get students going.
The students offer various reasons, but the teacher does not elaborate on them (to save time).
Teacher: Consider another example. You are investing $1000 in some stocks. Your expected return after one year is 5%. Is it possible that, with probability 90%, after one year you will be a millionaire?
Again the teacher does not elaborate (to save time).
Teacher: Now, the question that is the real topic of this lecture. Consider a non-negative random variable X, and say that it's 5 on average. Is it possible that, with probability 90%, X is larger than one million?
- take numbers that are so large that the answer is intuitively obvious. It is that intuitive understanding that will provide students with the wedge to understand the meaning of Markov's inequality.
- stress the question that "is the real topic", so that students who are bored because it's too easy for them so far pay attention. Write it on the board just in case.
- why not start directly with the general question? Con of starting directly: at this stage of the course, most students are not yet comfortable with the abstract notion of "random variable", and the examples serve to ground it in reality, reinforcing the teaching of some previous lecture. Con of starting with an example: students might get distracted by the specifics of the example and find reasons that are irrelevant (bus schedule, say), wasting lecture time. Pro of starting directly: I give them only the relevant information, so they're not going to start talking about the distribution or other externals. The best of both worlds: by starting with, not one, but two examples, steer students away from irrelevance. The examples should be as similar as possible regarding the numbers used, but as different as possible regarding the context and distribution. Students start seeing what properties the two examples have in common, and that directs their thoughts in the right direction. At the same time, that has the side advantage of teaching them a little bit about problem-solving.
Teacher: Why not?
Note: here, do not intervene; let a free discussion between students shape the answer, since this does not require particular skills. If they parsed the question, then the "no" answer should be obvious, and you should quickly hear an answer of the form "Because if X was one million almost all the time, then it would have to be more than 5 on average". "Why?" you ask. Eventually someone says something like: "If X is one million 90% of the time, then on average it's at least 900000".
Teacher: Let's formalize that. If X is one million 9/10th of the time, then what can we say about its mean, call it m?
Students: m is at least (9/10)*1000000
Note: when stating the question, make sure to say "9/10th" rather than "90%", to provide a subconscious hint.
Note: repeat the question several times, switching the terms "mean", "average", and "expectation", to reinforce the recent acquisition of that vocabulary by the students, and so that each hears the term which he or she is most comfortable with.
Teacher: What if X is, not exactly one million, but at least one million?
Students: [after thinking about it for a moment] It doesn't make a difference.
Teacher: Let me give you a counterexample. Say that X is, with probability 90%, one million, and with probability 10%, minus one million. [Write those numbers.] Then [write the one-line calculation] X is 800000 on average, so it's not true that, as you claimed, m is at least 900000.
Students: [after thinking about it for a moment] But X has to be non-negative.
Note: place this question right before the proof, as a reminder of that assumption, so that when they have to use it in the proof, it's fresh in their minds.
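Note: the one-line calculation in the counterexample can be checked with a short sketch like the one below. The helper `mean` and the second distribution (same numbers, but with 0 in place of minus one million) are additions for illustration only; they show that the claimed bound of 900000 holds again once X is non-negative.

```python
from fractions import Fraction


def mean(distribution):
    """Expectation of a finite distribution given as (value, probability) pairs."""
    # Sanity check: the probabilities must sum to exactly 1.
    assert sum(p for _, p in distribution) == 1
    return sum(value * p for value, p in distribution)


# The teacher's counterexample: X may be negative.
counterexample = [(1_000_000, Fraction(9, 10)), (-1_000_000, Fraction(1, 10))]

# Hypothetical non-negative variant, added for comparison.
non_negative = [(1_000_000, Fraction(9, 10)), (0, Fraction(1, 10))]

m_counter = mean(counterexample)
m_non_negative = mean(non_negative)

print(m_counter, m_non_negative)  # prints: 800000 900000
```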
Teacher: Now, let's extend the problem. If X is one million (or more than one million) with probability p, then what can we say about m?
Students: m is at least 1000000p.
Teacher: Let's extend it further. If X is at least some threshold value v with probability p, then what can we say about m?
Students: m is at least vp.
Note: introduce variable names one at a time. Most people have trouble with abstraction for new concepts, and that makes it easier for them to not get lost by a sudden jump from concrete numbers to abstract algebraic equations. It helps preserve the link between reality and abstraction.
Teacher: Can you prove it? [Then, if no suggestion comes forth:] Remember the definition of expectation. Let's say, to simplify, that X is an integer random variable, so it only takes on integer values.
Students: E(X) is the sum over i of i times the probability that X equals i.
Teacher: Actually, in this case it's more convenient to use another characterization of expectation of integer random variables that we have seen before.
Students (remembering): E(X) is the sum over i >= 1 of the probability that X is greater than or equal to i.
Students: (after some period of fumbling) Aha! So, we can just look at the terms for i=1,2,...,1000000 (that is, up to v). Each term is at least p, because if X is at least one million, then X is at least i. So we get p+p+...+p, v times, so the sum is at least pv.
Teacher: Great. Where did you use the fact that X was non-negative?
Students: (after a blank moment) oh, right; it's necessary for our characterization of expectation.
Note: this question serves as a quick reminder to reinforce the assumptions behind that equality.
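Note: for reference, the students' argument can be written out as the chain below. This is a sketch under the assumptions used in the discussion: X takes non-negative integer values, v is a positive integer (v = 1000000 in the running example), and p is the probability that X is at least v.

```latex
\begin{align*}
m = E(X) &= \sum_{i \ge 1} \Pr(X \ge i) \\
         &\ge \sum_{i=1}^{v} \Pr(X \ge i)
           && \text{(drop the terms with } i > v \text{, which are non-negative)} \\
         &\ge \sum_{i=1}^{v} \Pr(X \ge v)
           && \text{(if } X \ge v \text{ and } i \le v \text{, then } X \ge i\text{)} \\
         &= v\,p .
\end{align*}
```

The first equality is the characterization of expectation that requires X to be non-negative, which is where that assumption enters.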
Teacher: You have proved: m >= vp. That's essentially Markov's inequality, but it's usually used, as in the examples above, for variables where m is known and p is unknown, to prove an upper bound on p. Can you restate it in that form?
Students: [hesitate, then someone says:] p <= m/v.
End of interactive part. The teacher takes over and writes the final statement of Markov's inequality and its proof on the board while the students take notes.
Extra time: Apply the final statement to the examples from the beginning of class. Give the proof for continuous random variables as well (using integrals), pointing out similarities.
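Note: a minimal sketch of applying p <= m/v to the three opening examples. The function name `markov_bound` is hypothetical, and the numbers rest on assumptions not spelled out in the questions: commute time is measured in minutes (4 hours = 240) and is non-negative; "expected return 5%" is read as an expected final wealth of $1050, with final wealth non-negative. The questions ask about "more than" the threshold, which is at most the probability of "at least" the threshold, so the same bound applies.

```python
from fractions import Fraction


def markov_bound(mean, threshold):
    """Markov upper bound m/v on P(X >= v) for a non-negative X with mean m."""
    if mean < 0 or threshold <= 0:
        raise ValueError("need mean >= 0 and threshold > 0")
    # A probability never exceeds 1, so cap the bound there.
    return min(Fraction(1), Fraction(mean) / Fraction(threshold))


commute = markov_bound(20, 240)           # mean 20 min, threshold 4 hours
stocks = markov_bound(1050, 1_000_000)    # assumed mean wealth $1050
abstract = markov_bound(5, 1_000_000)     # X with mean 5

for name, bound in [("commute", commute), ("stocks", stocks), ("X", abstract)]:
    print(f"{name}: p <= {bound}")
```

In all three cases the bound is far below 9/10, which matches the intuitive "no" from the start of the lecture.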
Note: Be ready for a different proof. For example, a student might say: "The smallest that m could be, given our assumptions, is when X is 1000000 exactly 90% of the time and 0 the other 10% of the time, and in that case m is exactly equal to 900000", and then the proof developed in lecture should be built on that insight.
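Note: one possible way to turn that insight into a proof, sketched under the same assumptions (X non-negative, p the probability that X is at least v). The indicator notation is introduced here for illustration and is not used elsewhere in this plan.

```latex
\[
X \;\ge\; v \cdot \mathbf{1}[X \ge v]
\quad\Longrightarrow\quad
m = E(X) \;\ge\; v \cdot \Pr(X \ge v) = v\,p .
\]
```

The left-hand inequality holds pointwise: when X is at least v the right side equals v, and otherwise it equals 0, which is at most X because X is non-negative. The variable on the right is the student's extremal case, equal to v with probability p and 0 otherwise.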
Note: This takes a lot longer than just giving the statement and proof. But it makes the lecture much more interesting for the teacher, because we never know what's going to happen, and for the students, because they are developing their own insights and can have their "Aha" moments. It's also a great way to gauge their level of understanding (or at least, the level of understanding of those who participate). Hopefully it will help them better remember Markov's inequality. Recapping at the end by writing the statement and proof in full is boring for those students who were most active in the discussion, but necessary for those who are struggling to follow. It can be done quickly, but it has to be done.