MARGIN OF ERROR….Who’s ahead in the presidential race? Here’s a typical report from early August:
An opinion poll released yesterday found Mr. Kerry had the support of 49 per cent of voters, compared with 47 per cent for Mr. Bush, a statistical tie….
The Globe and Mail reported this as a “statistical tie” because Kerry’s 2% lead is within the poll’s margin of error (MOE) of 3%. This in turn is based on the theory that (a) statistical results are credible only if they are at least 95% certain to be accurate, and (b) any lead less than the MOE is less than 95% certain.
There are two problems with this: first, 95% is not some kind of magic cutoff point, and second, the idea that the MOE represents 95% certainty about the lead is wrong anyway. A poll’s MOE does represent a 95% confidence interval for each candidate’s individual percentage, but it doesn’t represent a 95% confidence interval for the difference between the two, and that’s what we’re really interested in.
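To see why the two intervals differ, here is a rough sketch in Python. It assumes both shares come from the same simple random sample and uses the textbook normal approximation; the function name and the sample size of 1,100 are illustrative assumptions, not figures taken from the poll itself.

```python
import math

def lead_margin_of_error(p1, p2, n, z=1.96):
    """Approximate 95% margin of error for the lead (p1 - p2) in one poll.

    Assumes a simple random sample of n respondents; z=1.96 is the
    usual 95% normal critical value.
    """
    # Both shares come from the same sample, so they are negatively
    # correlated: Var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

# Hypothetical inputs: a 49%-47% race and an assumed sample of 1,100
lead_moe = lead_margin_of_error(0.49, 0.47, 1100)
print(f"MOE on the lead: {lead_moe:.1%}")
```

Under these assumptions the margin on the lead comes out at nearly twice the headline 3%, which is why the reported MOE can't simply be compared against the gap between the candidates.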
In fact, what we’re really interested in is the probability that the difference is greater than zero, or in other words, that one candidate is genuinely ahead of the other. But this probability isn’t a cutoff, it’s a continuum: the bigger the lead, the more likely that someone is ahead and that the result isn’t just a polling fluke. So instead of lazily reporting any result within the MOE as a “tie,” which is statistically wrong anyway, it would be more informative to just go ahead and tell us how probable it is that a candidate is really ahead. As a service to humanity, here’s a table that tells you:
So in the poll quoted above, how probable is it that Kerry is really ahead? The MOE of the poll is 3%, so go to the top row. Kerry’s lead is 2%, which means there’s a 75% probability that he’s genuinely ahead of Bush (i.e., that his lead in the poll isn’t just due to sampling error).
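If you'd rather compute the number than look it up, the following is a minimal sketch of one standard way to do it. It uses a normal approximation for the difference between two shares from the same sample, which may not be exactly the formula behind the table, and it assumes a sample of about 1,100 (the poll's actual sample size isn't given here). The function name is hypothetical.

```python
import math

def prob_ahead(p_lead, p_trail, n):
    """Approximate probability that the poll leader is truly ahead.

    Assumes a simple random sample of n respondents and a normal
    approximation centered on the observed lead.
    """
    diff = p_lead - p_trail
    # Variance of the difference of two shares from one sample
    variance = (p_lead + p_trail - diff ** 2) / n
    z_score = diff / math.sqrt(variance)
    # Standard normal CDF, written in terms of the error function
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))

# Kerry 49%, Bush 47%, with an assumed sample size of 1,100
probability = prob_ahead(0.49, 0.47, 1100)
print(f"Probability the leader is really ahead: {probability:.0%}")
```

With these inputs the result lands at about 75%, in line with the table. A bigger lead or a bigger sample pushes the figure toward 100%, and a dead heat gives exactly 50%.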
Generally speaking, national polls use sample sizes of about 1,100, which translates to an MOE of 3%. State polls often use a sample of 600, which produces an MOE of 4%. Subsamples within a poll (a single demographic group, for example) sometimes have MOEs of 5% or higher.
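The link between sample size and MOE follows from the usual worst-case formula, sketched here in Python. It assumes a simple random sample, a 50/50 split (which gives the widest interval), and the 95% critical value of 1.96; the function name and the 400-person subsample are illustrative assumptions.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for one share in a sample of n.

    p=0.5 is the worst case; real polls may also adjust for weighting
    and design effects, which this sketch ignores.
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# National poll, state poll, and a hypothetical 400-person subsample
for sample_size in (1100, 600, 400):
    print(sample_size, f"{margin_of_error(sample_size):.1%}")
```

Because the MOE shrinks with the square root of the sample size, cutting the margin in half takes roughly four times as many respondents.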
Now, there are plenty of reasons other than sampling error to take polls with a grain of salt: they’re just snapshots in time, the results are often sensitive to question wording or question ordering, it’s increasingly hard to get representative samples these days, and so on. But from a purely statistical standpoint, a lead is a lead and it’s always better to be ahead than behind.
So: how about if the media gets itself out of the mythical “statistical tie” business and just reports the actual probabilities instead? The table above does all the heavy lifting, and all it takes is a 5-line Excel spreadsheet if you want more precision. Simple.
ACKNOWLEDGMENTS: Thanks to Nancy Carter and Neil Schwertman, Professors of Mathematics and Statistics at California State University, Chico, for providing me with the formulas used to generate the table and the spreadsheet.