Problems with Polling

ARG has finished their massive nationwide poll of 600 people in each state (plus DC), a total of 30,600 respondents. Here are the basic results:

  • Nationwide, Bush leads Kerry 47% to 46%.

  • Kerry has the lead in 20 states with 270 electoral votes.

  • Bush has the lead in 29 states with 253 electoral votes.

  • Two states are tied (Wisconsin and West Virginia).

In a poll this large, the margin of error on the national number is negligible (well under a point), which leads Robert Waldman to wonder why other pollsters don't also use larger samples to eliminate (almost all) sampling error:

I think pollsters use small samples only partly to save money, and also to give themselves an excuse if their numbers are off. With a huge sample, a difference between the poll and the election would imply a more worrisome problem, either a biased sample, a faulty likely voter filter or a psychological difference between talking to a pollster and actually voting. It is clear that some or all sampling techniques give biased samples, because the spread of polls is too large to explain with sampling error alone. Polling agencies certainly don’t want to spend money to prove that they are one of the agencies with a defective sampling technique.
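To put a number on "essentially no margin of error," the textbook worst-case formula for a simple random sample is MoE = z·√(p(1−p)/n), which is largest at p = 0.5. A quick sketch (the function name is mine; this assumes pure random sampling, which is exactly the assumption the rest of the post questions):

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Worst-case 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical single-state sample of 600 vs. the pooled national sample of 30,600.
print(f"n=600:    ±{100 * margin_of_error(600):.1f} points")     # ±4.0 points
print(f"n=30,600: ±{100 * margin_of_error(30_600):.1f} points")  # ±0.6 points
```

So pooling the state samples cuts the sampling error on the national number from about ±4 points to about ±0.6 points, small enough that a Bush 47, Kerry 46 result is a genuine (if tiny) lead rather than statistical noise.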

He may be right. Sampling error is real, but it's not what's behind the huge disparities we've been seeing lately, with polls taken on the same day sometimes differing by 10 points or more. The real culprit is the weighting formulas the different polling firms use.

And as near as I can tell, it’s only going to get worse. I’ve been reading for years that truly random telephone polling is getting harder and harder for a variety of reasons: cell phone proliferation, caller ID, fewer people willing to talk to pollsters, etc. This makes raw calling samples more and more distorted and puts an increasing burden on weighting models that correct the sample to more accurately reflect the actual electorate.

And that's not all. Add in the various formulas for deciding who counts as a likely voter and who doesn't, and what gets reported in the daily paper is more algorithm than real data. What's more, calling more people won't help: if there's a systematic bias in the sample, it will be there regardless of sample size.
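The point that a bigger sample can't fix a biased one is easy to demonstrate with a toy simulation. Here the numbers are illustrative only: true support is set at 47%, but the sampling frame over-represents that side by 3 points, so every poll converges on 50% no matter how many people are called:

```python
import random

random.seed(0)

def biased_poll(n: int, true_support: float = 0.47, frame_bias: float = 0.03) -> float:
    """Simulate polling n people from a frame skewed by frame_bias.

    Each respondent is drawn from the distorted frame, so the poll
    estimates (true_support + frame_bias), not true_support.
    """
    p = true_support + frame_bias
    return sum(random.random() < p for _ in range(n)) / n

for n in (600, 30_600, 1_000_000):
    print(f"n={n:>9,}: poll says {biased_poll(n):.3f} (truth is 0.470)")
```

As n grows, the estimate settles ever more precisely on the wrong answer: sampling error shrinks like 1/√n, but the 3-point bias never moves.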

What we're seeing this year may be the Cheyne-Stokes breathing of traditional polling models, and by 2008 the whole enterprise may be either dead or changed beyond recognition. In the meantime, though, we have the worst of all worlds: we're still relying on traditional polls even though the sample distortion is too large to be massaged away with fancy software, but we don't yet have new polling models to replace them.

In other words, we don’t really know who’s winning. Election day may turn out to be a real surprise.