If you spent a tenth the time I did staring at and trying to understand polls during the 2012 election cycle, then you owe it to yourself to read Steven Shepard’s National Journal piece on the post-election debate over polls and poll aggregation.
The big data point everyone’s trying to digest is that so many pollsters–particularly Big Dogs like Gallup–missed or underestimated Barack Obama’s eventual four-point popular vote margin. And that in turn has the polling industry (notably those–again like Gallup–who conduct expensive live interviews) a-fearing that their media consumers will increasingly turn to aggregators who simply average and at most massage polling data generated by others.
Shepard offers a useful brief history of the aggregators:
Real Clear Politics began the practice of averaging polls before the 2002 midterm elections. RCP was joined by Pollster.com–which is now part of The Huffington Post–four years later. “Pollster started in 2006, and we were really building on what Real Clear Politics did,” founding co-editor Mark Blumenthal said. The statistician Nate Silver began a similar practice in 2008, and his site, FiveThirtyEight, was acquired by The New York Times shortly thereafter. More recently, the left-leaning website Talking Points Memo started its PollTracker website before the 2012 election.
Each of these organizations differs in its approach. Real Clear Politics does a straightforward averaging of the most recent polls. TPM’s PollTracker runs a regression analysis over the most recent polls to project a trajectory for the race. FiveThirtyEight and HuffPost Pollster adjust the polls they aggregate for house effects–the degree to which a survey house’s polls lean consistently in one direction or another. FiveThirtyEight also uses non-survey data to project the election results.
All four of these outlets underestimated Obama’s margin of victory. Both Real Clear Politics and PollTracker had Obama ahead by only 0.7 percentage points in their final measurements. HuffPost Pollster had Obama leading by 1.5 points, while FiveThirtyEight was closest, showing Obama 2.5 points ahead of Romney in the last estimate. The aggregators that came closest to Obama’s overall winning margin were the ones that attempted to account for pollsters’ house effects.
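None of the aggregators publish their exact models, but a toy sketch can show why the house-effect adjustment moves the estimate. In the minimal Python example below, all the margins are invented, and each firm’s “house effect” is estimated crudely as the gap between its own average and the mean of the firm averages, a far simpler benchmark than the trend lines Pollster and FiveThirtyEight actually fit:

```python
# Toy contrast between a plain polling average and a house-effect-adjusted
# one. All margins (Obama minus Romney, in points) are invented, and this
# estimator is far cruder than what the real aggregators do.

from collections import defaultdict

polls = [
    ("Gallup", -1.0), ("Gallup", -2.0), ("Gallup", -1.5),
    ("Rasmussen", -1.0), ("Rasmussen", -0.5),
    ("PPP", 3.0), ("ABC/WaPo", 2.0), ("Pew", 3.0),
]

# Real Clear Politics style: every poll counts once, prolific firms and all.
simple_avg = sum(m for _, m in polls) / len(polls)

# House-effect adjustment: estimate each firm's lean as the gap between its
# average and a benchmark (here, the mean of the firm averages, so each firm
# counts once), then subtract that lean from its polls before averaging.
by_firm = defaultdict(list)
for firm, margin in polls:
    by_firm[firm].append(margin)

firm_avg = {f: sum(ms) / len(ms) for f, ms in by_firm.items()}
benchmark = sum(firm_avg.values()) / len(firm_avg)
adjusted_avg = sum(m - (firm_avg[f] - benchmark) for f, m in polls) / len(polls)

print(f"simple average:        Obama {simple_avg:+.2f}")
print(f"house-effect adjusted: Obama {adjusted_avg:+.2f}")
```

In this contrived series the two Romney-leaning daily trackers supply five of the eight polls, so the straight average (+0.25) sits well below the house-effect-adjusted one (+1.15); the adjustment keeps prolific firms from dragging the whole estimate toward their lean.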
Aside from “house effects,” aggregators are also vulnerable to distortions created by pollsters that are especially active in a given election:
Part of that problem, at least when it comes to the national presidential race, was the daily tracking polls from Gallup and automated pollster Rasmussen Reports. Both firms reported results that were biased in favor of Romney this cycle, but by publishing a new result every day, their polls could be overrepresented in the averages. “The one sort of Achilles’ heel of the regression trend line that we’ve done classically on our charts, there are two pollsters that contribute most of the data points,” said Pollster’s Blumenthal. “Not only does that make the overall aggregate off, it can also create apparent turns in the trend line that are [because] we’ve had nothing but Gallup and Rasmussen polls for the last 10 days.”
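Blumenthal’s “Achilles’ heel” is easy to reproduce in the same toy setup. In the sketch below (again with invented numbers), the last five days of the series contain nothing but two Romney-leaning daily trackers, so a least-squares trend line through the raw polls shows a sharp turn toward Romney that all but disappears once the same crude house-effect correction is applied:

```python
# Toy demonstration of the regression trend line's "Achilles' heel": the
# data are invented so that days 6-10 contain nothing but two Romney-leaning
# daily trackers. A line fit to the raw polls shows a sharp downward turn;
# the same fit on house-effect-adjusted polls is nearly flat.

from collections import defaultdict
import numpy as np

# (day, pollster, Obama-minus-Romney margin in points) -- all hypothetical
polls = [
    (1, "PPP", 3.0), (2, "ABC/WaPo", 2.5), (3, "Pew", 3.0),
    (4, "Gallup", -1.0), (5, "Rasmussen", -0.5),
    (6, "Gallup", -1.5), (7, "Rasmussen", -1.0),
    (8, "Gallup", -2.0), (9, "Rasmussen", -0.5), (10, "Gallup", -1.0),
]

days = np.array([d for d, _, _ in polls], dtype=float)
margins = np.array([m for _, _, m in polls])

# Trend line through the raw polls (least-squares linear fit).
raw_slope, _ = np.polyfit(days, margins, 1)

# Same crude house-effect correction as above: each firm's lean is the gap
# between its own average and the mean of the firm averages.
firm_polls = defaultdict(list)
for _, firm, margin in polls:
    firm_polls[firm].append(margin)
firm_avg = {f: sum(ms) / len(ms) for f, ms in firm_polls.items()}
benchmark = sum(firm_avg.values()) / len(firm_avg)
adjusted = np.array([m - (firm_avg[f] - benchmark) for _, f, m in polls])

adj_slope, _ = np.polyfit(days, adjusted, 1)

print(f"raw trend:      {raw_slope:+.2f} points/day  (an apparent collapse)")
print(f"adjusted trend: {adj_slope:+.2f} points/day  (essentially flat)")
```

The apparent late collapse in the raw series is an artifact of which firms happened to publish on which days, which is exactly the scenario Blumenthal describes.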
Gallup and Rasmussen, of course, were the two big polling outfits whose final surveys projected a Romney win. But it’s not as though you can simply pick a “good” pollster and stick with its results: one of the firms that seemed to nail the final result, Pew Research, was all over the place late in the campaign, producing real panic among Democrats with an October poll showing Romney up by four points. So for the time being, the aggregators would seem to be the most reliable barometers of where a given contest stands and where it may be headed.
Now some people will read this or read Shepard’s piece and say: “Screw the polls.” That’s an understandable attitude, but not a particularly helpful one; candidates and elected officials are going to look at polling data even if you and I don’t, so we might as well keep up. More generally, the answer to bad or questionable data is more and better data, not less. So as the next election cycle takes shape, perhaps we will see a gradual evolution toward a better understanding of polls, so that voters are less inclined to be fooled by idiot newsreaders or hackish spinners touting a particular survey as definitive.