Political Animal

POLL NUMBERS….MyDD has a bunch of charts up today showing Bush’s historical approval ratings, all of which have been declining steadily since 9/11. This isn’t really news (they’ve been declining steadily, after all), but just in case you ever doubted that these numbers are driven more by outside events than by actual performance, consider this: after 9/11, Bush’s approval rating for handling the economy jumped from 54% to 72%.

This obviously has nothing to do with either the economy or with Bush’s handling of it, so take these polls with a big grain of salt. But there is at least one thing to take away from this: perceptions matter. Bush is vulnerable in 2004, but convincing people of that depends on both good luck and good marketing by the Democrats, not just on substantive policy.


MODERATE REPUBLICANS FIGHTING BACK?….An interesting article in The Hill this week says that centrist Republicans in the House are upset with the efforts of new Majority Leader Tom DeLay to enforce a conservative party line:

Many centrists are angered by a $50,000 contribution that Majority Leader Tom DeLay (Texas) made to the Club For Growth, a conservative advocacy group whose mission they say is to defeat liberal Republicans in primaries. Rep. Wayne Gilchrest (R-Md.), a member of the Tuesday Group, faced a stiff challenge from a conservative primary challenger backed by the group.

….It has also become apparent that House leaders have followed through on threats made last year in the midst of a heated battle on campaign finance reform.

Members of the GOP leadership withheld plum committee assignments from Republican lawmakers who defied them and signed a discharge petition forcing a vote on the controversial bill.

If there’s anyone who seems likely to try and overreach in order to push a conservative agenda, DeLay’s the guy. Considering the lukewarm response that Bush’s tax plan has gotten from moderate Republicans, it will be interesting to see if this turns into a real fight or is just a tempest in a teapot.

DIAPERS AND BEER….John Quiggin has a post today about a subject that relates to marketing, economics, and statistical analysis, all of which are favorite subjects of mine. What’s more, the context is one of my all-time favorite quandaries, so unless the intersection of these three topics strikes you as only slightly less tedious than filling out a 1040 form, read on. And yes, if you make it to the end I do have a point to make, one that perhaps John will respond to.

Here’s the background: one of the things that statisticians do is to try and find correlations. A famous example from marketing is that people who buy diapers also tend to buy beer. One of the problems with correlation hunting, however, is that the correlations are mostly based on surveying a small number of people and hoping that they represent the entire population. Unfortunately, every once in a while you’ll get a correlation by chance: your sample just happened to include a lot of alcoholics, for example.

Here’s where the fun stuff starts. We marketing folks just adore analyzing what people buy (loyalty programs at supermarkets combined with computerized bar code readers make this pretty easy), and we use this analysis to, um, better serve your needs. Basically, we take enormous masses of data and sift through it until we come up with some correlations. Bingo! People who buy diapers also buy beer! So let’s put a beer display on the diaper aisle.

As John points out, however, there’s a problem. If you take huge masses of data, you’re bound to find some correlations just by chance, so the whole enterprise seems like it’s built on straw. By the normal standards of statistical analysis, you’ll find correlations 5% of the time even in random data, so if you look at an enormous data set with a thousand different pairs of variables you’ll find about 50 seemingly significant correlations just by chance (5% of 1,000). So what’s the point?
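The "about 50 out of a thousand" arithmetic is easy to check with a small simulation. What follows is only a rough sketch: the function names, the sample size of 100 observations per pair, and the use of the Fisher z approximation as the 5% test are all assumptions made for illustration, not anything taken from John's post or from real supermarket data.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def is_significant(r, n, z_crit=1.96):
    """Two-sided test at roughly the 5% level.

    Assumption: uses the Fisher z approximation, under which
    atanh(r) * sqrt(n - 3) is approximately standard normal
    when the true correlation is zero.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > z_crit


def count_chance_hits(num_pairs=1000, sample_size=100, seed=0):
    """Count 'significant' correlations among pairs of pure noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_pairs):
        # Two completely unrelated columns of random numbers.
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if is_significant(pearson_r(xs, ys), sample_size):
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_chance_hits())
```

The exact count depends on the seed, but over many runs it should hover around 50, which is just the 5% threshold applied a thousand times.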

Well, first off there are some pretty sophisticated statistical tricks you can do with the data to make it more reliable. And, as John will no doubt be jealous to hear, he’s right: we marketing folks have pretty sizable budgets and can afford to run multiple surveys (or buy new data sets) if we find something that looks interesting.

But even aside from this, there’s a more fundamental question at hand, and it’s the point of this whole essay: is a correlation deduced from a huge multivariate analysis really less reliable than one deduced from a focused study? The argument seems to me to be this: if you have a hypothesis and test it, and you find a correlation, that’s good. But if you don’t have a hypothesis, and you find a correlation, then it’s probably just by chance.

But it’s not. The numbers don’t care whether you have a hypothesis or not, and in both cases there’s a 5% chance that the correlation is due to chance. In both cases you will have to reproduce the results independently if you want to increase your certainty.
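Independent reproduction can be sketched the same way. In this illustrative simulation (again with made-up names, a hypothetical sample size of 100, and the Fisher z approximation standing in for the 5% test), every pair of variables is pure noise, and any pair that looks significant the first time is tested again on a fresh sample.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def looks_significant(xs, ys, z_crit=1.96):
    """Roughly a 5% two-sided test (Fisher z approximation, an assumption)."""
    z = math.atanh(pearson_r(xs, ys)) * math.sqrt(len(xs) - 3)
    return abs(z) > z_crit


def noise_pair(rng, sample_size):
    """Draw a fresh sample of two unrelated variables."""
    xs = [rng.gauss(0, 1) for _ in range(sample_size)]
    ys = [rng.gauss(0, 1) for _ in range(sample_size)]
    return xs, ys


def replicate(num_pairs=1000, sample_size=100, seed=1):
    """Return (first-round hits, hits that also pass a second sample)."""
    rng = random.Random(seed)
    first_round = 0
    survivors = 0
    for _ in range(num_pairs):
        if looks_significant(*noise_pair(rng, sample_size)):
            first_round += 1
            # Independent re-test of the same (still unrelated) pair.
            if looks_significant(*noise_pair(rng, sample_size)):
                survivors += 1
    return first_round, survivors


if __name__ == "__main__":
    print(replicate())
```

Since each test has about a 5% chance of a false alarm, only about 5% of the first-round flukes should survive the second round, or roughly 0.25% of all the pairs. That holds whether the first-round correlation came out of a data mining run or a focused study.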

Is this a trivial point? I don’t think so, because I think it points to a serious flaw in a lot of statistical analyses: the feeling that if you test a specific hypothesis and find a strong correlation, it’s probably real. Oh sure, you will make the usual disclaimers about 95% confidence intervals, but the reality is that the results get treated seriously.

I’m not sure they should be. Or rather, I’m not sure they should be treated any differently than the data mining techniques that produce masses of correlations. I suspect that the disillusionment among economists (and others) with data mining is real, but mostly because it punches you in the nose with the fact that correlations are often just artifacts of chance. The same is true of focused studies, but because these correlations back up a claim we wish to make, we mentally discount the possibility of random error.

This is wrong. Numbers are numbers, and no matter where they come from they should be treated with the same respect, or lack thereof. To suggest otherwise, I think, is merely to admit that your conclusions are based not just on the numbers themselves, but also on some previous belief, a Bayesian argument that we will leave for another day.

POSTSCRIPT: In case you’ve ever wondered, data mining is the real reason behind supermarket loyalty programs. Oh, loyalty is part of the reason too, but the real payoff is that (a) it produces mountains of data that supermarkets can use to sell their products more efficiently, and (b) there are many eager buyers for the huge, real-time data sets that supermarket loyalty programs produce. But don’t think about this too much. It will just scare you.