In 2014, a fund-raising gimmick involving dumping a bucket of ice water over somebody’s head raised $100 million toward research to conquer amyotrophic lateral sclerosis. Often known as Lou Gehrig’s disease, ALS is a devastating neurological illness that paralyzes the patient inch by inch, until the person can no longer speak, swallow, or breathe. Most people with ALS live less than two years after diagnosis.
The outpouring of generosity triggered by the ice bucket challenge was impressive. But it might have been dampened if donors had known an unpleasant fact about the research they were funding: one reason we don’t already have any effective treatment, much less a cure, is that ALS drug research has been done exceedingly poorly. In 2010, researchers at the ALS Therapy Development Institute, based in Cambridge, Massachusetts, launched a review of the animal experiments that serve as the basis for candidate drugs. Every single study had used too few mice to get a valid result. Some studies used as few as four mice. Four. Often, the researchers hadn’t had enough funding to use more mice. But regardless of the reason, when the institute conducted the same studies with enough mice, all the drugs failed to show a real effect—meaning that donors and funders, including drug companies and the federal government, had spent tens of millions of dollars on trials involving human ALS patients on the basis of spurious animal results.
This problem of “irreproducibility” of scientific results, also known as “research waste,” has begun to get some attention in both academic circles and the press, as more and more experimental results once taken at face value have turned out to be false when retested. The crisis reflects not so much a shortcoming of the scientific method itself, as some have suggested, as a failure by scientists to adhere to it. Poorly designed and analyzed research is infecting the biomedical literature, leading other researchers to build on results that are the data equivalent of a house of cards.
Biomedical research is a $240 billion annual global endeavor. The United States is the biggest funder, with $70 billion in commercial spending and another $40 billion from nonprofits and government, mostly the National Institutes of Health. We are rightly known as the greatest research engine in the world. American labs are a wellspring of drugs and biologics that have eased suffering for millions of people.
But as Richard Harris argues in Rigor Mortis, bad science is rife in labs, both commercial and academic. Harris, a veteran National Public Radio science reporter, documents a litany of slipshod practices and biased analyses that have produced a disheartening series of scientific dead ends. Cutthroat academic competition, a headlong rush to publish in “high-impact” journals, and scarce funding all lead researchers to cut corners, and the self-correcting mechanisms of science can’t keep up.
Harris tells the story of C. Glenn Begley, a biologist who spent ten years as the head of a cancer research team at the biotech company Amgen. Begley and his team would start their research by scouring the basic science literature for promising studies, mostly by academics. (Basic science involves experiments on cellular material, whole cells, or animals, while clinical research involves human subjects.) Then they would repeat the original studies whose results suggested the most potential for new cancer drugs. Those they could validate would move on to the next phase of drug development.
After a decade, Begley decided to take stock of all the studies that had initially looked the most groundbreaking. He chose fifty-three to do over.
This time, the Amgen team asked for help from the scientists who had published the original results. They blinded the studies, meaning that none of the researchers knew which was the treatment group and which was the control. (That’s a crucial way to weed out unconscious bias. If you’re testing a drug to grow hair, for instance, and you know which group is getting the drug and which the placebo, you may see hair growth among the test group even where it doesn’t exist, and discount hair growth among the placebo group.) Only six of the fifty-three repeat experiments confirmed the original findings. Begley and a colleague published the results as a commentary in the journal Nature in 2012, to resounding silence.
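The source does not describe the Amgen team's exact protocol, so what follows is only a minimal sketch of how blinding is commonly set up. It assumes two arms and random allocation. Each subject gets an opaque label, and the key linking labels to arms is held back until the results are scored. The function `blind_allocation` and the label format are hypothetical.

```python
import random


def blind_allocation(subject_ids, seed=None):
    """Randomly split subjects into two arms and hide each arm behind a label.

    Returns (coded, key): `coded` maps subject -> opaque label, which is all
    the experimenters see; `key` maps label -> arm and is held by a third
    party until scoring is finished. With an odd count, control gets the extra subject.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)  # the random order decides who receives the treatment
    half = len(ids) // 2
    arm = {sid: ("treatment" if i < half else "control") for i, sid in enumerate(ids)}
    # Labels are drawn independently of the arm, so a label reveals nothing.
    labels = [f"S{n}" for n in rng.sample(range(1000, 10000), len(ids))]
    coded = dict(zip(ids, labels))
    key = {coded[sid]: arm[sid] for sid in ids}
    return coded, key


if __name__ == "__main__":
    mice = [f"mouse{i:02d}" for i in range(1, 21)]
    coded, key = blind_allocation(mice, seed=2012)
    # Experimenters record outcomes against these labels only.
    for mouse in mice[:3]:
        print(mouse, coded[mouse])
    # Only after scoring does the key holder reveal `key` to unblind the analysis.
```

The point of the design is that whoever scores the outcome, such as hair growth, sees only labels like `S4821`. Unconscious expectations therefore have no group identity to attach to.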
Harris points to other failures, many of them due to the assumption that what works in laboratory animals, mostly mice and rats, will work in people. “Lab animals are not small, furry humans,” he writes. One example is a series of experiments run by a Stanford lab that wanted to understand why, after decades of research, there has been virtually no progress in treating sepsis, the body’s serious, systemic response to infection. Sepsis, which involves an overreaction of the immune system’s inflammatory response, kills more than 200,000 Americans a year and costs billions of dollars to treat.
The Stanford team examined the genetics of the mouse strain that serves as the model for the vast majority of sepsis studies. They compared the 5,000 genes known to be involved in sepsis in humans with the corresponding genes in the mouse model. Very few of the genes were turned on in the mice with sepsis. That suggested the biology of the disease differs radically between the two species, and that decades of research on inflammation using that mouse strain had been a waste of effort.
No area of basic biomedical research appears to be immune. For many years, the National Cancer Institute promoted a line of breast cancer cells that could grow in a petri dish and was used in hundreds, if not thousands, of tests of potential breast cancer drugs. That cell line turned out to be melanoma, not breast cancer. More than 700 studies of glioblastoma, a deadly form of brain cancer, relied on a cell line called U-373 whose cells, writes Harris, turned out to be so full of mutations that “they barely resembled glioblastoma at all.”
Nobody who understands the meandering, nonlinear path of scientific progress would argue that every published study should turn out to be true. As Harris puts it, the equivalent of a .300 batting average in biomedical science would be phenomenal, because so much of the field involves probing cellular and genetic mechanisms that are invisible. All science is iterative and looping: one result is invalidated, and that failure opens a new line of investigation.
Serendipity plays a crucial role: breakthroughs in one arena often occur because a curious scientist was noodling around in an entirely different field. The discovery of an enzyme in an “extremophile” bacterium, which lives in Yellowstone’s hot springs in water that can reach more than 175 degrees Fahrenheit, led to the development of the polymerase chain reaction, or PCR, a Nobel Prize–winning technology that is now an essential tool in genetics and diagnostics. A spinoff from research into the effect of electric fields on bacterial growth led to the development of potent anticancer compounds such as cisplatin.
More than half of the total resources invested in biomedical research are allocated to basic research, and most ideas for studies are initiated by scientists themselves. This is a good investment—but only if the studies are conducted with rigorous attention to good scientific practice. It might sound strange, but good science is all about doing everything you can to find fault with your own theory. Or, as physicist Richard Feynman once put it, the first principle of science “is that you must not fool yourself—and you are the easiest person to fool.” Much of what ails the bad research that Harris chronicles—the poor lab techniques, incorrect application of statistics, selective reporting of data, and refusal to move away from studies that have been found to be faulty—seems to result from failing to adhere to this golden rule.
In Harris’s view, much of the irreproducibility problem has been driven by two interrelated forces: the ferocious competition for funding, and the culture of academic prestige. Researchers are judged by academic institutions, and often by their peers, on the number of papers they publish in journals with a high “impact factor,” a measure of how often a journal’s recent articles are cited by other researchers. At the top of the heap are Nature, Cell, and Science, whose papers are cited most often, which means more eyeballs on their pages. For the journals, that translates into revenue from subscriptions and advertising. By that measure, People magazine has a much bigger impact factor than, say, the Atlantic or the Washington Monthly. And, like People, the high-impact journals often seek to publish the most fashionable, glitzy, eye-popping results, even if those results are likely to turn out to be dead wrong.
Then there’s what I call the bullshit factor: the hyping of every little finding, no matter how preliminary, by academic research centers. The press then ballyhoos those findings, all in the name of ginning up more money for research and more prestige for the institution, which, of course, leads to more money.
Harris’s principal remedy for the irreproducibility problem is greater scientific rigor, and many of his suggestions seem so obvious that it’s hard to believe they aren’t already standard practice: Check cell lines before beginning an experiment that depends on the cell line being correct. Use the right statistics. Use a sample large enough to make it unlikely that your results are a fluke of chance. (Seriously. Four mice?) Harris quotes Malcolm Macleod, a stroke researcher at the University of Edinburgh, who says, “I simply don’t understand the logic that says I can take a drug to clinical trial [in human subjects] on the basis of information from 500 animals, but I’m going to need 5,000 human animals to tell me whether it will work or not. That simply doesn’t compute.”
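A standard sample-size calculation makes the complaint about four mice concrete. The Python sketch below is an illustration under stated assumptions, not an analysis from Harris or the ALS institute. It compares two group means using a normal approximation. It measures the effect as Cohen's d, the difference between group means divided by their common standard deviation. It also uses the conventional 5 percent significance level and 80 percent power. The function names are hypothetical. Because the normal approximation ignores the extra uncertainty of very small samples, it flatters tiny experiments; an exact t-test calculation would give four animals per group even less power.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_groups(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; effect_size is Cohen's d. Only the tail in the
    direction of the true effect is counted, which is close enough here.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    return Z.cdf(shift - z_crit)


def n_per_group_needed(effect_size, alpha=0.05, power=0.80):
    """Smallest group size reaching the requested power (normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Assumed scenario: four mice per group and a large true effect (d = 1).
    print(f"power with 4 mice per group, d = 1.0: {power_two_groups(4, 1.0):.2f}")
    for d in (1.0, 0.5):
        print(f"mice per group for 80% power, d = {d}: {n_per_group_needed(d)}")
```

Run as a script, it reports a power of about 0.29 for four mice per group even when the drug has a large effect. It calls for 16 mice per group to detect that effect reliably, and 63 per group for a medium effect (d = 0.5). An experiment that small would miss a real, large effect about seven times out of ten.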
Others have called for better peer review by journals, which are supposed to serve as gatekeepers of good science but too often simply fill their pages, sell ads, and fail to correct the record, even when faulty results are pointed out to them.
But addressing the root causes of all this shoddy science will not be so easy. Academic institutions should stop training so many PhD researchers, who are now forced to compete for a handful of open academic and industry positions when they finish their studies. Of course, academic institutions don’t want to do that, because those grad students and postdocs are the cheap labor that keeps university labs humming. And universities need those labs to keep bringing in grants, whose overhead fees keep the rest of the university afloat. The few lucky postdocs who do get hired may be those who have figured out how to game the publication race and produce the most papers, rather than the most creative and rigorous researchers.
Once hired, young researchers must make a mad dash to publish as many articles as possible, in journals with as high an impact factor as possible, before they come up for tenure review. This rewards sheer quantity of publications and research that promises short-term success, preferably in some fashionable area, even if it involves slapdash methods.
It can feel overwhelming to think of the money being wasted and the patients who wind up participating in clinical trials that were futile from the start. Simply exhorting basic science researchers to be more rigorous hardly seems sufficient to halt the rush to get results. Until someone can find a way to alter the economic imperatives of academia and biomedical journals, we’re probably going to continue to see more research results that serve the cause of neither good science nor good medicine.