May the Source Be With You

The cost of sharing information in biological research is plummeting, and the ease is soaring, thanks to computers and the Internet. Sending files over jury-rigged 14K modems the way Brand did was convenient, but nothing compared to streaming swaths of genetic data worldwide in seconds. At the same time, the potential value of new information in biology has been skyrocketing. Discovering a critical gene controlling, say, Alzheimer's or impotence can be worth a fortune. That's why Wall Street has poured hundreds of billions of dollars into pharmaceutical and biotechnology firms over the past 20 years. And it's why universities such as MIT and the University of California at Berkeley have increasingly become partial subsidiaries of corporations, licensing discoveries in return for pieces of the action.

The whole corporatized system, however, rests on the ability to hoard information. If the discoverer is to make big money, the information and its dissemination have to be owned through government-granted patents and licenses. In one way, that's fine. The prospect of profits inspires research, and our increasingly corporatized system has produced some notable medical breakthroughs and innovations, such as drugs to treat high cholesterol and depression. Perhaps most famously, it was a private company hunting for gold, Celera, that figured out a new way to decode genetic data and spurred the mad race to map the human genome.

But hoarding information clashes directly with another imperative of scientific progress: that information be shared as quickly and widely as possible to maximize the chance that other scientists can see it, improve on it, or use it in ways the original discoverer didn’t foresee. “The right to search for truth implies also a duty; one must not conceal any part of what one has recognized to be true,” reads the Albert Einstein quote inscribed on a memorial outside the National Academy of Sciences offices in Washington.

The great physicist, then, might be disappointed to learn that in 2002 he'd need approval from 34 different patent holders before buying a new kind of rice genetically engineered in Costa Rica to resist a tropical virus. Or that, according to a recent Journal of the American Medical Association survey, three times as many academic geneticists believe that sharing has decreased in their field over the past decade as believe it has increased, despite the ease with which one can now transfer information online. Indeed, nearly three-fourths of the geneticists surveyed said that a lack of sharing had slowed progress in their field. Info-hoarding may help explain at least part of the decline in pharmaceutical innovation. According to a recent study by the nonprofit National Institute for Health Care Management, a rapidly increasing percentage of new drugs approved by the FDA have the same active ingredients as other drugs on the market. In other words, the industry may not be innovating so much as learning how to market and package old drugs in new ways.

Fortunately, a potentially revolutionary counter-trend is developing and helping science return to the ideal that Einstein extolled. A small but growing number of scientists, most of them funded by the National Institutes of Health, are conducting cutting-edge research into the most complex problems of biology not in highly secure labs but on the Internet, for all the world to see. Called “open-source biology,” this work is the complete antithesis of corporatized research. It’s a movement worth watching–and rooting for.

One of the most interesting innovators of this new type of biology is Alfred Gilman, who received the 1994 Nobel Prize in medicine. Four years ago, Gilman founded the Alliance for Cellular Signaling, a coalition of scientists based in Dallas striving to build a virtual cell that will allow scientists to perform experiments completely on their computers. Want to know how changing the concentration of a protein affects the cell? Or, how two specific proteins bind together–the basis for most pharmaceutical drugs? Type it in online and test for yourself. Don’t bother with test tubes or mice.
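To get a feel for what an in-silico "experiment" of this kind looks like, here is a toy sketch (entirely hypothetical, not the Alliance's actual software): two proteins A and B bind into a complex AB under simple mass-action kinetics, and the "experimenter" simply asks what happens to the amount of complex when the concentration of A is doubled. All names and rate constants are illustrative.

```python
# Toy in-silico "experiment": mass-action binding of proteins A and B.
# A + B <-> AB, with association rate k_on and dissociation rate k_off.
# All quantities are made up for illustration, not real measurements.

def simulate_binding(a0, b0, k_on=1.0, k_off=0.1, dt=0.001, steps=10000):
    """Integrate d[AB]/dt = k_on*[A][B] - k_off*[AB] with forward Euler."""
    a, b, ab = a0, b0, 0.0
    for _ in range(steps):
        rate = k_on * a * b - k_off * ab  # net rate of complex formation
        a -= rate * dt
        b -= rate * dt
        ab += rate * dt
    return ab

# "What happens to the complex if we double the concentration of A?"
low = simulate_binding(a0=1.0, b0=1.0)
high = simulate_binding(a0=2.0, b0=1.0)
print(f"complex at equilibrium: {low:.3f} vs {high:.3f}")
```

A real virtual cell would couple thousands of such reactions, but the appeal is the same: the question is answered by typing it in, not by running a bench assay.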

Developing a new drug typically takes about a decade and costs hundreds of millions of dollars. But Gilman’s plan could accelerate a critical stage from a few years to a few minutes. “If it works, it will make testing drugs much easier and much cheaper,” he says.

To get there, Gilman isn’t hoarding his findings, but unloading them directly into the public domain and spurning patents and copyright. He won’t rely on brilliant insights coming as he sits cross-legged in a woodshed; he’s going to organize a massive public brainstorm and rely on the collective wisdom of his many collaborators. Seven core labs will serve as central coordinators as the undertaking evolves, but hundreds of other people will pipe in over the Internet. Nearly 500 scientists worldwide have already lined up to design descriptive Web pages for molecules key to the inner workings of cells.

Gilman won’t make any money besides what he’s paid through his grants, but that’s not the point. He wants information to circulate as widely as possible. He wants to tackle a giant question, open up all his work, and let new discoveries serve as bridges rather than endings: “We couldn’t do all of this by ourselves. It’s just too big. So we have to engage the entire community.”

In many ways, Gilman is simply and deliberately copying the open-source development model popularized by the Linux operating system. Most computer programs, like most biological experiments, are protected by patents, copyrights, or simple secrecy. But after finishing the first prototype in 1991, Linux's founder Linus Torvalds didn't dial up a lawyer or a venture capitalist; he posted the code online and asked other people to download it, use it, and improve it. A few people did, and then emailed back improvements that Torvalds incorporated and reposted. Today millions of loosely coordinated users worldwide still do the same thing and, for many uses, Linux now rivals any privately designed operating system.

The corporate reaction to Linux foreshadows what will likely happen with open-source biology. Microsoft, on the one hand, has gone crazy. The company has quietly funded studies trying to debunk the effectiveness of Linux, and operating systems chief Jim Allchin has even described open-source software as un-American. Other companies, such as IBM, have instead decided to take advantage of Linux and build proprietary software that they can sell on top of it.

With open-source biology, some companies clearly see the potential for the creation of useful models that they can take advantage of freely. Eli Lilly, Merck, Aventis, Johnson and Johnson, and Novartis are even helping to fund Gilman in the hope that his model will help them develop better drugs more quickly. But the corporate world also fights anything that could reduce profits or force proprietary information into the open. Publishers of scientific journals, for instance, have railed against an NIH-sponsored online database called PubMed Central which would post the full text of any scientific article ever published (see “Publisher Perish,” October 2001).

Ultimately, though, many scientists do see the value of open-source biology, because one never knows where lightning will strike next with data in the public domain. A sequence of genetic code from a yeast gene put into a giant NIH-sponsored database called GenBank recently led to a breakthrough in colon cancer research by another lab. When a mysterious virus popped up in New Mexico a few years ago, scientists had little idea how to respond until they realized that its sequence closely matched those of a group of viruses endemic to Asia, already catalogued in GenBank. Harold Varmus, the former director of NIH, believes that similar open collaborations will grow in biology because of the rapidly increasing scale and complexity of the discipline's upcoming challenges: "We are all beginning to appreciate that while our own pieces of the puzzle are important, more rapid integration has to happen."

To understand why this new type of biology is possible, it helps to follow the career paths of two scientists, Roger Brent and Larry Lok, childhood neighbors in Hattiesburg, Miss., four decades ago. Fast friends, they passed through the adolescent rites of passage for brilliant techy people together: learning calculus out of a book in the seventh grade, dropping out of high school because it bored them, beginning graduate work before turning 20. But then they split apart geographically. Brent left Hattiesburg to study biology at Harvard and began cloning genes. Lok left to study math at Columbia and learn about hyperbolic manifolds.

After earning his Ph.D., Lok headed to California to write computer code and design circuits. Brent continued his work in genetics and eventually cofounded The Molecular Sciences Institute, now a 30-person laboratory and research center in downtown Berkeley. "For a long time, I thought that the idea that a computer programmer and a biologist would work together was ludicrous," says Lok.

For 20 years, the two friends followed their divergent paths. But in the mid ’90s, biology started to change noticeably. Biologists started dealing with giant computer databases; DNA started to resemble computer code; laboratory Internet connections became as essential as microscopes. Soon, Brent started calling Lok to talk about mathematical models of biological processes. Then, two years ago, Brent recruited his old friend back and Lok is now working to create a complete online model of yeast reproduction, much like what Gilman hopes to do with cell signaling, though on a vastly smaller scale.

Appropriately, Lok and Brent plan to put their work in the public domain. Not only has the merger between biology and computer science allowed the two friends from Hattiesburg to work together again, it’s also what makes biological open-source collaboration possible.

To write a computer program such as Linux, a coder thinks about something he wants, say, a new word processor. He then writes code that his computer compiles into 0s and 1s and executes. At every step, he can easily share and test his work. Torvalds could send someone code; that person could test it immediately, add something that might help, and send it right back.

Biology followed different processes even a decade ago. If a scientist wanted to understand something, he needed a wet lab: test tubes, samples, and everything else. If he came up with a hypothesis based on preliminary work, he could write the idea up or maybe even send samples to a friend. But to reproduce or test the results, the colleagues would need all the same tools and lots of time.

Wet labs are still essential for many biological processes. But for many things now, a biologist can make a discovery primarily by examining public data and figuring out an innovative way to parse them, or, at the least, by using the data to greatly amplify or check hunches formed in the laboratory. This is what Brent and Alejandro Colman-Lerner at the Molecular Sciences Institute recently did, poring through data to prove and describe a genetic difference between mother and daughter yeast cells previously considered identical. After they published their results, colleagues responded within hours. "Biology is [now] completely impossible without computers," says Chris Somerville, a Stanford University biologist working to build a complete online model of Arabidopsis, a small flowering plant often studied by researchers.
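Parsing public data in this way can be as simple as a short script. The sketch below, with invented sequences and labels rather than real database records, reads sequences in the common FASTA text format and computes a basic statistic (GC content) for each, the kind of quick computational pass a biologist might run before ever touching a test tube.

```python
# Minimal sketch: parse public sequence data in FASTA format and
# compute GC content. The records below are invented examples, not
# real GenBank entries.

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def gc_content(seq):
    """Fraction of bases that are G or C."""
    return sum(seq.count(base) for base in "GC") / len(seq)

records = """>mother_cell_gene (made-up yeast sequence)
ATGGCGTACGCTTAG
>daughter_cell_gene (made-up yeast sequence)
ATGGCATACGATTAG
"""

for name, seq in parse_fasta(records):
    print(f"{name.split()[0]}: GC = {gc_content(seq):.2f}")
```

Anyone with the data and a few lines of code can run the same analysis, which is precisely what makes this style of biology so easy to share and to check.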

This transition doesn’t just speed up the exchange of information; it also opens up opportunities for more contributors. Anyone with an Internet connection can see what Brent and Colman-Lerner did and follow up. As another example, Cornell biologist Susan McCouch works with researchers in the Ivory Coast, among other places, to identify and analyze rice genes that might have desirable traits, say, for drought resistance. She puts that information into an open database called Gramene.

Open-source biology still faces serious obstacles. For one, maintaining strict quality-control standards can prove vexing: a project with lots of eyes working on it is bound to include a few bloodshot ones. For another, private companies can pay people to do all the grunt work that giant projects require; open-source projects usually can't.

But there are ways around these obstacles. Gilman, for one, plans to have a core group retest all of the major findings sent to his project, slowing down the process but keeping the junk out. As for the concern that no one will do the work, there are thousands of adequately paid, university-sponsored scientists able to pipe in, and many of them are more enthusiastic about working on open-source projects than on anything locked up and corporate. Colman-Lerner recently noticed a mislabeled gene in a soon-to-be-restricted database he often uses. "I'd tell them, if they weren't about to make money off me," he said, noting that he does correct errors in public databases. And it is easier to send multiple simultaneous requests to GenBank's database than to Celera's equivalent because Kim Worley, an enterprising researcher at Baylor University, recognized the problem and wrote code to fix it, something Worley says she wouldn't and couldn't have done for a private company.

In the world of Linux programmers, the so-called "Linus's law" holds that "given enough eyeballs, all bugs are shallow." Someone will find the answer if enough people work on a problem, are able to communicate, and the information really is kept free. To be sure, the fact that information also wants to be expensive has pushed science forward in places. But if Gilman's right, the fact that information wants to be free will prove even more important in the coming years.