A single weapon can sometimes change the course of a war. Consider the Stinger missile, which the CIA supplied to Mujaheddin fighters in Afghanistan in 1986. The guerrillas promptly used the shoulder-launched rockets to shoot Soviet helicopters out of the sky and drive the Russians out of the country. Today in Afghanistan, if there’s a candidate for super-weapon, it’s a pilotless plane called the Predator, designed by the Air Force for missions too risky or unmanageable for piloted craft. Specifically, the Air Force requires the drone to “provide around-the-clock reconnaissance support under adverse weather conditions [and] in areas where enemy defenses have not been adequately suppressed.” Recently, the Predator has been outfitted with a Hellfire air-to-ground missile, theoretically making it the perfect weapon for hunting down Taliban troops and al Qaeda hideouts without putting U.S. soldiers at risk. (The craft is controlled via joystick by a soldier who is hundreds of miles away.) It was a Predator that tracked a convoy carrying Taliban leader Mullah Omar and fired the missile that leveled his compound.

Unfortunately, one of these miraculous weapons crashed. Then another. The Pentagon blamed bad weather—even though bad weather is precisely what the craft was designed to fly through. In October, the Pentagon’s central testing office reported that “the Predator cannot be operated in less than ideal weather, including rain. Furthermore, the system is unable to provide reliable, effective communications through the aircraft, as required.” The classified report, obtained by the watchdog group Project on Government Oversight, concluded that “the cumulative effect of the system’s limitations render the Predator operationally ineffective.”

The Air Force has built 60 Predators (pictured at left). Not counting the two lost in Afghanistan, about a dozen have crashed during testing, and another seven have either crashed or been shot down over Kosovo and Iraq. In early November, Secretary of Defense Donald Rumsfeld acknowledged that they couldn’t withstand an Afghan winter, although designed to do so. So Rumsfeld ordered the deployment of the Global Hawk, an even newer unmanned craft still under development, to do the job the Predator cannot.

Such performance failures are typical of many Pentagon weapons systems. But there are instructive exceptions. Consider the Navy’s new glider-bomb, the Joint Stand-Off Weapon. Like the Predator, the JSOW is a “next-generation” instrument, designed to hit the enemy while keeping our troops out of harm’s way. It can be dropped 40 miles from a target, and glide to its objective with a built-in satellite navigator. Like the Predator, the JSOW met with a few glitches during its development. Among other problems, its “derailer,” the device that separates the weapon from the jet when launched, didn’t work properly.

But that’s where the similarities end. The JSOW, derailer and all, is now fully operational. According to a report by the Pentagon’s testing office, “No JSOW weapon has experienced a failure of this component during flight test or operational use.” The glider-bomb is currently in action over Afghanistan. The Pentagon won’t release any information about its performance or that of most other weapons. (“This isn’t a game,” said a Pentagon spokesperson. “We’re not keeping score.”) But Navy pilots have fired JSOWs at anti-aircraft positions and radars in Iraq, where, according to a Pentagon testing report, “Battle damage accuracy assessment estimates exceeded requirements for the weapon.” In other words, it works.

Two weapons, both highly advanced, both with design defects. Why does one work so much better than the other? The answer to that question is vital because having the right weapons and having them work can make the difference between winning and losing a war.

Typically, when the military develops a new weapon, it performs tests along the way to see if the weapon meets its “specifications.” Does the bullet leave the gun barrel at the right speed? Does the jet’s engine provide the required amount of thrust? But only after the weapon is built (and often only when the contractor’s production lines are gearing up) does the military typically perform what are called “operational” tests—all-or-nothing exams meant to scrutinize how a weapon would fare in combat. Can a soldier under fire easily work the controls? Will the gun shoot if it’s covered in mud? The problem with this method is that by the time the tests reveal any major problems, it’s often too late to do much about it.

A few years ago, the Navy, unlike the other services, changed its testing regime. It now conducts operational testing as early as possible, gradually imposing combat-like conditions, frequently while the weapon is still being developed. The JSOW works so well, says a Pentagon testing report, because the Navy “demonstrated a capability to rapidly address and resolve” glitches. The Pentagon’s testing office, meanwhile, noted that the Air Force had waited too long to test the Predator. “All the production contracts were awarded prior to operational testing,” so that “the use of the test results was diminished.” In other words, the Air Force didn’t do operational tests until it was too late to fix the problems that the tests uncovered.

The problem is endemic. According to the Pentagon’s testing office, “in recent years, 66 percent of Air Force programs had to stop operational testing due to some major system or safety shortcoming.” The Army fared even worse, as approximately 80 percent of its systems tested “failed to achieve half their operational reliability requirements.” By comparison, in the past few years, 90 percent of the Navy’s weapons passed operational testing, according to Philip E. Coyle, former chief of the Pentagon’s central testing office.

The failure of the Air Force to test early and realistically, as the Navy does, has also limited the effectiveness of its most advanced bomber, the B-2. A few weeks into the war in Afghanistan, an Air Force general proudly told a group of reporters gathered at the bombers’ base in Missouri that the planes had just completed 44-hour combat missions over Afghanistan, “the longest in combat history.” But if any of the reporters had been allowed inside the planes, they would have noticed that despite their enormous size, the planes lack a rest area, space for a backup pilot, and bathrooms. (Pilots have adapted by bringing aboard lawn chairs and using cockpit urinals.)

The planes weren’t designed to fly for two days straight, and the main reason they had to is poor testing. The Air Force did extensive tests on the B-2, showing that it met all its specifications; it flew at the right speed, evaded radar, etc. But once in production, B-2s began returning from training missions with pockmarked “skin” (the stuff that makes them stealthy). Engineers discovered that rain damaged the B-2s. Testers had never thought to fly the plane through bad weather. The Air Force tried to fix the problem. But the planes were already built, and replacing the skin wasn’t feasible. So instead, the Air Force repairs the fragile skin each time the B-2 is exposed to moisture, and houses the planes in special dehumidified hangars, which exist only at the base in Missouri. “We can’t have ripples or bumps or anything like that,” says an Air Force spokesperson. “We need to make sure the skin is flawless.” That requires an enormous amount of maintenance. For each hour the plane spends aloft, the Air Force says the B-2 undergoes 45 man-hours of servicing on the ground. As a result, according to the Pentagon’s testing office, only 33 percent of B-2s are capable of flying missions at any one time, a fact that led the office to conclude that the B-2 “did not meet user requirements for sustained operations.” Indeed, in the first month of the bombing campaign in Afghanistan, B-2s flew just six missions. Navy planes, meanwhile, have flown more than 1,500.

On July 24, 1943, the U.S. submarine Tinosa was closing in on the Tonan Maru, one of Japan’s largest oil tankers. The sub got within 800 yards and launched its attack. Fifteen torpedoes later, the Japanese ship was still afloat. All but two of the torpedoes were duds, and the Tonan Maru began to pull away. The Tinosa’s captain was so furious that he broke off the attack, saving his remaining torpedo, and returned to base. Once there, he turned over his last torpedo to military testers. The scientists fired the torpedo from a barge. It seemed to work perfectly. But sub commanders throughout the Pacific were reporting problems similar to the Tinosa’s. Frustrated by the scientists’ approach, the commander of the Submarine Force Pacific Fleet, Rear Admiral Charles Lockwood, launched his own tests. Rather than firing from a barge, Lockwood sought to make his experiment as realistic as possible, so he boarded a submarine, headed out to sea, and fired the torpedo into a net. His crew hauled in the used torpedo and discovered the problem: The torpedoes were running too deep, which caused the warheads to malfunction. Thanks to this sensible real-world testing, the Navy was able to recalibrate the guidance mechanisms on all U.S. torpedoes of this type, and went on to sink a lot of Japanese ships with them. But decades passed before the Navy applied the lesson of the Tinosa to its standard testing procedures.

Change came only after a monumental weapons development fiasco. In the late 1970s, the Navy attempted to design an aircraft carrier-based bomber jet to replace the aging but reliable A-6. The new plane, designated the A-12, was supposed to be a revolutionary leap forward. Packed with the latest technology, it would be stealthy, quiet, and capable of carrying an enormous bomb payload. The project was the most expensive in Navy history.

The A-12 quickly ran into problems. Nicknamed the “Flying Dorito” for its chip-like shape, it didn’t meet performance goals, a setback which caused lengthy delays and drove up costs. One study predicted that the A-12 project would be so expensive that it would swallow up about 70 percent of the Navy’s aircraft budget by the 1990s. But weapons programs have a way of developing momentum that makes them very hard to kill. Services are loath to cancel a program; Pentagon officers who’ve worked on one often come to believe, rightly, that their careers depend on it; Congress can become an ardent defender of weapons programs as well, since no congressman wants to shut down a project that employs his constituents. Recognizing this, weapons contractors routinely subcontract portions of a program to as many congressional districts as possible.

Even so, the A-12 was such a mess that then-Secretary of Defense Dick Cheney, in one of his finer moments, axed the project in 1991, making it the most expensive weapons program ever terminated. (Even in death the A-12 fought on—mired in litigation over contractors’ claims that Cheney lacked proper cause for cancellation. A federal court finally rejected those claims in September.)

But the debacle had a silver lining. “The A-12’s demise really shocked the Navy,” says Marcus Corbin, a senior analyst at the reform-minded Center for Defense Information. “They had been betting the farm on it. And it crashed. They really needed to reevaluate after that.”

After the A-12 trauma, the Navy decided to apply the lessons it had learned. “Five or six years ago, we began to work with developers from a very early stage,” says Rear Admiral Steven Baker, the Navy’s previous head of operational testing. “I told my program managers, you don’t want to be cavemen discovering fire. There should be no surprises.”

By doing operational tests earlier and more often, the Navy was simply applying a lesson which the private sector learned a long time ago. Smart companies don’t put new products on the shelves and wait until customers start complaining to find out what’s wrong with them. They start testing the product early, first by using it in-house (“eating your own dog food,” as the marketing guys like to say), then in controlled settings outside the firm. Software makers, for instance, know that because their products may contain millions of lines of code, it’s virtually impossible for programmers to test on their own all the different permutations of how the software will react in the real world. So they often let users evaluate early copies, a practice called beta-testing. Beta-testing also lets software developers know how users will respond to various features. (Will they appreciate the cute computer-helper Paperclip, or find it annoying?)

This is essentially what the Navy now does. As a result, the Navy is not only fielding more weapons that actually work; it’s fielding them faster. One recent study examined three aircraft, one each from the Army, the Navy, and the Air Force. Though all three programs began in the mid-1980s, only the Navy’s craft is fully operational and in the field (see sidebar, page 18). Again, this should come as no surprise to anyone in the private sector. “Companies that sell to consumers release products every three years or so,” observes Jacques Gansler, Under Secretary of Defense for Acquisition and Technology from 1997 until 2000. “But defense takes 15 to 20. Why does it take so long?” A big part of the answer, he says, is testing. Flaws discovered early, while a product is being developed, can usually be fixed more quickly than those found at the last minute.

Early operational testing also saves money. A report by the Center for Naval Analyses (a federally funded think tank that is not part of the Navy) shows that Navy programs have half the cost growth shown by the other services.

Getting its testing regime right has required, and in turn inspired, a change of culture within the Navy and the contractors who work with it. According to defense experts, the Navy has the most rigorous and independent testing unit of any service. “Have you ever heard of the three Ds?” asks a Pentagon analyst who asked not to be named. “The Army is dumb, the Air Force is devious, and the Navy is defiant. They don’t bow to any pressure. And they’re dogged. People react to it and begin to expect it. So contractors anticipate tough tests … and know they’re screwed if they don’t pass them.”

Of course, the other services’ testing offices also try to be tough and independent. The difference is that their toughness and independence are constantly being undermined by forces elsewhere in the bureaucracy. For instance, under the old testing system still used in the other services, contractors and program officers have no idea until the end whether or not the weapons systems they’ve banked their careers on will make the grade. The situation inspires a gnawing fear, and the natural reaction to such fear is to exert pressure on the testing office to make the tests less rigorous. Such pressure is not unknown in the Navy. But it’s less intense because early testing gives contractors and program officers a better sense of how their weapons will ultimately perform. Indeed, many come to view the tests not as enemies to be outfoxed but as useful diagnostic tools that help them do their jobs.

Another difference in the Navy is funding. Though Congress appropriates funding for weapons programs, services typically have some discretion over how they use it. It’s not unusual for a service to rob money from one program in order to pay for others. When it does, the first places it looks to cut are testing budgets. The Navy resists this practice more than other services, which means that its testing office usually has the resources to catch and fix problems quicker. The Navy’s success also stems from its streamlined bureaucracy. The Center for Naval Analyses found that “the Army requires about 2,200 people for the job of [generating weapons specifications], compared with about 1,600 people in the Air Force and roughly 550 in the Navy for essentially the same function.” In order to make sure that they weren’t making an unfair comparison, the Center’s analysts also counted the number of people each service employed per research dollar spent. The Navy fared best. Analysts found that the Navy has one-third the bureaucrats per dollar of the Air Force and one-eighth as many as the Army.

This isn’t to suggest that the Navy has perfected the process of designing weapons. Its weapons don’t always work, and it still builds ships and other systems primarily suited to fight a Soviet fleet which no longer exists. Chuck Spinney, a Pentagon analyst who for years has pointed out flaws in the Pentagon’s weapons programs, is a notable skeptic. “There’s probably some truth to the notion that the Navy does a better job testing,” he says. “But it’s a distinction without a difference.” Spinney contends that a more important issue is the fact that the Pentagon continues to build the wrong kinds of weapons: big and enormously expensive. He singles out the Navy’s new bomber, the Super Hornet. It’s slow, not very maneuverable, and twice the cost of the plane it replaces. All that is true. But Spinney is wrong to say that superior testing doesn’t make a difference. The Navy set out to design a slow plane that would essentially function like a Mack truck, its chief attribute being that it can carry a lot of stuff. The Navy may or may not have been mistaken in formulating such requirements, but the important point is that the Super Hornet meets them. It does exactly what it was designed to do, which is more than can be said for many other weapons programs.

Even the Navy’s failures suggest the importance of its new testing regime. One of the most infamous weapons snafus in recent memory is the Osprey, a plane-cum-helicopter contraption built for ferrying Marines and other special forces into and out of the kind of rough terrain which we’re now encountering in Afghanistan. (See sidebar below). The Navy helped develop the Osprey, but the craft was tested by a Marine unit (the same unit, incidentally, that maintains Marine One, the Presidential helicopter). The Osprey has crashed four times during test-flights, killing 30 Marines. It’s now grounded.

Had it gone through the Navy’s testing system, the Osprey might be available to commanders in the field right now. Instead, every one of those built sits in hangars stateside. Had the pilotless Predator aircraft gone through the Navy’s testing system, it might have been lurking in the skies over Afghanistan all winter long, making life miserable for Osama bin Laden. Instead, it too will sit in a hangar, at least so long as the weather is bad. These craft might have turned out to be like the Stinger missile: super-weapons that change the course of the war. In any event, we’ll never know, because the other services have failed to adopt the Navy-style testing system that delivers weapons that work, at lower cost, and on time. That situation ought to be changed, fast.

Eric Umansky has also written for Slate.com and The New York Times.