As COVID-19 began to spread across the United States in early March, Alexis Madrigal realized something important was missing. The nation didn’t just lack enough ventilators, PPE, and other critical medical equipment to fight the virus. It lacked enough data.

Madrigal, an Atlantic staff writer covering the pandemic, wanted a comprehensive collection of statistics on the number of cases, hospitalizations, and deaths from each state to analyze and make sense of the crisis. He quickly learned that there was no single entity taking on this task.

So, he decided to do something about it. Alongside fellow Atlantic reporter Robinson Meyer, data scientist Jeff Hammerbacher, and content strategist Erin Kissane, he created the COVID Tracking Project, a volunteer organization that collects state-by-state data on coronavirus testing and outcomes, and then works with nonprofits and local and national newsrooms to help the public better understand the outbreak and what needs to be done to address it.

I recently caught up with Madrigal to find out more about the project and what he’s discovered about America’s handling of the crisis from his unique vantage point as the country’s leading COVID-19 data aggregator.

The following Q&A has been edited and condensed for clarity.

What’s the genesis of the COVID Tracking Project? What led you to create it?

I had been tracking the work of a genomic epidemiologist named Trevor Bedford. He had been able to show that the virus had been spreading in the U.S. for quite some time. Why didn’t we know that? When my reporting partner, Rob Meyer, and I started to really think about this, it became clear that the problem was that the U.S. wasn’t testing enough.

We basically started creating a spreadsheet, pulling from states’ websites, emailing them, trying to gather how many people had been tested. When we published that story, which contains a comprehensive state-by-state accounting of how many people had been tested, it was fewer than 2,000 people. Seconds after the story published, a friend of mine, Jeff Hammerbacher, emailed to say, “Hey, did you use my spreadsheet for this?” It turned out that we’d basically been tracking the same data. So we decided to team up. The next day, the last co-founder came on, Erin Kissane. There was no place where you could go to see how many tests had been performed in the United States, even though everyone was saying how important it was.

Do you think the federal government should be offering this service, probably through the CDC? And do you feel like you had to do this work because the government wasn’t?

We would have never done this work if the CDC had been doing it. The CDC has begun to put some information onto what they’re calling the “CDC COVID Tracker” website. The problem is that they’re not providing historical data. We need to be able to see the trends. Right now, the CDC is only providing current-day snapshots. People need to know what happened before so they can predict what’s going to happen in the future. We just did a large analysis of the CDC data. Since that report came out, we’ve realized the CDC was including antibody tests in their totals, which is a huge problem. Mixing those results, as one expert told us, makes those numbers uninterpretable.

Honestly, it’s the kind of mistake you just cannot believe. It throws off some of the very key measures that people are using to try to decide when to allow increased social activity.

Government officials have spent so much time preparing for a pandemic. I haven’t seen a plan, though, that really thought about how data would flow. If one outcome of this crisis is that people take into account that they’re going to need high-quality data to reduce uncertainty and make better decisions, that will be a great service that we have provided.

OK, so what exactly does the COVID Tracking Project do? How has your work changed as the pandemic has progressed?

The COVID Tracking Project compiles and analyzes state-level data. In practice, that means that we have dozens of volunteers who go to many different states’ websites and compile the data into a spreadsheet. We have teams of people who double-check that work. We have teams of people who do outreach to states, and others who try to get the quality of data reporting improved. We have a large wing that works to make this data accessible to people for different types of modeling.

At the end of the day, the COVID Tracking Project takes the state data and makes it accessible to a wide variety of people for different kinds of uses, everything from just about every major newsroom to modelers, inside and outside the government.

Is there an analysis component on your part, as well?

I think that people tend to think of data as a raw good, but in fact, it’s sort of a manufactured entity. That’s true at every level in data collection. If a state says “total tests,” does that mean that’s the number of people they’ve tested or the number of specimens they’ve tested? We are trying to help people contextualize the numbers. It’s not so much about telling people what to think about the numbers.

You have a “Data Quality Grade” listed for each state. How is that determined?

It’s really about the comprehensiveness of the state’s reporting. We have lowered the grades of states that were mixing antibody tests and viral tests because of what that does to the interpretability of their numbers. Most of the time, it’s about what a state reports or not. We look across many different categories and see what states are reporting in these different places. And then we say, if this approach is the best way of reporting this metric for whatever reason, then maybe every state should report like that. Then reporting overall would be better.

Your website says that you’re helping local newsrooms. How are you doing that?

Local reporters are really good at flushing things out of the states. We’re good at providing the national context that allows them to understand if their state’s practices align with other states. Then, they can go to state officials and say, “Hey, what’s going on with it?”

It’s worked a lot that way too with different local publications around race and ethnicity data. We can say to them, look, 47 other states are already doing this. Then, of course, the more pressure the reporters put on the states to do better data reporting, the better our database gets. We’re able to provide almost a kind of data team for all these local newsrooms, because they couldn’t necessarily do that all on their own. They don’t have those resources.

How many volunteers do you have? What responsibilities do they cover?

Our Slack has more than 500 people in it. The active volunteer base right now is probably around 160. They have a lot of different roles. At times, we’ve covered 35 press conferences in a day, mostly to just shake free data out of those and to increase our awareness of what the states are saying. It’s a big, big project. We have scientists and epidemiologists and public health people working on the data. We also have data scientists and journalists. Every kind of person that you might imagine might be on a project like this is on the project.

What have you learned about how testing is dispersed across the country?

The rates in testing are determined by a few different factors. Places that had really big outbreaks tend to have large testing operations. States that either already had or rapidly developed a very close relationship with a particular laboratory tended to do quite well in the early testing rounds. It’s hard to say that there’s one overriding factor that leads a state to do a lot of testing per capita. It’s kind of all over the map.

Over the next few months, I think we’re going to see the American innovation machine really go to town on this problem. There are some communities where lots of people are unemployed, lots of people are undocumented, and that’s where we need the most support. Already, black people are dying at much higher rates, and Latinos are overrepresented in the cases. I expect that that is only going to go up. While this virus can affect everybody’s body roughly in the same way, being wealthy is quite protective, because you don’t have to go to a job, you work from a computer, you can leave a city where there are lots of infections. There are all kinds of reasons for that. But all you have to do is look at a ZIP code map of New York to see what’s going on.

Do you have any plans as of now to expand the project? How will you adapt as circumstances change over time?

We’re interested in county-level data, particularly for race and ethnicity purposes. And we’re pretty interested in what the possibilities are for ZIP code-level data within metro areas. We also want to incorporate socioeconomic data into the racial and ethnic data. The core thing is to continue putting out this dataset and continue advocating for the transparency that the country needs, and making sure that people are able to connect the individual pieces of data.

From what you’ve seen, what policy changes need to be made to address the data problems you’re dealing with?

At the end of the day, we need legislation that provides national standards for data reporting. And that system needs to be adequately funded. We need to protect all of our people in this country, because if we don’t, it’s going to be really difficult to control this disease.

People have to know what’s happening so they can make good decisions. And we have to understand that the data is not a raw good, it has to be manufactured and put together. That takes time and effort, it takes people, it takes money. This is crucial to support the state health departments and hospitals that are generating the data. We can’t just assume data will be there when we need it. That’s how we can make a big impact—by putting together the right data infrastructure at the state and national levels. If you want a winning nonpartisan issue around coronavirus, fixing our data collection infrastructure is it.