Monthly jobs numbers and the Census Bureau might be the first—and only—things that come to mind for many Americans when they think about federal data. But government data undergirds many of the everyday essentials Americans rely on, like weather forecasts and tornado warnings. Federal data keep track of crime and public safety, provide early warning of epidemics, and help farmers plan their crops.
But all of that is under threat.
To President Donald Trump, data are both a weapon and an enemy.
On the one hand, Trump cites spurious—and often outlandish—numbers to justify his policies. He’s claimed, for instance, that as many as 20 million unauthorized immigrants are living in America—or about double the real number—to rationalize the intensity of his detentions and deportations.
He’s also boasted of securing “over $17 trillion” in new U.S. investments—presumably to validate his tariffs as a strategy for domestic economic growth. (The White House, however, claims $8.8 trillion in new investments, and even that figure is fiction.)
At the same time, Trump is suppressing, disappearing and even altering data to fit his agenda or to hide inconvenient truths about the impact of his actions. Earlier this fall, the U.S. Department of Agriculture ended its annual survey of hunger in America, just weeks before the recent government shutdown that paused food stamp benefits for millions of Americans. This summer, Trump fired the commissioner of the Bureau of Labor Statistics after a weak jobs report he didn’t like. And as part of Trump’s campaign against “DEI,” government agencies have quietly altered at least 200 federal datasets to remove references to “gender” in favor of references to “sex,” according to an analysis by the Lancet.
These active assaults on federal data have also been accompanied by neglect. Drastic cuts to the federal workforce, including by Elon Musk’s DOGE, have hollowed out capacity at many agencies to collect and maintain data, including data vital to U.S. industries, agriculture and ordinary citizens.
Denice W. Ross, former Deputy U.S. Chief Technology Officer and U.S. Chief Data Scientist under President Joe Biden, is sounding the alarm on the degradation of America’s federal data infrastructure and the myriad risks that presents. She’s also spearheading an effort, EssentialData.us, to track and preserve disappearing data.
This transcript has been edited for length and clarity. The full interview is available on Spotify, YouTube, and iTunes.
Anne Kim: When most people think about federal data, there are probably only a handful of numbers that come to mind: jobs and employment numbers from the Bureau of Labor Statistics, for instance, the Census Bureau, maybe the weather. But there’s a whole lot more to federal data than that. I came across your post for the Federation of American Scientists, where you said there are more than 300,000 federal data sets, which is just an astonishing number.
Denice Ross: The federal data ecosystem is vast, and it’s so much larger than what we typically think of with jobs and weather data. One of my favorite data sets that’s really surprising is the US Geological Survey’s North American Bat Monitoring database. What’s important about bats and why we need to monitor them is that they provide billions of dollars of free services to America’s farmers every year.
If you want to continue that free service, you need to protect the bats. And if you want to protect the bats, you need to know where they are. So what this geospatial data set does is identify the location and numbers of bats around America.
That also makes it easier when development is happening. For example, if you’re building a highway bridge, or a mine, or a wind farm, you want to make sure that development is mitigating any harm against these bats. The developers need to know where the bats are. So rather than counting the bats themselves, they can go to this federal data set.
And then lastly, there’s some research that suggests that in places where bats do disappear—rural areas, farming areas—infant mortality goes up because farmers have to use more pesticides. That’s the working theory. So that just really ups the stakes for why these data are so essential.
That’s just one of hundreds of thousands of data sets across the federal ecosystem that at first glance might seem not important, but actually have really substantial consequences for American lives and livelihoods.
Anne Kim: How do these data sets come about? And is it immediately apparent that a data set will benefit industry and ordinary Americans, or is that something that evolves as the data set evolves and people find new applications for it?
Denice Ross: That’s a great question. Sometimes data sets are mandated by Congress. For example, the National Assessment of Educational Progress, known as the NAEP, or the “Nation’s Report Card,” is specified in law. It contains the details about what variables need to be collected, including race, ethnicity, gender, and income levels.
Most of the time, though, it’s just data that are collected because they are necessary to run government. Sometimes it’s surveys that are collected to inform policies and program design, and other times it’s administrative records. For example, people who are seeking disaster relief fill out a form that says where they live and what happened, and then that starts the process of getting benefits from FEMA. That creates administrative data behind the scenes.
Anne Kim: Step back to the big picture and talk about the big goals that all of this data collection does for Americans. With the bat monitoring program, there are clearly implications for industry and public health. But are there large policy pillars that define the function of all of these datasets and make the case to the public about why all this data is so important to collect and to maintain?
Denice Ross: I’m so glad you asked that because data, and especially federal data, are a type of infrastructure that we really take for granted. Data help keep us healthy. They help keep us employed and safe. And data also support innovation and the economy at large.
I can give you a few examples that are policy relevant. At the beginning of November, we were talking a lot about hunger in America, and there were already policy changes to the SNAP food stamps program that were likely to cause millions of Americans to lose their food benefits.
How might we know the impact of that policy change? Well, there’s the food security supplement that is a collaboration between the USDA and the Census Bureau, and that’s been shining a light on hunger in America for the last 30 years. It’s the only data set that gives us a full picture of what’s happening at the state level, and what’s happening with children versus adult hunger. That was recently terminated by the USDA.
There’s also another way of understanding what’s happening with food assistance. States are required to submit their application processing timelines to USDA, with the goal being that it should take no longer than 30 days for a state to process a SNAP application, and seven days if it’s an emergency. You can imagine that if you’re a grandparent and you recently took custody of a grandchild, and now you need help putting food on the table, it would be really bad if it takes more than 30 days or even more than a week to get that assistance.
This is a dataset that holds states accountable for processing those applications in a timely manner. Just the mere fact that it’s transparent and available to the public and the media helps increase the states’ ambition to meet those deadlines.
Anne Kim: I want to talk more broadly about the current administration’s approach to data, and data preservation and collection. They’re certainly well aware of the power of data, which is why they are systematically suppressing it. You mentioned the survey of household food insecurity that’s no longer happening, just as there are going to be major cuts to the SNAP program as a result of the “Big Beautiful Bill.” The administration has also challenged the accuracy of data. Most recently, they fired the commissioner of the Bureau of Labor Statistics over numbers they just didn’t like. How would you characterize how the administration handles data, and what do you find most problematic about their approach?
Denice Ross: There are three main buckets of damage that we’re seeing to the federal data ecosystem. The first was very high profile, at the beginning of the administration, when many data sets were taken down at the end of January in order to be scrubbed of elements that did not align with administration priorities. This was gender, DEI, and climate.
For the most part, those data sets went back up. There were a few data elements that did not. For example, the Bureau of Prisons had a data set on inmate statistics, and they removed the transgender category from the gender of inmates. Another example is the National Crime Victimization Survey. There were three questions there that were removed on gender.
Similarly, the Office of Personnel Management, OPM, has a really important data set called FedScope, which gives us a sense of the characteristics of the federal workforce, which are very policy relevant given the major shifts that we’ve seen in the federal workforce over the last few months. They deleted all of the race and ethnicity data going back years from that FedScope data.
And then what we’ve seen, which is more damaging and more like a death by 1,000 cuts, is the diminishment of the capacity of federal agencies to collect, protect, and publish data. That’s due to the cuts in staffing, the cancellations of contracts, and the terminations of the federal advisory committees that are so essential for helping data collections keep up with changes in modern society.
We’re also seeing that it’s just harder to get things done in government. For example, if the Secretary of the Department of Commerce has to personally approve every contract above $100,000, it’s just going to be so much harder to get work done, and there are fewer people to do the work. That will manifest across so many different types of collections.
For example, take the ground-based radar that protects rural America from tornadoes. Somebody’s got to fix that equipment if it starts malfunctioning, and if the staff or the contractors who do that work are no longer around, those sensors will start to decay over time. So the quality of the tornado forecasting will go down. It’s hard to even fathom the small bits of damage that are happening that will have such a large collective impact on the quality of the information that we need to just run a modern society.
The third trend that we’re seeing, which started this summer in earnest, is the attacks on data that might reveal that administration policies are not working as promised. The first data set that I noticed was the Social Security Administration’s call center wait times data set, which was terminated right around the time when thousands of Social Security Administration employees were let go, field offices were closing, and increased fraud protection measures were going into place. So it would be likely that more Social Security recipients would be needing to call the call center to resolve any issues. But that data set disappeared right around that time.
And then, as you already noted, when the jobs numbers came out this summer and did not align with the administration’s message, the BLS commissioner was fired. And then more recently, of course, that food security supplement was terminated right as millions of Americans were about to lose their food stamp benefits. So I expect to see more of those types of losses moving forward when the data are just politically inconvenient.
Anne Kim: What about data integrity and accuracy, particularly in light of political pressure? We already know about the political pressure on BLS, but there’s going to be pressure on other agencies as well. I’m sure it’s happening all the time. Are Americans going to be able to trust the data they get from their government for the next few years?
Denice Ross: So far, we have not seen any direct manipulation of numbers. There’s one Lancet article that does an excellent job of cataloging changes to column headers in some datasets—health datasets in particular—where a survey might have collected the gender of a person, and that column header was changed to sex. And then that change was not included in the documentation about that dataset. That’s the closest thing to manipulation that we’ve seen so far.
However, it’s worth thinking about the life cycle of data. You have primary datasets that only the federal government can produce, and then there are derivative works—the way that agencies might be interpreting that primary data that they’ve produced.
With the Department of Energy’s recent report on the impact of greenhouse gas emissions or the new recommendations for children and vaccines, for example, we’ve seen some interpretations of the data that are not as scientific as we would normally expect from the federal government.
Anne Kim: What about academic reliance on data too? There are many professors around the country and research institutes and think tanks that rely on federal data and interpret it. As the quality and quantity of data diminishes, what’s the downstream impact going to be on the quality of scholarship around all this data?
Denice Ross: Maintaining the quality of scholarship will certainly be challenging.
There are some stopgap measures in place. For example, there are partnerships already between federal agencies and universities to collect data like the Framingham Heart Study out of NIH. NOAA’s also got this fantastic fleet of floating buoys in the ocean that gives us ocean temperature conditions.
These academic partnerships help keep these datasets a little more secure because they might have multiple stakeholders and they aren’t necessarily in the .gov space. But over time, I think what we’ll start to see is a slow disintegration of the quality of the data coming in and the ability to keep hosting it.
But even more critical are the downstream consequences—the impacts on the American people—when our nation’s research enterprise is so hobbled by these losses to data flows. What I’m telling my colleagues who are users of data right now is that if you are using data to do your work, now is the time to advocate for why those data matter. We, and I include myself in that, have not done a great job of talking about how federal data are absolutely essential for benefiting American lives and livelihoods.
Anne Kim: That brings us to your website, essentialdata.us, which is something that you’ve created to catalog the data sets that are disappearing and to make the case to the public about the importance of data. I was looking at the site recently—this was around Halloween—and I noticed that one of the pages on the site is titled Dearly Departed Datasets.
It’s literally a graveyard for datasets, and you’ve got these little tombstones for the data that’s disappearing. This graveyard also seems to be growing. How are you keeping track, and what else is disappearing?
Denice Ross: The reason we decided to do this campaign to crowdsource the “dearly departed datasets” right before Halloween is that a lot of people, especially journalists, were asking which data sets have actually disappeared.
It turns out that the number that are actually gone right now is relatively small. It probably numbers in the dozens rather than the hundreds or thousands. But that really understates the risk to the entire data enterprise.
In addition to the examples already mentioned, another dataset that’s gone is the Drug Abuse Warning Network—DAWN. That was a health surveillance network that monitored drug-related visits to emergency rooms, and it served as an early warning system when new forms of dangerous illicit drugs started to pop up in a community. Where they show up is in emergency rooms, and it really makes sense for the federal government to consolidate that information.
Another example of a dataset that’s disappeared is EPA’s greenhouse gas reporting program. It was imperfect, certainly, but it was the only way we had to get information on emissions from some of the nation’s largest emitters.
Anne Kim: Is this data gone forever? Is there anybody that’s been archiving the data? Is that even possible? Is there a plan for restoring the data someday?
Denice Ross: When we say a dataset “disappears,” the historical data so far are mostly still available in the federal web space. But the collection is terminated moving forward. Civil society has been really fantastic, especially with the leadership of groups like the Data Rescue Project, at archiving some of these key data. And I definitely sleep better at night knowing that there are archives, and in some cases, multiple versions of data sets, that preserve the snapshot in time.
But the best outcome is that we keep these data flowing because the value of the data is in its continual update. That’s why it’s so important that people who are depending on the data make the case to federal data stewards and policymakers and elected officials about why these data matter.
Anne Kim: Even if you do have an opportunity to begin collecting the data again in a couple years, though, there’s probably been enough damage done to the infrastructure that it probably is a longer term project to rebuild it, right?
Denice Ross: We’re going to have to retool and figure out what the future of a more resilient national data infrastructure looks like. But for now, we have to protect the core, and the many data sets that only the federal government can produce.
If we lose that continuity, we are going to be flying blind as a nation during a time when we are having so many really dramatic shifts in public policy. We’ve got the steady cadence of climate-fueled disasters, we’ve got civil unrest, and we’ve got the transformation happening because of AI and new technologies. If we ever needed to be operating with all of our instrumentation, it’s now.
Anne Kim: Not to mention public health, crime…
Denice Ross: Absolutely. Public health, all the things that might keep one awake at night. Data usually have some foundational role in policies and our ability to prepare as a nation.
Anne Kim: Is there anything that ordinary citizens who are concerned about this problem can do?
Denice Ross: I would encourage you to check out EssentialData.us and talk to your friends and colleagues about how important these data are that we’re taking for granted.
For example, one dataset that we take for granted is the heat index that comes from the National Weather Service. Football coaches use the heat index to know when to move football practice inside so their players don’t die of heat stroke.
When I take my kids camping, that wooden sign that tells me what the fire risk level is and whether or not we can light a fire comes from federal data also. Just be more aware about the role that federal data plays in your everyday life.