Infectious disease research generates millions of data points. But while one researcher’s data may be useful to many other researchers, that data is rarely reused.
A new project sponsored by the NIAID Office of Data Science and Emerging Technologies (ODSET) is aiming to change that by making NIAID data easier to find and reuse. The project examines NIAID’s data repositories and makes recommendations on how to make the data within more FAIR — Findable, Accessible, Interoperable, and Reusable.
“An overarching goal is to enable scientists and the scientific community to find data more easily,” said Reed Shabman, Ph.D., ODSET Acting Director. “If the community can easily find data, they are able to reuse that data to make new discoveries.”
The NIAID Data Landscaping and FAIRification project focuses on metadata: the information that describes data, such as authorship, date published, disease or condition studied, and other crucial details. The project makes recommendations on how NIAID repositories can implement high-quality metadata that adheres to the FAIR principles. Comprehensive, detailed metadata enables scientists to quickly find existing data and apply it to their research, reducing “time-to-science” and potentially accelerating the pace of biomedical research and discovery.
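To make the idea of descriptive metadata concrete, here is a minimal Python sketch of how a few such fields might let a researcher find datasets by the condition studied. The `DatasetMetadata` class, its field names, the `find_by_condition` helper, and the placeholder records are all hypothetical illustrations. They do not represent the schema of any NIAID repository or the project's actual recommendations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    """Descriptive fields for one dataset (field names are illustrative only)."""
    title: str
    authors: tuple
    date_published: date
    condition: str


def find_by_condition(records, condition):
    """Return the records whose studied condition matches, ignoring case."""
    wanted = condition.strip().lower()
    return [r for r in records if r.condition.strip().lower() == wanted]


# Placeholder records, invented purely for this example.
catalog = [
    DatasetMetadata("Placeholder dataset A", ("Researcher One",), date(2020, 1, 1), "Influenza"),
    DatasetMetadata("Placeholder dataset B", ("Researcher Two",), date(2021, 1, 1), "COVID-19"),
    DatasetMetadata("Placeholder dataset C", ("Researcher Three",), date(2022, 1, 1), "influenza"),
]

matches = find_by_condition(catalog, "influenza")
titles = [r.title for r in matches]
print(titles)
```

Even in this toy form, the search only works because every record states its condition in a predictable field. Consistent, complete metadata is what allows that kind of lookup across repositories.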
The project also focuses on other aspects of FAIR implementation, including use of relevant data standards and persistent identifiers (PIDs). Organizing NIAID data using PIDs makes it easier to assess the data’s reuse and impact. It also increases the interoperability of data within and outside of NIAID.
“Discovering data that you didn’t know existed empowers better-informed experiments,” Shabman said. “It can be cost-saving because you may avoid repeating past experiments. You’re going to be able to combine data from multiple places to answer research questions that you can’t answer today. There are many potential applications!”
The project is funded by the Frederick National Laboratory for Cancer Research, operated by Leidos Biomedical Research, Inc., on behalf of NIAID. It is being conducted by the GO FAIR US Team, led by staff of the San Diego Supercomputer Center (SDSC) at UC San Diego, the GO FAIR Foundation, the National Center for Atmospheric Research, and other partners.
Christine Kirkpatrick, M.A.S., SDSC Research Data Services director, leads the NIAID Data Landscaping and FAIRification project. Kirkpatrick’s team is reviewing NIAID’s internal and external data repositories covering the range of infectious, immunologic, and allergic diseases.
“NIAID is awash in rigorous, high-quality data,” Kirkpatrick said. “I can’t think of a better [institute] to be working with than NIAID, because all of NIAID’s mission areas are so immediately impactful to the world.”
Developing data infrastructure, empowering AI, and building a culture of reuse
Pandemic preparedness, or being ready to respond to the rapid spread of an infectious disease, is an important reason to make data easier to find and use. Both Shabman and Kirkpatrick pointed to the COVID-19 pandemic to illustrate the importance of accessible data. Data about SARS-CoV-2, the virus that causes COVID-19, was initially fragmented. Making transmission data and vaccine research data readily available and accessible proved crucial to slowing the pandemic’s spread, developing vaccines, and saving lives.
“Imagine a scenario where, as that data is being generated, it can be consumed in a way that provides us with the information to make quick decisions,” Shabman said. “If that data infrastructure already exists, and Pathogen X emerges tomorrow, we could know important information about Pathogen X and its spread immediately.”
Accessible, interoperable metadata can also help artificial intelligence and machine learning models comb through datasets more easily, which facilitates the use of these developing technologies in biomedical research. One practical example is using metadata to make units of measurement machine-readable, so that datasets recorded in different units can be easily compared or combined.
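The units example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the dataset layout, the `unit` metadata key, the `TO_MG_PER_L` conversion table, and the function names are all hypothetical, and the values are invented. The sketch only shows how a machine-readable unit label lets software normalize measurements before combining them.

```python
# Hypothetical conversion factors to one common unit (milligrams per liter).
TO_MG_PER_L = {
    "mg/L": 1.0,
    "g/L": 1000.0,   # 1 g = 1000 mg
    "ug/mL": 1.0,    # 1 ug/mL = 1000 ug/L = 1 mg/L
    "mg/dL": 10.0,   # 1 dL = 0.1 L
}


def to_mg_per_l(value, unit):
    """Convert a concentration to mg/L using the unit named in the metadata."""
    try:
        factor = TO_MG_PER_L[unit]
    except KeyError:
        raise ValueError(f"Unknown unit: {unit!r}")
    return value * factor


def combine(datasets):
    """Merge values from several datasets after converting each to mg/L."""
    combined = []
    for ds in datasets:
        unit = ds["metadata"]["unit"]  # the machine-readable unit label
        combined.extend(to_mg_per_l(v, unit) for v in ds["values"])
    return combined


# Two invented datasets that record the same kind of measurement in different units.
dataset_a = {"metadata": {"measurement": "concentration", "unit": "mg/dL"}, "values": [1.0, 2.0]}
dataset_b = {"metadata": {"measurement": "concentration", "unit": "g/L"}, "values": [0.5]}

merged = combine([dataset_a, dataset_b])
print(merged)
```

Without the `unit` field, a program would have no reliable way to tell that the numbers in the two datasets are on different scales. A real system would likely draw unit codes from an established standard rather than a hand-written table like this one.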
Kirkpatrick said that applying FAIR principles provides greater transparency and can increase trust in science. And Meghan Hartwick, M.Sc., Ph.D., ODSET Program Officer, said that she hopes it will encourage a growing culture of data sharing and reuse.
“I think as the tools emerge and get better at making data discoverable, interoperable, and reusable, the feedback loop of the positive outcomes will encourage people to continue [sharing and reusing data],” Hartwick said. “There's hope that at some point, datasets will be cited and be as important as a journal publication.”
Kirkpatrick said that NIAID’s significant data resources can lead to impactful new discoveries drawn from existing datasets.
“I really believe that a lot of the world's questions and challenges can be answered in the data we already have,” Kirkpatrick said. “… There are so many things to be learned by making [data] available for questions researchers don't have the answers to yet.”