Understanding Metadata: A Key to Data Sharing and Reuse

Data Science Dispatch

Metadata plays a crucial role in sharing and reusing scientific data. Understanding what metadata is and how it is used can accelerate your research and increase the visibility of your work. It can also help to advance the field of infectious and immune-mediated disease (IID) research.

What is metadata?

Metadata is data about data. It provides additional information to help people understand the data, such as its origin, structure, and context. 

For example, for a genome sequence, the data is the actual sequence of nucleotides. The metadata includes the author of the data, the date the data was collected, the measurement techniques used, the health condition at the focus of the dataset (like asthma or autoimmune diseases), and more. Another example of data versus metadata appears in the data management and sharing webinar from the National Institute of Diabetes and Digestive and Kidney Diseases (4:22-6:28).

Examples of common metadata elements that describe IID research data are available at the NIAID Data Ecosystem’s list of common fundamental and recommended metadata elements.

Why is metadata important?

When you share scientific data, metadata provides the context that allows others to understand, trust, reproduce, or reuse data. This is particularly important in studies or secondary analyses where data is integrated from multiple sources; comprehensive metadata enables a scientist to combine data from different sources.

Using metadata effectively can also help your data get discovered, reused, and cited, thereby maximizing the value and impact of your research.

Collecting rich metadata during research

Effective metadata use starts with collecting rich metadata throughout the research process. “Rich” metadata is detailed and structured, making it easier for people to quickly learn about your data. 

Using standardized formats and schemas makes it clear which metadata components are present and where they can be found. Using common terminologies, ontologies, and data formats takes this a step further by defining specific metadata elements for both people and computers. Machine-readable metadata lets users explore and use data with code, so they can quickly learn about many data files at once.
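
As a rough illustration of what machine-readable metadata makes possible, the Python sketch below summarizes several metadata records with a few lines of code. The records, the field names (name, measurementTechnique, healthCondition), and the helper count_by_field are assumptions made up for this example, not a required schema.

```python
import json
from collections import Counter

# Hypothetical metadata records, as they might be stored in a JSON file.
# Field names and values are illustrative assumptions, not a required schema.
RECORDS_JSON = """
[
  {"name": "Dataset A", "measurementTechnique": "RNA-seq", "healthCondition": "asthma"},
  {"name": "Dataset B", "measurementTechnique": "flow cytometry", "healthCondition": "asthma"},
  {"name": "Dataset C", "measurementTechnique": "RNA-seq", "healthCondition": "lupus"}
]
"""


def count_by_field(records, field):
    """Count how many records carry each value of a metadata field."""
    return Counter(record.get(field, "unknown") for record in records)


records = json.loads(RECORDS_JSON)
technique_counts = count_by_field(records, "measurementTechnique")
print(technique_counts.most_common())
# [('RNA-seq', 2), ('flow cytometry', 1)]
```

Because every record uses the same field names, the same function can summarize three records or three thousand without anyone opening the files by hand.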

Some common examples of collecting metadata in a structured way include defining standardized date and time formats and using ORCID IDs for authors to ensure precise identification.
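
Both of these checks can be automated. The sketch below is one possible approach, assuming that dates are recorded as YYYY-MM-DD (the ISO 8601 calendar date format) and that ORCID iDs are written in the hyphenated 16-character form. The function names are hypothetical. The checksum follows the ISO 7064 MOD 11-2 scheme that ORCID describes for its identifiers, and the iD in the usage lines is the sample iD that ORCID uses in its own documentation.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_iso_date(text):
    """Return True if text is a real calendar date written as YYYY-MM-DD."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for dates like 2024-02-30
    except ValueError:
        return False
    return True


def orcid_check_character(base_digits):
    """Compute the final ORCID character from the first 15 digits (ISO 7064 MOD 11-2)."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if orcid has the hyphenated 16-character form and a correct checksum."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_iso_date("2024-03-15"))              # True
print(is_iso_date("03/15/2024"))              # False
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A check like this only confirms that a value is well formed. It cannot confirm that the date is the right one or that the iD belongs to the intended author.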

Biomedical researchers can follow some basic steps to ensure that they are collecting comprehensive and standardized metadata. 

1. Determine necessary metadata content and formats

Collecting data in the format you intend to share it in is more efficient than reformatting everything at the end. Here are some questions to help you determine data and metadata formats:

  • Who will use these data and how will they use them? What information do they need to understand the data?
  • Many research areas have standardized metadata formats that researchers can follow. What metadata standards or schemas do other researchers in your field use? Would using these standards and schemas help researchers understand and reuse these data?
  • Does the target repository or scientific journal have any specific metadata or formatting requirements? If the repository where you plan to share your data has specific guidance, follow that guidance from the start of your research.

2. Create metadata throughout the data lifecycle

Before data collection, collect protocol documentation and set up systems for data and metadata collection. These systems can collect information using the standards, formats, vocabularies, and ontologies selected, and will save you time when preparing data and metadata for publication.

During the data collection phase, document anything that fits into the target metadata fields. These may include the dates data was collected, variables measured, the units of measurement, the instruments used, and the conditions under which the data was collected.

After data collection, add any remaining metadata elements from your plan. These elements may focus more on describing data processing steps, versioning, authors, or related topics. 

3. Prepare to share data and metadata 

Verify that your metadata meets the requirements of the repository where you would like to share your data. Before sharing, add any elements that are often finalized late in the data lifecycle, such as associated publications, the license for reuse, or the data author list.
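
One simple way to run this kind of check is sketched below in Python. The list of required fields and the draft record are hypothetical placeholders; a real check would use the field list published by the repository where you plan to share your data.

```python
# Hypothetical list of fields a repository might require. Check the target
# repository's own guidance for the real list.
REQUIRED_FIELDS = ["name", "description", "author", "date_collected", "license"]


def is_empty(value):
    """Return True for None, blank strings, and empty collections."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in required if is_empty(record.get(field))]


# Placeholder record for illustration only.
draft_record = {
    "name": "Example cohort study",
    "description": "Illustrative record only.",
    "author": ["0000-0002-1825-0097"],
    "date_collected": "2024-03-15",
    "license": "",  # not yet decided
}

print(missing_fields(draft_record))  # ['license']
```

Running a check like this before submission gives a short list of what still needs to be filled in, which is easier to act on than rereading the whole record.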

Throughout the process, you can seek guidance from your program officer or the repositories where you intend to share data to ensure that metadata is collected and shared effectively.

Sharing data and metadata

The NIH Data Management and Sharing Policy encourages sharing metadata that describes or supports your scientific data. NIH recommends data management and sharing practices consistent with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles, and it strongly encourages the use of established data repositories for preserving and sharing data.

In some instances, the full scientific data cannot be shared easily. This may be due to large file sizes, particularly with imaging-related research, or to data privacy regulations. However, even if the actual scientific data cannot be shared, sharing metadata is still valuable. This practice ensures that there is a public record of the data's existence and provides important background information that can be used by other researchers.

Metadata is also a powerful tool for finding scientific data in repositories. Researchers can use metadata to search for datasets that match specific criteria. One tool that can help researchers find relevant data is the NIAID Data Ecosystem Discovery Portal, which uses the metadata attached to data stored in repositories to search across more than 50 IID repositories and data sources.
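
To show the general idea of searching by metadata, here is a minimal Python sketch that filters a small in-memory catalog. It is not the Discovery Portal's interface or method; the catalog, the field names, and the search function are all hypothetical and exist only to illustrate matching records against criteria.

```python
# Hypothetical catalog of metadata records; names and values are placeholders.
CATALOG = [
    {"name": "Dataset A", "healthCondition": ["asthma"], "species": "Homo sapiens"},
    {"name": "Dataset B", "healthCondition": ["asthma", "eczema"], "species": "Mus musculus"},
    {"name": "Dataset C", "healthCondition": ["lupus"], "species": "Homo sapiens"},
]


def field_values(record, field):
    """Return a record's values for a field as a list of lowercase strings."""
    value = record.get(field) or []
    if isinstance(value, str):
        value = [value]
    return [str(item).lower() for item in value]


def search(records, **criteria):
    """Return records whose metadata matches every field=value criterion."""
    return [
        record
        for record in records
        if all(
            wanted.lower() in field_values(record, field)
            for field, wanted in criteria.items()
        )
    ]


hits = search(CATALOG, healthCondition="asthma", species="homo sapiens")
print([record["name"] for record in hits])  # ['Dataset A']
```

The search works only because each record describes its health condition and species in the same fields and with the same terms, which is the practical payoff of collecting standardized metadata.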

Learn more about developing a data management and sharing plan and compliance with relevant NIH data sharing policies by reviewing the Data Policy and Guidance page.
