Data Organization and Metadata Capture

The slides from Prof Alan Christoffels' talk on Data Organization and Metadata Capture

Example data for the Data Organization and Metadata Capture discussion

Example data coming soon

The PHA4GE SARS-CoV-2 Contextual Data specification

PHA4GE is a global consortium of genomic epidemiology, bioinformatics and public health practitioners that aims to “Improve Openness And Interoperability In Public Health Bioinformatics”. The PHA4GE Data Science Working Group (DSWG) developed a data specification for SARS-CoV-2 contextual data (also known as metadata) to harmonize information about how, where and from whom samples were collected, how they were processed, what is known about the host, and so on. The specification and its supporting documents can be found here. The specification is distributed with an Excel spreadsheet that can be used as a starting point for producing a data collection template.

Tools for working with metadata

The PHA4GE specification is described above. In addition, tools for data collection and processing include:

  • The DataHarmonizer spreadsheet editor allows custom templates to be created for data entry and data validation. Find it here.

  • Data-flo is an environment for building data integration and cleaning workflows. A tutorial is available.
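To give a feel for what template-based validation involves, the sketch below checks a small metadata table for missing required fields, duplicate sample identifiers and malformed dates. It is only an illustration and does not use DataHarmonizer or Data-flo. The column names (`sample_id`, `collection_date`, `country`), the helper names and the example rows are all made up for this sketch and are not taken from the PHA4GE specification.

```python
import csv
import io
from datetime import date

# Hypothetical required columns; a real template would take its
# field names from the specification being followed.
REQUIRED_FIELDS = ["sample_id", "collection_date", "country"]

# Made-up rows for illustration only.
EXAMPLE_CSV = """sample_id,collection_date,country
S001,2021-03-14,South Africa
S002,14/03/2021,Kenya
S001,2021-03-15,
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def validate_rows(rows):
    """Return a list of (row_number, message) tuples describing problems."""
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        # Every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append((number, f"missing {field}"))

        # Sample identifiers should be unique within the table.
        sample_id = (row.get("sample_id") or "").strip()
        if sample_id:
            if sample_id in seen_ids:
                problems.append((number, f"duplicate sample_id {sample_id}"))
            seen_ids.add(sample_id)

        # Dates are expected in ISO 8601 form (YYYY-MM-DD).
        collected = (row.get("collection_date") or "").strip()
        if collected:
            try:
                date.fromisoformat(collected)
            except ValueError:
                problems.append(
                    (number, f"collection_date not in YYYY-MM-DD form: {collected}")
                )
    return problems


if __name__ == "__main__":
    for number, message in validate_rows(load_rows(EXAMPLE_CSV)):
        print(f"row {number}: {message}")
```

Running the script reports one problem in the second data row (the date format) and two in the third (an empty country and a repeated sample identifier). Dedicated tools add much more, such as controlled vocabularies and export formats, but the underlying idea of checking each record against an agreed template is the same.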