Because good research needs good data

Biology

Metadata standards

ABCD - Access to Biological Collection Data

A standard for the access to and exchange of primary biodiversity data, including specimens and observations.

Darwin Core

A body of standards, including a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries.
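As a rough illustration of how Darwin Core terms can serve as column names in a shared data file, the sketch below writes one occurrence record to CSV and reads it back. The column names are Darwin Core terms; the record values and the helper function names are invented for this example.

```python
import csv
import io

# Darwin Core terms used as column headers.
DWC_COLUMNS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def write_occurrences(records):
    """Serialize a list of dicts to CSV text with Darwin Core terms as the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def read_occurrences(text):
    """Parse CSV text back into a list of dicts keyed by Darwin Core term."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical record: every value here is made up for illustration.
example_records = [
    {
        "occurrenceID": "example:occ:0001",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Puma concolor",
        "eventDate": "2001-05-17",
        "decimalLatitude": "-41.0983",
        "decimalLongitude": "-71.4231",
    }
]

csv_text = write_occurrences(example_records)
print(csv_text)
```

Because the header row uses the shared term names, a second party can read the file without a separate explanation of what each column means.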

EML - Ecological Metadata Language

Ecological Metadata Language (EML) is a metadata specification developed specifically for the ecology discipline.

Genome Metadata

Descriptive data about single genomes within the Pathosystems Resource Integration Center.

ISA-Tab

A general purpose framework with which to capture and communicate metadata for data files from 'omics-based' experiments employing combinations of technologies.

MIBBI - Minimum Information for Biological and Biomedical Investigations

A common portal to a group of checklists of Minimum Information in nearly 40 biological disciplines.

Observ-OM

Used to integrate and compare observation data across experimental projects, disease databases, and clinical biobanks.

OME-XML - Open Microscopy Environment XML

A metadata standard and data file format for biological light microscopy data.
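To give a feel for the kind of metadata OME-XML carries, the sketch below reads the pixel dimensions from a minimal OME-XML fragment using Python's standard library. The fragment is hand-written for illustration, assumes the 2016-06 schema namespace, and has not been validated against the schema; `pixel_dimensions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the 2016-06 OME schema namespace.
OME_NS = "http://www.openmicroscopy.org/Schemas/OME/2016-06"

# Hand-written fragment for illustration only; real files carry far more detail.
SAMPLE_OME_XML = f"""<OME xmlns="{OME_NS}">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYZCT" Type="uint8"
            SizeX="512" SizeY="256" SizeZ="1" SizeC="3" SizeT="1"/>
  </Image>
</OME>"""


def pixel_dimensions(ome_xml):
    """Return the five image dimensions recorded on the first Pixels element."""
    root = ET.fromstring(ome_xml)
    pixels = root.find(f"{{{OME_NS}}}Image/{{{OME_NS}}}Pixels")
    return {axis: int(pixels.get(f"Size{axis}")) for axis in "XYZCT"}


dimensions = pixel_dimensions(SAMPLE_OME_XML)
print(dimensions)
```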

PDBx/mmCIF - Protein Data Bank Exchange Dictionary and the Macromolecular Crystallographic Information Framework

PDBx/mmCIF is the standard archive format used by the Protein Data Bank (PDB). It provides both metadata and data according to properties defined in the PDB Exchange Dictionary and the Macromolecular Crystallographic Information Framework (mmCIF).
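In PDBx/mmCIF files, each data item is named in the form `_category.attribute`. The sketch below groups simple single-line items by category. It is a deliberately minimal illustration, not a full parser: it ignores `loop_` tables and multi-line values, and the sample text and the `parse_simple_items` helper are invented for this example.

```python
# Invented sample; the values do not describe a real PDB entry.
SAMPLE_MMCIF = """data_EXAMPLE
_entry.id        EXAMPLE
_cell.length_a   61.777
_cell.length_b   61.777
_struct.title    'Hypothetical example structure'
"""


def parse_simple_items(text):
    """Group single-line `_category.attribute value` items by category.

    Lines that do not start with an underscore (such as the data_ block
    header) are skipped. Loops and multi-line values are not handled.
    """
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        name, _, value = line.partition(" ")
        category, _, attribute = name[1:].partition(".")
        # Drop surrounding whitespace and simple quote characters.
        items.setdefault(category, {})[attribute] = value.strip().strip("'\"")
    return items


parsed = parse_simple_items(SAMPLE_MMCIF)
print(parsed["cell"])
```

For real files, the tools listed under PDBx/mmCIF Software Resources are the safer choice, since they handle the full syntax.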

Protocol Data Element Definitions

Used by the National Institutes of Health (U.S.) for the ClinicalTrials.gov website. The Protocol Registration System (PRS) is used to register clinical studies with human subjects.

Repository-Developed Metadata Schemas

Some repositories have decided that current standards do not fit their metadata needs, and so have created their own requirements.

Extensions

ABCDDNA

An extension of the ABCD standard for DNA data.

Apple Core

Darwin Core documentation and recommendations for herbaria.

Darwin Core Geospatial Extension

A protocol-independent XML schema for a geospatial extension to the Darwin Core.

DwC Germplasm

An extension to the Darwin Core standard, it includes additional terms required to describe plant genetic resources and in particular germplasm seed samples.

EDMED Metadata Profile

The European Directory of Marine Environmental Datasets metadata scheme, which is a profile of ISO 19115.

FGDC/CSDGM Biological Data Profile

A profile of the FGDC/CSDGM metadata standard, intended to support the collection and processing of biological data.

GBIF Metadata Profile

Established by a global network of countries and organizations, GBIF is a web portal promoting and facilitating the mobilization, access, discovery and use of biodiversity data. The portal uses a profile of EML; a How-to Guide and Reference Guide for using the profile are available.

HISPID - Herbarium Information Standards and Protocols for Interchange of Data

An extension to ABCD 2.06, it is designed to allow the storage and transmission of herbarium plant specimen data.

ISA-TAB Nano

An extension of ISA-TAB specifying the format for representing and sharing information about nanomaterials, small molecules and biological specimens along with their assay characterization data.

isaconfig-diXa

An extension of ISA-TAB for representing and sharing metadata about toxicogenomics experiments.

MIBBI Portal

A list of nearly 40 Minimum Information standards projects registered with the MIBBI initiative.

OME-TIFF - Open Microscopy Environment TIFF

A specification of how to embed OME-XML metadata within a TIFF or BigTIFF image file.

SNRNASM ISA-Tab

An ISA-Tab-based standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments.

VarioML

A set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.

Tools

Bio-Formats

Bio-Formats reads proprietary microscopy image data and metadata, and converts them to OME-TIFF, a combination of TIFF and OME-XML.

Darwin Core Archive Assistant

A web application that gives data publishers wishing to serve data to the GBIF network an easy interface for describing the data elements of basic text files and for composing an appropriate XML Darwin Core descriptor file to accompany them.
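The descriptor file mentioned here maps the columns of a text file to Darwin Core terms. As a hedged sketch of what such a descriptor might look like, the code below builds one with Python's standard library. The element and attribute names follow the Darwin Core Text Guidelines as best understood here; the file name `occurrence.txt`, the choice of terms, the assumption that column 0 holds the record identifier, and the `build_descriptor` helper are all illustrative.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_descriptor(data_file, terms):
    """Build a minimal descriptor for one comma-separated core data file.

    Assumes column 0 of the data file holds the record identifier and the
    remaining columns appear in the same order as `terms`.
    """
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    core = ET.SubElement(
        archive,
        "core",
        {
            "encoding": "UTF-8",
            "fieldsTerminatedBy": ",",
            "linesTerminatedBy": "\\n",  # literal backslash-n in the XML
            "ignoreHeaderLines": "1",
            "rowType": DWC_TERMS + "Occurrence",
        },
    )
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = data_file
    ET.SubElement(core, "id", {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


descriptor = build_descriptor("occurrence.txt", ["scientificName", "eventDate"])
print(descriptor)
```

A generated descriptor like this should still be checked with a validator before publishing.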

Darwin Core Archive Validator

A tool to validate XML metadata against the Darwin Core Text Guidelines.

Fiji

Fiji is an image processing package that supports the OME data model for images.

Integrated Publishing Toolkit

A software platform using Darwin Core and EML to facilitate the efficient publishing of biodiversity data on the Internet, using the GBIF network.

ISA Software Suite

The open source ISA metadata tracking tools facilitate ISA-TAB-compliant collection, curation, local management and reuse of datasets in an increasingly diverse set of life science domains.

Metacat

Metacat is a repository for data and metadata that helps scientists find, understand, and effectively use the data sets they manage or that have been created by others.

MOLGENIS

A software generator to rapidly build web databases and a suite of web databases for genotype, phenotype, QTL and analysis pipelines.

Morpho

An application for accessing and manipulating metadata and data (both locally and on the network), with wizards creating metadata files using a subset of Ecological Metadata Language (EML).

OMERO

Repository software for organising, viewing, analysing and sharing biological microscopy images. It supports proprietary file formats but normalises to OME-TIFF/OME-XML.

PATRIC Download Tool

Tool for downloading data from PATRIC.

PDBx/mmCIF Software Resources

Parsing, validation, and visualization tools and libraries supporting PDBx/mmCIF, the data standard used by the Worldwide Protein Data Bank.

ProteoRed Tools

Bioinformatics tools to create and extract metadata compliant with the MIBBI-registered MIAPE minimum requirements.

Use Cases

Atlas of Living Australia

An aggregation of information on all the known species in Australia, collected from museums, herbaria, community groups, government departments, individuals and universities. All data is converted to Darwin Core.

BioCASE - Biological Collection Access Service for Europe

The BioCASE Biological Unit Network provides access to a transnational network of biological collections; its protocol requires providers to use the ABCD schema in their configuration files.

BioModels Database

A repository hosting computational models of biological systems, using the MIBBI-registered MIRIAM and MIASE minimal metadata requirements.

BODC - British Oceanographic Data Centre Published Data Library

This national facility for looking after and distributing data concerning the marine environment requires that data sets use a well-documented format such as CF-compliant NetCDF and be accompanied by a Dublin Core record as well as discovery metadata in a recognised standard such as DIF or FGDC/CSDGM.

CARMEN

A virtual laboratory for neurophysiology, enabling sharing and collaborative exploitation of data, analysis, code and expertise. Metadata must include the MIBBI-registered MINI recommendations.

Note: A website for this resource is no longer available.

CHD7 Database

An open access database that contains anonymised data on both published and unpublished CHD7 variations and phenotype.

Chem-BLAST

A Web-based service for searching for and visualizing chemical structures. It uses data from the Protein Data Bank that has been transformed to RDF.

dbEST - Expressed Sequence Tag Database

A repository-developed metadata schema for EST data in Genbank.

FlowRepository

A database of flow cytometry experiments where you can query and download data collected and annotated according to the MIBBI-registered MIFlowCyt standard.

GBIF - Global Biodiversity Information Facility

Established by a global network of countries and organizations, GBIF is a web portal promoting and facilitating the mobilization, access, discovery and use of biodiversity data. The preferred format for publishing data to the GBIF network is the Darwin Core Archive, and its Integrated Publishing Toolkit uses EML as its data standard.

Harvard Medical School LINCS Database

One of two research centers in the US creating libraries of signatures that describe how cells respond to perturbation, it uses the ISA-TAB standard to describe its data.

International dystrophic eb Patient Registry

The international registry of dystrophic epidermolysis bullosa (DEB) patients and associated COL7A1 mutations.

International Molecular Exchange Consortium

An international collaboration to provide access to a non-redundant set of protein-protein interaction data from a broad taxonomic range of organisms. IMEx partner databases require data to be MIMIx (a MIBBI-registered standard) compatible.

ISA Commons

A network of systems and projects that use the ISA-Tab file format, and/or are powered by components of the ISA software suite.

JCB Data Viewer

A repository for viewing and analysing multi-dimensional image data associated with articles published in The Journal of Cell Biology. Its native metadata format is OME-XML.

KNB - The Knowledge Network for Biocomplexity

A network of federated institutions that have agreed to share data and metadata using a common framework, principally revolving around the use of the Ecological Metadata Language as a common language for describing ecological data.

Long Term Ecological Research Network

A network providing the scientific expertise, research platforms, and long-term datasets necessary to document and analyze environmental change, it uses the Ecological Metadata Language in describing its data.

MetaboLights

A database for metabolomics experiments and derived information in ISA-Tab format.

MVID Patient Registry

The international registry of Microvillus Inclusion Disease (MVID) patients and associated MYO5B mutations.

National Center for Ecological Analysis and Synthesis

An EML developer, this US-based centre of cross-disciplinary research uses existing data to address major fundamental issues in ecology and allied fields.

National Science Digital Library Data Repository

An online portal for education and research on learning in Science, Technology, Engineering, and Mathematics, using a profile of the Dublin Core Metadata Elements for resource and collections metadata.

NEBC ISA Network BioInvestigationIndex

The NERC Environmental Bioinformatics Centre ISA Network's index of ISA-Tab and MIBBI-compliant environmental 'omics data.

OBIS - Ocean Biogeographic Information System

A data repository for marine species datasets from all of the world's oceans; it uses an extension of Darwin Core 2 as its data standard.

PRIDE - PRoteomics IDEntifications database

A centralized, MIBBI standards compliant, public data repository for proteomics data, post-translational modifications and supporting spectral evidence.

Rebioma

A web portal using Darwin Core to describe biodiversity data collected in Madagascar.

The Cell: An Image Library

A resource database of images, videos, and animations of cells, capturing a wide diversity of organisms, cell types, and cellular processes. Its native metadata format for images is OME-XML.

UK Polar Data Centre

An organisation coordinating the management of data collected by UK-funded scientists in the polar regions, using an application profile that is harmonious with both ISO 19115 and DIF.

VertNet

Four distributed database networks (MaNIS, HerpNET, ORNIS and FishNet) using a Darwin Core engine to make bioinformatics specimen data interoperable, mappable and publicly available.

Worldwide Protein Data Bank

The Protein Data Bank archive (PDB) is a worldwide archival repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies. The Worldwide PDB (wwPDB) organization manages the PDB archive and ensures that the PDB is freely and publicly available to the global community.

WormQTL

Public archive and analysis web portal for natural variation data in Caenorhabditis species.

WormQTL-HD

A comprehensive web database for linking human disease to natural variation data in Caenorhabditis elegans.