Data exchange standards, protocols and formats relevant for the collection data domain within the GFBio network

From GFBio Public Wiki

The natural history collections and the culture collection (BGBM, DSMZ, MfN, SGN, SMNS, SNSB, ZFMK), with their evolving GFBio Collection Data Centers/ Data Archives, are partners in several national and international initiatives and projects that develop and use data exchange protocols and standards. During the first months of the GFBio project, the partners collected and evaluated the technical documentation of existing domain-specific data exchange formats, interfaces and protocols relevant to the interaction and harmonization between collection management systems and the archive infrastructure as a whole (see table below). The table focuses on collection standards; SDD, EML and GML are also included because they are used in, and closely connected to, the work of the GFBio Collection Data Centers/ Data Archives.

This documentation is part of the process of identifying existing data exchange mechanisms and protocols appropriate for the GFBio network, and of installing and integrating new ones. It is intended to help software developers and data scientists set up and run the standard exchange software solutions agreed within GFBio at the GFBio Collection Data Centers.


Table: Data exchange standards, protocols and formats relevant for the collection data domain within the GFBio network

Note: Cells highlighted in grey indicate standards and protocols for which the GFBio Collection Data Centers/ Archives have expertise or which they use directly or indirectly.


Standard/ Protocol - Acronym | Full name/ Version | Short description | Documentation/ Schema (URL) | Category | Data domain | Status | GFBio Collection Data Centers – Expertise | Notes/ References
ABCD 2.06 | Access to Biological Collection Data v2.06 (2007-06-13) | Standard for the access to and exchange of data about specimens and observations | XSD File; Schema | Data Exchange Standard | Collection Archives | accepted (TDWG) | BGBM, DSMZ, MfN, SGN, SMNS, SNSB, ZFMK | [1] [2] [3]
ABCD 2.1 | Access to Biological Collection Data v2.1 (2014-05) | Enhanced version developed for GGBN and BiNHum, but it will not be used by GBIF | XSD File; Schema | Data Exchange Standard | Collection Archives | published | BGBM, MfN, SMNS, SNSB, ZFMK | -
ABCDDNA | DNA Extension for ABCD v2.06 (2009-05-27) | Standardised XML Schema extension for ABCD to facilitate storage and exchange of data related to DNA collection units. It offers a rudimentary set of DNA-specific data (sequences). | XSD File; Schema | Data Exchange Standard | Collection Archives | draft (TDWG) | BGBM, DSMZ, SGN, SNSB (ZSM) | [4]
ABCDEFG | Access to Biological Collection Databases Extended for Geosciences | Standard developed for use with palaeontological, mineralogical and geological digitized collection data | XSD File; Schema | Data Exchange Standard | Collection Archives | proposed (TDWG) | BGBM, DSMZ, MfN, SGN, SMNS, SNSB | [5] [6]
ABCDHISPID | Herbarium Information Standards and Protocols for Interchange of Data (HISPID). HISPID5 was presented during TDWG 2007 | HISPID5 is a file format serving as an extension for ABCD v2.06. It was developed by Australian herbaria to enable the interchange of plant specimen data. | Documentation; Google Code ZIP | Data Exchange Standard | Collection Archives | published | (BGBM expertise; HISPID is used in Australia) | [7]
AC | Audubon Core Multimedia Resources Metadata Standard v1.0 | A set of vocabularies designed to represent metadata for biodiversity multimedia resources and collections. | Documentation; AC on GitHub | Metadata Standard | Collection Archives | accepted (TDWG) | BGBM, SMNS, SNSB, ZFMK (all BiNHum partners) | [8]
BioCASE UAP[9] | Unitlevel Access Protocol v1.31 (2012-11-14) | The protocol used in the BioCASE unit-level network for communication between the central software and the wrapper software sitting on top of the providers' databases. | Documentation; XSD | Data Exchange Protocol | Collection Archives | published | BGBM, DSMZ, MfN, SGN, SMNS, SNSB, ZFMK | [10]
DC | Dublin Core: Metadata Element Set[11] v1.1 (2013-06-13) | Dublin Core is a metadata standard that was originally developed for libraries, but its elements have been reused in many other formats as well, e.g. DwC. The Dublin Core Metadata Element Set contains 15 elements. | Documentation; XSD | Metadata Standard | Literature; General | ISO Standard | (BGBM, DSMZ, MfN, SGN, SMNS, SNSB, ZFMK) | [12]
DC | Dublin Core: DCMI Metadata Terms[11] v1.0 (2012-06-14) | DCMI Metadata Terms contains 55 terms, including the 15 terms from the Metadata Element Set ("Simple Dublin Core") | Documentation; RDF | Metadata Standard | Literature; General | DCMI Recommendation | - | [12]
DELTA format | DEscription Language for TAxonomy | The DELTA format (DEscription Language for TAxonomy) encodes taxonomic descriptions for computer processing and can be used to produce natural-language descriptions, conventional or interactive keys, cladistic or phenetic classifications, and information-retrieval systems. | DELTA Format Overview | Data Exchange Standard | Collection Archives | accepted (TDWG) | SNSB | [13]
DiGIR | Distributed Generic Information Retrieval (DiGIR) (v1.5, 2003) | DiGIR is a protocol for single-point access to distributed data sources, based on HTTP, XML and UDDI. PLEASE NOTE: DiGIR is out-dated and was replaced by TAPIR and BioCASE UAP! | XML-Schema Documents | Data Exchange Protocol | Collection Archives | out-dated | out-dated | [14]
DwC | Darwin Core (2013-10-25) | The Darwin Core is a body of standards. It includes a glossary of terms intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information. | All References/ Schemas available; Darwin Core Resources | Data Exchange Standard | Collection Archives; General | accepted (TDWG) | BGBM, DSMZ, MfN, SGN, SMNS, SNSB, ZFMK | [15] [2]
EAC-CPF | Encoded Archival Context – Corporate bodies, Persons, and Families v2-2003; short: EAC, Encoded Archival Context | EAC-CPF is an XML Schema. It provides a grammar for encoding names of creators of archival materials and related information. EAC-CPF is an addition to EAD. | Schemas Overview; XSD File | Data Exchange Standard | Biodiversity Literature; Collection Archives | adopted standard of SAA (Society of American Archivists); full review of standard is planned for 2015 | - | [16]
EAD | Encoded Archival Description v2002 | EAD is a non-proprietary de facto standard for the encoding of finding aids for use in a networked (online) environment, published by the Library of Congress. | Documentation; XSD File; Schema | Data Exchange Standard | Biodiversity Literature; Collection Archives | accepted (Library of Congress) | - | [17]
EDM | Europeana Data Model v5.2.3 | Model intended as an integration medium for collecting, connecting and enriching the descriptions provided by Europeana content providers | XSD File; Documentation/ Definition; EDM Schema Files (GitHub) | Metadata Standard | Collection Archives | none | (MfN, ZFMK, BGBM through OpenUp!) | [18]
EML | Ecological Metadata Language (v2.1.1, 2012-06-19) | EML is a set of XML Schema documents that allow for the structural expression of metadata necessary to document a typical data set in ecological sciences. | EML Specification | Metadata Standard | Collection Archives (Ecological Community) | ? | SGN | [19]
ESE | Europeana Semantic Elements v3.4.1 (2013-07-14) | A format that provides a basic set of elements for describing objects in the cultural heritage domain in a way that is usable for Europeana. PLEASE NOTE: ESE is out-dated and was replaced by EDM | ESE Documentation | Metadata Standard | Collection Archives | none | (MfN, ZFMK, BGBM previously through OpenUp!) | [20]
GGBN Data Standard | Global Genome Biodiversity Network GGBN v1 (2014-06-15) | The GGBN Data Standard is a set of vocabularies designed to represent tissue, DNA or RNA samples associated with voucher specimens, tissue samples and collections. It allows combination with DwC, ABCD 2.1, or MIxS. | GGBN Data Standard Wikipage; XSD Schema | Data Exchange Standard | Collection Archives (Genomic Data) | published | BGBM, ZFMK (MfN and SNSB intend to use it soon) | [21]
GML | Geography Markup Language (v3.3.0, 2012-02-07) | "[GML] is an XML grammar for expressing geographical features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet."[22] It is developed and maintained by the Open Geospatial Consortium (OGC) | Documentation; Schema | Data Exchange Standard | Geography | ISO Standard (ISO 19136:2007) (v3.2.1) | - | [23]
LIDO | Lightweight Information Describing Objects v1.0 (2010-11-08) | XML Schema for contributing content to cultural heritage repositories | XSD File; Documentation; Schema | Data Exchange Standard | Collection Archives | ? | BGBM, DSMZ, MfN, SGN, SMNS, SNSB, ZFMK through BioCASe (MfN, BGBM, ZFMK through DDB) | [24]
MADS | Metadata Authority Description Schema v2.1 (2017-04-13) | XML schema for an authority element set that may be used to provide metadata about authorized forms of agents (e.g., people, organizations), events (e.g., conferences, meetings), and terms (e.g., topics, geographics, genres). | XSD File; Schema | Data Exchange Standard | General | accepted (Library of Congress) | - | -
MARCXML | MARC (Machine-Readable Cataloging) XML | Framework for working with MARC data in an XML environment. | XSD File; Schema | Data Exchange Standard | Literature | accepted (Library of Congress) | - | [25]
METS | Metadata Encoding & Transmission Standard v1.10 (2013-07-08) | A standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. | XSD File; Schema for v1.9 | Data Exchange Standard | Literature | accepted (Library of Congress) | - | [26]
MIxS | Minimum Information about any (x) Sequence v4.0 (2015-07) | A family of minimum information standards (checklists) for standardization of sequence-related metadata, developed by the Genomic Standards Consortium | Documentation | Data Exchange Standard | Collection Archives (Genomic Data) | published | DSMZ, ENA (through Jacobs Uni), BGBM | [27] [28]
MOD-CO | MOD-CO Schema representations v1.0 (2018-03-23) | MOD-CO schema – a conceptual schema for processing sample data in meta-omics research | Schema as SDD-structured XML file; Schema; Schema as SMW representation; Project description; further documentation in print | Data Processing and Data Exchange Schema | Meta-omics data and collection data | published | SNSB | [29]
MODS | Metadata Object Description Schema v3.7 (2018-01-04) | Schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. | XSD File; Schema | Data Exchange Standard | General | accepted (Library of Congress) | - | [30]
NCD | Natural Collections Descriptions (2013-10-22) | NCD is a proposed data standard for describing collections of natural history materials at the collection level; one NCD record describes one entire collection | Documentation | Data Exchange Standard | Collection Archives | draft (TDWG) | ? | [31]
OAI-PMH | Open Archives Initiative Protocol for Metadata Harvesting v2.0 (2002-06-14) | OAI-PMH is an XML-based REST protocol for metadata harvesting in a distributed network. It uses Dublin Core as the required minimal set of metadata but can be extended for use with other XML-based formats as well | Specification; XSD for responses; Implementation Guidelines | Data Exchange Protocol (Harvesting) | General | published | BGBM, MfN, ZFMK through OpenUp! (Natural History Aggregator), PANGAEA, and BGBM, MfN through BHL-Europe | [32]
OLEF | Open Literature Exchange Format v1.0 (2012-06-12) | Biodiversity literature exchange format developed/ in use by BHL-Europe | XSD File; Schema | Data Exchange Standard | Biodiversity Literature | none | (MfN through BHLE) | [33]
pansimple DC | PANGAEA Simple Indexing Format | Pansimple is an extended Dublin Core profile | XSD File | Metadata Standard | Indexing | none | PANGAEA | -
PREMIS | Preservation Metadata Maintenance Activity v2.2 (July 2012) | The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. | XSD File; Schema; PDF Documentation | Data Exchange Standard | - | accepted (Library of Congress) | - | [34]
Rosetta | Rosetta (ExLibris) (2009-11-04) | Rosetta is a commercial product developed by ExLibris for the management of digital assets in libraries and academic environments, enabling institutions to create, manage, preserve, and share locally administered digital collections. | Overview; Documentation | ? | Biodiversity Literature | ? | - | [35]
SDD | Structured Descriptive Data (v1.1, March 2007; latest revision in 2010) | The goal of the SDD standard is to allow capture, transport, caching and archiving of descriptive data in relevant forms (e.g. natural language descriptions, dichotomous keys, raw data descriptions), using a platform- and application-independent, international standard. | Standard Description; Download and SDD Schema files SDD1.1rev5 (.zip) | Data Exchange Standard | Collection Archives; General | accepted (TDWG) | SNSB, (BGBM in EDIT-Project) | [36]
Simple DwC | Simple Darwin Core (2013-10-22) | The Simple Darwin Core is a predefined subset of the terms that have common use across a wide variety of biodiversity applications. | XSD File; Documentation (TDWG) | Data Exchange Standard | Collection Archives; General | accepted (TDWG) | ? | [37] [2]
TAPIR | TDWG Access Protocol for Information Retrieval v1.0 (2010-05-05) | TAPIR is a Web Service protocol to perform queries across distributed databases of varied physical and logical structure. PLEASE NOTE: TAPIR is out-dated and was replaced by DwC-Archives and BioCASe UAP | Documentation; XSD | Data Exchange Protocol | Collection Archives | accepted (TDWG) | - | [10] [38]
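
As an illustration of the OAI-PMH entry in the table above: a harvester sends an HTTP request carrying a `verb` argument and receives an XML response that offers, at minimum, Dublin Core (`oai_dc`) metadata. The following Python sketch builds a `ListRecords` request URL and extracts the Dublin Core fields from a response. The base URL `https://example.org/oai`, the sample record and the helper names `list_records_url` and `extract_dc` are hypothetical and serve only for illustration; no GFBio endpoint is implied, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_dc(response_xml):
    """Return, per record, a dict mapping Dublin Core element names to value lists."""
    root = ET.fromstring(response_xml)
    records = []
    for dc_block in root.iterfind(".//oai_dc:dc", NS):
        fields = {}
        for elem in dc_block:
            name = elem.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
            fields.setdefault(name, []).append((elem.text or "").strip())
        records.append(fields)
    return records


# Invented sample response, reduced to the parts needed here
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example specimen record</dc:title>
          <dc:creator>Example Collection</dc:creator>
          <dc:identifier>oai:example.org:1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai")
records = extract_dc(SAMPLE_RESPONSE)
print(url)
print(records[0]["title"])
```

Values are kept as lists because every Dublin Core element may be repeated within a record.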


Explanations:

Data Domain (values): Collection Archives; Biodiversity Literature; Literature; Geography; General

Standard/ Protocol Category (values): Data Exchange Standard (Def.: standard schema describing the semantics of data elements and, in some cases, also the syntax, e.g. XML, RDF, CSV); Data Exchange Protocol (Def.: rules of communication enabling data exchange; a protocol uses existing standard schemas); Metadata Standard (Def.: standard schema describing the overall content of a whole data set)

Status: status in standardization, e.g. accepted, under review, proposed, none, draft, published, out-dated; standardization body

GFBio Collection Data Centers/ Data Archives – Expertise: standards and protocols in use by GFBio Collection Data Centers, either directly or indirectly (indirect use is given in parentheses)
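
The controlled values above lend themselves to a small data model. The following Python sketch shows one possible way to represent a row of the table and to check its category and data domain against the listed values. The names `Category`, `StandardEntry`, `parse_domains` and `unknown_domains` are hypothetical and not part of any GFBio software; the example row uses the values of the Dublin Core Metadata Element Set entry.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """Standard/ Protocol Category values from the explanations."""
    DATA_EXCHANGE_STANDARD = "Data Exchange Standard"
    DATA_EXCHANGE_PROTOCOL = "Data Exchange Protocol"
    METADATA_STANDARD = "Metadata Standard"


# Data Domain values from the explanations
DATA_DOMAINS = {
    "Collection Archives",
    "Biodiversity Literature",
    "Literature",
    "Geography",
    "General",
}


@dataclass
class StandardEntry:
    acronym: str
    category: Category
    data_domains: list
    status: str
    expertise: list = field(default_factory=list)

    def unknown_domains(self):
        """Return the data domains that are not in the controlled list."""
        return [d for d in self.data_domains if d not in DATA_DOMAINS]


def parse_domains(cell):
    """Split a 'Data domain' cell such as 'Literature; General' into values."""
    return [part.strip() for part in cell.split(";") if part.strip()]


dc = StandardEntry(
    acronym="DC",
    category=Category("Metadata Standard"),  # raises ValueError for unlisted categories
    data_domains=parse_domains("Literature; General"),
    status="ISO Standard",
)
print(dc.category.name, dc.unknown_domains())
```

Applied to the pansimple DC row, whose data domain is "Indexing", `unknown_domains()` would return that value, which is one way to spot table cells that fall outside the listed vocabulary.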

References

  1. ABCD 2.06 TDWG Standard
  2. DwC-ABCD Mapping
  3. ABCD 2 Terms on TDWG Terms Wiki (individual concepts are also addressable via LOD compatible short names)
  4. ABCDDNA TDWG Draft Standard; wiki page documenting ABCDDNA
  5. Petersen et al. 2018
  6. GeoCASe Website - EFG; Kiessling et al., 2006 Proceedings of TDWG
  7. Conn, R.J. (ed.) (1996). HISPID3. Herbarium Information Standards and Protocols for Interchange of Data. (Council of Heads of Australian Herbaria at Royal Botanic Gardens: Sydney). Viewed at http://plantnet.rbgsyd.nsw.gov.au/HISCOM/HISPID/HISPID3/H3.html on 18 June 2007
  8. http://species-id.net/o/media/3/38/03_AudubonCore1.0NonNormative_docV1.93.pdf
  9. usually only referred to as "BioCASE Protocol"
  10. BioCASE Protocol and TAPIR are very similar
  11. Dublin Core used to consist of two different sets of terms: Simple Dublin Core and Qualified Dublin Core. The Simple Dublin Core consists of 15 terms and is also called the Dublin Core Metadata Element Set. In 2012 the two sets were combined and simplified as DCMI Metadata Terms. The Simple Dublin Core set is still valid, while the Qualified Dublin Core set has been deprecated.
  12. Little-known fact: Dublin Core is named after Dublin, Ohio, USA, and not after Dublin, Ireland
  13. DELTA System Overview
  14. More information about DiGIR, last modification in 2005; also see the DiGIR collaboration site for downloadable files, source code (CVS), the developers' email list, etc.
  15. DwC TDWG Standard; Wikipedia Darwin Core; Wieczorek et al. 2012, PLOS; Baker et al. 2014, Biodiversity Data Journal
  16. General information about EAC-CPF
  17. Description (German); Documentation
  18. EDM Documentation
  19. EML FAQ
  20. ESE Documentation incl. XSD file. ESE will still be accepted as a metadata format for Europeana; it will be manually converted to EDM for use in the portal.
  21. Website of Global Genome Biodiversity Network (GGBN)
  22. http://www.opengeospatial.org/standards/gml
  23. GML Official Website, Wikipedia Article on GML, Behr, Franz-Josef: GML-basierte Kodierung von Geodaten (German)
  24. ICOM International Council of Museums - LIDO, LIDO (DDB): Lightweight Information Describing Objects (profile for the Deutsche Digitale Bibliothek)
  25. MARCXML Official Website, incorporated into many other standards, e.g. OLEF
  26. METS Official Website, incorporated into many other standards, e.g. OLEF
  27. MIxS, Yilmaz et al., 2011
  28. MIGS, Yilmaz et al., 2008
  29. MOD-CO, Rambold et al., 2018
  30. MODS Official Website, incorporated into many other standards, e.g. OLEF
  31. NCD TDWG Interest Group Page
  32. Some other complementary XML formats are also provided by the OAI to enhance PMH, such as Provenance (XSD), OAI Identifier, Friends, e-prints.
  33. OLEF presentation (Slideshare)
  34. PREMIS Documentation; List of other tools for PREMIS
  35. Overview Rosetta
  36. SDD Wiki Archive
  37. Simple DwC, part of TDWG standard
  38. TAPIR TDWG Standard, TAPIR Task Group


Status: July 2018

