Major Types of Biological Data

From GFBio Public Wiki

Within GFBio we distinguish five major types of biological data. They are used in the "Service Description" of the individual Data Centers as well as in the Technical Documentation of processing tools.


Type 1: Biodiversity and Occurrence data
These are data from the classical collection and alpha-diversity research domain, i.e. digital objects with taxon name(s), georeferences (e.g. locality), a date, and often references to related resources such as multimedia objects.
We distinguish between
  • Type 1a: Collection Data (with reference to physical object)
  • Type 1b: Observation Data (without reference to physical object)
Used standards
  • ABCD (Access to Biological Collection Data) and extensions
  • DwC (Darwin Core) and extensions
  • DC (Dublin Core) as included in ABCD and DwC for basic bibliographic information
Used identifiers
  • primary identifier: biological (digital) object (digital specimen or observation)
  • main secondary information: geo-information and time, related (multimedia) resources
Example packages
Notes
The time investment for individual scientific data curation by data providers and GFBio data managers before and during data transformation varies.
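As an illustration of the Type 1a / Type 1b split, the following minimal Python sketch classifies a record by whether it refers to a physical object. It assumes that records are flat dictionaries keyed by Darwin Core term names (basisOfRecord, scientificName, eventDate and so on). The function name, the set of values treated as "physical", and the sample records are assumptions made for this example and do not describe an actual GFBio workflow.

```python
# Sketch only: classify a flat Darwin Core-style record as Type 1a or 1b.
# Assumption: a record refers to a physical object when its basisOfRecord
# is one of the specimen/sample classes listed here.
PHYSICAL_BASES = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "MaterialSample",
}


def classify_occurrence(record: dict) -> str:
    """Return '1a' (collection data) or '1b' (observation data)."""
    basis = record.get("basisOfRecord", "")
    return "1a" if basis in PHYSICAL_BASES else "1b"


# Hypothetical records; all values are invented for illustration.
specimen = {
    "occurrenceID": "example:specimen:0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Bellis perennis L.",
    "decimalLatitude": 50.93,
    "decimalLongitude": 11.58,
    "eventDate": "2014-05-17",
    "associatedMedia": "https://example.org/media/0001.jpg",
}
observation = dict(
    specimen,
    occurrenceID="example:observation:0001",
    basisOfRecord="HumanObservation",
)

print(classify_occurrence(specimen))     # 1a
print(classify_occurrence(observation))  # 1b
```

Records without a basisOfRecord value fall through to Type 1b in this sketch; a real curation step would more likely flag them for manual review.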

Type 2: Taxon Data
These are taxon-related data (e.g. in a catalogue, checklist or so-called red list).
Used standards
  • ABCD (Access to Biological Collection Data) and extensions
  • DwC (Darwin Core) and extensions
  • DC (Dublin Core) as included in ABCD and DwC for basic bibliographic information
Used identifiers
  • primary identifier: class name (taxon), e.g., as defined by the nomenclatural rules of the three International Codes of Biological Nomenclature
  • main secondary information: taxonomic classifications and concepts, synonymy, vernacular names, geo- and conservation status information etc.
Example packages
Notes
The time investment for individual scientific data curation by data providers and GFBio data managers before and during data transformation varies.
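To make the role of the taxon name as primary identifier and of synonymy as secondary information more concrete, here is a minimal Python sketch of a checklist lookup. It assumes checklist rows are dictionaries keyed by Darwin Core Taxon term names (taxonID, scientificName, taxonomicStatus, acceptedNameUsageID, vernacularName). The taxon names, identifiers and the function name are invented placeholders, not real nomenclatural data.

```python
from typing import Optional

# Sketch only: a two-row checklist with one accepted name and one synonym.
# All names and IDs are hypothetical placeholders.
checklist = [
    {
        "taxonID": "t1",
        "scientificName": "Examplea alba Author",
        "taxonRank": "species",
        "taxonomicStatus": "accepted",
        "acceptedNameUsageID": "t1",
        "vernacularName": "white example",
    },
    {
        "taxonID": "t2",
        "scientificName": "Examplea candida Other",
        "taxonRank": "species",
        "taxonomicStatus": "synonym",
        "acceptedNameUsageID": "t1",
        "vernacularName": "",
    },
]


def resolve_accepted(name: str, rows: list) -> Optional[str]:
    """Return the accepted scientific name for `name`, or None if unknown."""
    by_id = {row["taxonID"]: row for row in rows}
    for row in rows:
        if row["scientificName"] == name:
            accepted = by_id.get(row["acceptedNameUsageID"])
            return accepted["scientificName"] if accepted else None
    return None


print(resolve_accepted("Examplea candida Other", checklist))
# Examplea alba Author
```

The lookup matches names exactly; handling of authorship variants or differing taxonomic concepts is deliberately left out of this sketch.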

Type 3: Environmental Biological and Ecological Data
These are environmental biological and ecological study data, including functional and phylogenetic trait data and other kinds of analysis data.
Used standards
  • EML (Ecological Metadata Language)
  • DELTA (Description Language for Taxonomy, for trait data)
  • SDD (Structured Descriptive Data, for trait data)
  • GML (Geography Markup Language) and ISO 19139 metadata
Used identifiers
a)
  • primary identifier: biological class concept (e.g., OTU or OFU)
  • main secondary information: trait and environmental (analysis, measurement, transformation, translocation) information
b)
  • primary identifier: environmental and ecological study item and event
  • main secondary information: biological and ecological information, measurements and description of the environment
Example packages
Notes
The time investment for individual scientific data curation before and during the transformation of (matrix) data into a highly structured, standard schema-compliant format at the data item level may be high. The data management process therefore has to be agreed between the data provider and the GFBio data curator before starting (see DMPs).

Type 4: Non-Molecular Analysis Data
These are non-molecular analysis data (data sets and/or data packages) in their original data file format (often RAW format).
Used standards
  • EML (Ecological Metadata Language) for basic bibliographic information
  • DC (with Pansimple XSD) for basic bibliographic information
Used identifiers
  • primary identifier: as provided by data producer
  • main secondary information: as provided by data producer
Example packages
  • coming soon
Notes
This type of data is accepted provided that it is well documented, comes with a core set of standard-compliant metadata, and is appropriate for long-term archiving.
The time investment for individual scientific data curation to be done by data providers and GFBio data managers before and during data transformation might be limited.

Type 5: Molecular Sequence Data
These are molecular sequence data including MIxS-compliant metadata.
Used standards
Used identifiers
  • primary identifier: molecular sample accession
  • main secondary information: geo-information and time
Example package
Notes
The time investment for individual scientific data curation to be done by data providers and GFBio data managers before and during data transformation might be limited.
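A pre-submission check for MIxS-style sample metadata could be sketched in Python as follows. The term names used (collection_date, lat_lon, geo_loc_name, env_broad_scale, env_local_scale, env_medium) are MIxS core terms, but treating exactly this subset as required is an assumption made for illustration; the actual mandatory set depends on the MIxS checklist and environmental package in use. The key "sample_accession", the function name and the sample values are hypothetical.

```python
# Sketch only: report which of a small set of MIxS core terms are missing.
# Assumption: this subset stands in for the checklist's mandatory terms.
REQUIRED_TERMS = (
    "collection_date",
    "env_broad_scale",
    "env_local_scale",
    "env_medium",
    "geo_loc_name",
    "lat_lon",
)


def missing_mixs_terms(sample: dict) -> list:
    """Return the required terms that are absent or empty in `sample`."""
    missing = []
    for term in REQUIRED_TERMS:
        value = sample.get(term)
        if value is None or not str(value).strip():
            missing.append(term)
    return missing


# Hypothetical sample record; values are invented for illustration.
sample = {
    "sample_accession": "EXAMPLE0001",
    "collection_date": "2014-05-17",
    "geo_loc_name": "Germany: Jena",
    "lat_lon": "50.93 N 11.58 E",
}

print(missing_mixs_terms(sample))
# ['env_broad_scale', 'env_local_scale', 'env_medium']
```

The check only tests for presence; it does not validate value formats or controlled vocabulary terms.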


For more details see also