Data Publishing/ZFMK Data Center: GFBio publication of type 1 data via BioCASe data pipelines

From GFBio Public Wiki

The ZFMK Data Center is one of the seven GFBio Collection Data Centers that form the backbone of the GFBio Submission, Repository and Archiving Infrastructure. Data archiving and publication at ZFMK involve management processes with several Diversity Workbench (DWB) modules: DiversityCollection (DC), DiversityProjects (DP), DiversityTaxonNames (DTN) and DiversityAgents (DA). They also involve the online platform Morph·D·Base and the web-based collection management system easydb. The management tools and archiving processes used at the GFBio data center ZFMK are described under Technical Documentations. These include services for documenting, processing and regularly archiving the incoming original (meta)data sets and multimedia objects (source data; SIP), using DiversityProjects functionality to ingest metadata from the GFBio submission tool. Data producers are welcome to use the xls templates provided under Templates for data submission. ZFMK uses DWB tools for data and metadata import, metadata enrichment and data quality control. DOI creation is also provided.

The workflow with these central components is illustrated in figure 1 and described in the text below.
Figure 1: The ZFMK Workflow, BioCASe data pipelines for GFBio Type 1 Data.
ABCD - Access to Biological Collections Data schema
SIP - Submission Information Package
AIP - Archival Information Package
DIP - Dissemination Information Package
VAT - Visualizing and Analysing Tool

Data pipeline - Provision of (versioned) DIPs

Export of DIPs with the ZFMK in-house management systems used in GFBio (DWB, Morph·D·Base, easydb)

  • Citation
Based on the data provider's input (submission metadata), the ZFMK data curator prepares the citation of the dataset, adjusting the submission metadata to conform with the GFBio citation pattern. The citation is finalized in close collaboration with the data provider.
Example: ZFMK Ichthyology Working Group (2018). The Ichthyology collection at the Zoological Research Museum Alexander Koenig. [Dataset]. Version: 2.0. Data Publisher: Zoological Research Museum Koenig - Leibniz Institute for Animal Biodiversity.

  • Licenses
The licenses for the data packages are ingested into DP during the submission/ingestion process.
The licenses for multimedia objects are handled separately and stored in DC together with the multimedia URLs. GFBio promotes Creative Commons (CC) licenses; ZFMK's preferred license is CC BY-SA 4.0.
  • GFBio data and metadata created during submission
The metadata generated during GFBio submission are processed via the JIRA ticket system and ingested into DiversityProjects. Additional metadata and the original research data are imported into the DWB RDMS via the DWB ImportWizards. Additional parameters are assigned manually by the ZFMK data curator in close cooperation with the data provider.
  • GFBio IDs according to GFBio consensus documents
All GFBio IDs, as well as other external IDs where available (e.g. DOIs, GenBank accession numbers, BOLD numbers, MycoBank numbers, ORCID IDs, GFBio submission IDs, DSMZ strain numbers), are stored in the appropriate tables of the DWB installations. IDs that are covered by the GFBio consensus documents will be published.
  • Occurrence data according to GFBio WP5 consensus document
Occurrence data are stored at two levels of granularity: (a) at dataset level in DP (setting elements) and (b) at unit level.
  • Other (meta)data
Multimedia and morphological data are stored in Morph·D·Base and linked to the corresponding dataset in DC.
Other (meta)data recommended or mandatory for export are stored in DP, DA or DC.
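The citation pattern used in the example above can be illustrated with a small formatting function. This is a minimal sketch that assumes the element order of the ZFMK example (authors, year, title, [Dataset], version, data publisher); the function name `format_citation` and its parameters are hypothetical and not part of any ZFMK or GFBio tool. The authoritative pattern is the GFBio citation pattern itself.

```python
def format_citation(authors: str, year: int, title: str,
                    version: str, publisher: str) -> str:
    """Assemble a dataset citation in the element order of the ZFMK example.

    Hypothetical helper: the order is copied from the example citation,
    not from an official GFBio specification.
    """
    # Strip trailing periods so the separators below are not doubled.
    title = title.rstrip(".")
    publisher = publisher.rstrip(".")
    return (
        f"{authors} ({year}). {title}. [Dataset]. "
        f"Version: {version}. Data Publisher: {publisher}."
    )


if __name__ == "__main__":
    # Reproduces the ZFMK Ichthyology example citation.
    print(format_citation(
        authors="ZFMK Ichthyology Working Group",
        year=2018,
        title="The Ichthyology collection at the Zoological Research Museum Alexander Koenig",
        version="2.0",
        publisher="Zoological Research Museum Koenig - Leibniz Institute for Animal Biodiversity",
    ))
```

In practice the curator adjusts the submission metadata by hand; a helper like this would only keep the punctuation and element order consistent across datasets.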

Transformation of DIPs for ZFMK archiving system

  • Archiving and versioning of DIPs
A snapshot of the dataset is taken, transferred to a MySQL database and mapped to ABCD 2.1 with the BioCASe Provider Software. An AIP is created and stored as a .zip archive in easydb. Each snapshot of the dataset is identified by a date suffix and a two-part version number, MajorVersion.MinorVersion (e.g. 2.1). Major changes (e.g. adding further data to the dataset) increase the first part. Minor changes (e.g. correcting typing errors) increase the second part of the version number.
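The two-part versioning rule for snapshots can be sketched as follows. This is an illustration under stated assumptions, not the ZFMK implementation. The class `SnapshotVersion` and the function `archive_name` are hypothetical. Resetting the minor part to 0 after a major change is an assumption, since the text only says which part rises. The archive file-name layout (dataset, ISO date suffix, version) is also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SnapshotVersion:
    """Two-part version number MajorVersion.MinorVersion, e.g. 2.1."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "SnapshotVersion":
        major, minor = text.split(".")
        return cls(int(major), int(minor))

    def bump(self, change: str) -> "SnapshotVersion":
        # "major": e.g. further data added to the dataset -> first part rises.
        # "minor": e.g. typing errors corrected -> second part rises.
        # Assumption: the minor part restarts at 0 after a major change.
        if change == "major":
            return SnapshotVersion(self.major + 1, 0)
        if change == "minor":
            return SnapshotVersion(self.major, self.minor + 1)
        raise ValueError(f"unknown change type: {change!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


def archive_name(dataset: str, snapshot_date: date,
                 version: SnapshotVersion) -> str:
    """Hypothetical file name for the AIP .zip: dataset, date suffix, version."""
    return f"{dataset}_{snapshot_date.isoformat()}_v{version}.zip"


if __name__ == "__main__":
    current = SnapshotVersion.parse("2.1")
    print(current.bump("minor"))  # 2.2
    print(current.bump("major"))  # 3.0
    print(archive_name("ichthyology", date(2018, 5, 1), current))
```

Keeping the version immutable (`frozen=True`) means every bump yields a new object, which mirrors the fact that each snapshot is archived as a separate, unchanged AIP.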

Transformation of DIPs for publication in GFBio data portal

  • Access via BioCASe Local Query Tool, landing page
All ZFMK data sources are accessible via the ZFMK BioCASe Local Query Tool. A landing page for each data package is provided in the ZFMK easydb (in the pool GFBio Archiv) or on the project's own website.
  • Access via BioCASe Monitor service (BMS)
see General part: GFBio publication of type 1 data via BioCASe data pipelines
  • Citation of published dataset
The proposed citation string follows the scheme exemplified above; for details, see General part: GFBio publication of type 1 data via BioCASe data pipelines
  • DOI assignment
ZFMK is registered with ZB MED and can therefore create unique DOIs for each data package. The DOI is created in DataCite DOI Fabrica, annotated to the corresponding version of the information package and stored in the appropriate tables of DP and easydb. It is also part of the citation of the dataset.
  • Indexing/harvesting by central GFBio indexing processes
see General part: GFBio publication of type 1 data via BioCASe data pipelines
  • Access via GFBio Data Portal
see General part: GFBio publication of type 1 data via BioCASe data pipelines
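ZFMK creates DOIs interactively in DataCite DOI Fabrica. The kind of metadata entered there for one version of a data package can be illustrated as a JSON:API body for DataCite's REST endpoint `POST /dois`. This sketch only builds the payload and sends nothing. The function `datacite_payload` is hypothetical. The DOI prefix `10.5072` (DataCite's former test prefix) and the landing-page URL are placeholders, not ZFMK values.

```python
import json


def datacite_payload(doi: str, creator: str, title: str, publisher: str,
                     year: int, version: str, landing_page: str) -> dict:
    """Build a JSON:API request body for DataCite's REST endpoint POST /dois.

    Hypothetical sketch: ZFMK uses DataCite Fabrica; this only shows the
    same kind of metadata as a REST payload. Nothing is sent.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                # "draft" would create a deletable, unregistered DOI instead.
                "event": "publish",
                "creators": [{"name": creator}],
                "titles": [{"title": title}],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
                # The DOI is tied to one version of the information package.
                "version": version,
                # Landing page of the data package (e.g. in easydb).
                "url": landing_page,
            },
        }
    }


if __name__ == "__main__":
    payload = datacite_payload(
        doi="10.5072/zfmk-ichthyology-v2.0",          # placeholder DOI
        creator="ZFMK Ichthyology Working Group",
        title="The Ichthyology collection at the Zoological Research Museum Alexander Koenig",
        publisher="Zoological Research Museum Koenig - Leibniz Institute for Animal Biodiversity",
        year=2018,
        version="2.0",
        landing_page="https://example.org/landing-page",  # placeholder URL
    )
    print(json.dumps(payload, indent=2))
```

Registering a new DOI per version, rather than updating one DOI in place, keeps every cited version resolvable to the exact snapshot that was archived.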