Publication of Type 1 Data via BioCASe Data Pipelines at ZFMK Data Center
The ZFMK Data Center is one of the seven GFBio Collection Data Centers that are part of, and form the backbone of, the GFBio Submission, Repository and Archiving Infrastructure. Data archiving and publication at ZFMK involve management processes using several Diversity Workbench (DWB) modules (DC, DP, DTN, DA), the online platform Morph·D·Base and the web-based collection management system easydb. The management tools and archiving processes used at the GFBio data center ZFMK are described under Technical Documentations. These include services for documenting, processing and regularly archiving incoming original (meta)data sets and multimedia objects (source data; SIP), using DiversityProjects (DP) functionality to ingest metadata from the GFBio submission tool. Data producers are welcome to use the xls templates provided under Templates for data submission. ZFMK uses DWB tools for data and metadata import, metadata enrichment and data quality control (see https://www.gfbio.org/data/tools). DOI creation is also provided.
- The workflow with these central components is illustrated in figure 1 and described in the text below.
Data pipeline - Provision of (versioned) DIPs
Export of DIPs with ZFMK in-house management systems used in GFBio (DWB, Morph·D·Base, easydb)
- Based on the data provider's input (submission metadata), the ZFMK data curator prepares the citation of the dataset, adjusting the input to conform to the GFBio citation pattern. The citation is finalized in close collaboration with the data provider.
- Example: ZFMK Ichthyology Working Group (2018). The Ichthyology collection at the Zoological Research Museum Alexander Koenig. [Dataset]. Version: 2.0. Data Publisher: Zoological Research Museum Koenig - Leibniz Institute for Animal Biodiversity. https://doi.org/10.20363/ZFMK-Coll.Ichthyology-2018-03.
- The licenses for the data packages are ingested in DP during the submission/ingestion process.
- The licenses for multimedia objects are handled separately and stored in DC together with the multimedia URLs. GFBio promotes CC licenses; ZFMK's preferred license is CC BY-SA 4.0.
- GFBio data and metadata created during submission
- The metadata generated during GFBio submission are processed via the JIRA ticket system and ingested in DiversityProjects. Additional metadata and original research data are imported into the DWB RDMS via the DWB ImportWizards. Additional parameters are assigned manually by the ZFMK data curator in close cooperation with the data provider.
- GFBio IDs according to GFBio consensus documents
- All GFBio IDs, as well as other external IDs where available (e.g. DOIs, GenBank accession numbers, BOLD numbers, MycoBank numbers, ORCID IDs, GFBio submission IDs, DSMZ strain numbers), are stored in the appropriate tables of the DWB installations. IDs that are covered by the GFBio consensus documents will be published.
- Occurrence data according to GFBio consensus documents
- The occurrence data are stored at two levels and two granularities, (a) at dataset level in DP (setting elements) and (b) at unit level.
- Other (meta)data
- Multimedia and morphological data are stored in Morph·D·Base and linked to the corresponding dataset in DC.
- Other (meta)data that are recommended or mandatory for export are stored in DP, DA or DC.
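The citation pattern shown in the ichthyology example above can be sketched as a small formatter. This is only an illustration inferred from that single example: the class name `DatasetCitation` and its field names are hypothetical, and the authoritative pattern is the one defined in the GFBio documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCitation:
    """Hypothetical container for the parts of a dataset citation."""
    creators: str
    year: int
    title: str
    version: str
    publisher: str
    doi: str  # bare DOI, without the https://doi.org/ prefix

    def format(self) -> str:
        # Order and punctuation are copied from the example citation;
        # they are an assumption, not an official specification.
        return (
            f"{self.creators} ({self.year}). {self.title}. [Dataset]. "
            f"Version: {self.version}. Data Publisher: {self.publisher}. "
            f"https://doi.org/{self.doi}."
        )


example = DatasetCitation(
    creators="ZFMK Ichthyology Working Group",
    year=2018,
    title="The Ichthyology collection at the Zoological Research Museum Alexander Koenig",
    version="2.0",
    publisher="Zoological Research Museum Koenig - Leibniz Institute for Animal Biodiversity",
    doi="10.20363/ZFMK-Coll.Ichthyology-2018-03",
)
print(example.format())
```

Keeping the parts in separate fields makes it easy for the curator to adjust individual elements with the data provider before the final string is generated.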
Transformation of DIPs for ZFMK archiving system
- Archiving and versioning of DIPs
- A snapshot of the dataset is taken, transferred to a MySQL database and mapped to ABCD 2.1 with the BioCASe Provider Software. An AIP is created and stored as a .zip archive in easydb. Every snapshot of the dataset is identifiable by a date suffix and a two-part version number of the form Majorversion.Minorversion (e.g. 2.1). Major changes (e.g. adding further data to the dataset) increase the first part; minor changes (e.g. correcting typing errors) increase the second part.
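The Majorversion.Minorversion rule described above can be illustrated with a minimal sketch. The names `DatasetVersion`, `parse` and `bump` are hypothetical, and resetting the minor part to 0 after a major change is an assumption suggested by the "Version: 2.0" example, not something the text states.

```python
from typing import NamedTuple


class DatasetVersion(NamedTuple):
    """Two-part dataset version: Majorversion.Minorversion."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "DatasetVersion":
        # Expects exactly two integer parts separated by a dot, e.g. "2.1".
        major, minor = text.split(".")
        return cls(int(major), int(minor))

    def bump(self, major_change: bool) -> "DatasetVersion":
        if major_change:
            # Major change (e.g. further data added): raise the first part.
            # Resetting the minor part to 0 is an assumption.
            return DatasetVersion(self.major + 1, 0)
        # Minor change (e.g. typing errors corrected): raise the second part.
        return DatasetVersion(self.major, self.minor + 1)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


current = DatasetVersion.parse("2.1")
after_typo_fix = current.bump(major_change=False)
after_new_data = current.bump(major_change=True)
print(current, after_typo_fix, after_new_data)
```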
Transformation of DIPs for publication in GFBio data portal
- Access via BioCASe Local Query Tool, landing page
- All ZFMK data sources are accessible via the ZFMK BioCASe Local Query Tool. A landing page for each data package is provided at ZFMK easydb (in Pool GFBio Archiv) or on the project's own website.
- Access via BioCASe Monitor service (BMS)
- Citation of published dataset
- The proposed citation string follows the scheme exemplified above; for details see General part: GFBio publication of type 1 data via BioCASe data pipelines.
- DOI assignment
- ZFMK is registered with ZB MED and can create a unique DOI for each data package. The DOI is created in DataCite DOI Fabrica, assigned to the corresponding version of the information package and stored in the appropriate tables of DiversityProjects and easydb. It is also part of the dataset citation.
- Indexing/harvesting by central GFBio indexing processes
- Access via GFBio Data Portal
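Because each data package carries a DOI, a data user can resolve it through the doi.org resolver and ask for citation metadata by content negotiation. The sketch below only prepares such a request and does not send it. The helper name `build_metadata_request` is hypothetical, and the choice of the CSL JSON media type is an assumption about what a consumer might want; it is not part of the ZFMK workflow described above.

```python
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> Request:
    """Prepare (but do not send) a content-negotiation request for a DOI."""
    doi = doi.strip()
    # Accept both a bare DOI and a full https://doi.org/ URL.
    if doi.lower().startswith(DOI_RESOLVER):
        doi = doi[len(DOI_RESOLVER):]
    return Request(DOI_RESOLVER + doi, headers={"Accept": CSL_JSON})


request = build_metadata_request(
    "https://doi.org/10.20363/ZFMK-Coll.Ichthyology-2018-03"
)
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the example runs without network access.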