Publication of Type 1 Data via BioCASe Data Pipelines at DSMZ Data Center

The DSMZ Data Center is one of the seven GFBio Collection Data Centers, which are components of the GFBio Submission, Repository and Archiving Infrastructure. Data archiving and publication are based on an in-house MySQL Server system with an MS Access frontend. For a more detailed description, see public:Technical_Documentations. The data are structured according to the ABCD (Access to Biological Collection Data) conceptual schema.

The workflow is illustrated in Figure 1 and described in the text below.



Export of GFBio DIPs from the DSMZ in-house management system

 * Citation
 * The citation is based on the data provider's input and follows the GFBio citation pattern. It is finalised in collaboration with the data producer/customer.
 * Example: BacDive in 2019: bacterial phenotypic data for High-throughput biodiversity analysis. Reimer, L. C., Vetcininova, A., Sardà Carbasse, J., Söhngen, C., Gleim, D., Ebeling, C., Overmann, J. Nucleic Acids Research; database issue 2019.
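
The ordering used in the example (title, author list, source) can be reproduced programmatically. The sketch below is only an illustration under that assumption: `CitationParts` and `format_citation` are hypothetical names, and the function copies the field order of the example. It is not the official GFBio citation implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationParts:
    """Hypothetical container for the parts of a dataset citation."""
    title: str
    authors: List[str]  # each entry already formatted as "Surname, I."
    source: str         # journal/issue or repository statement


def format_citation(parts: CitationParts) -> str:
    """Join title, authors and source in the order of the example above.

    Trailing full stops are stripped first so none end up doubled.
    """
    title = parts.title.rstrip(".")
    author_list = ", ".join(parts.authors)
    source = parts.source.rstrip(".")
    return f"{title}. {author_list} {source}."
```

In practice, the final string would still be checked with the data producer/customer, as described above.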


 * Licenses
 * Licenses for the data packages are recorded during the submission/ingestion process. Data providers or curators may define their own terms of use for their data, but in general DSMZ favours the CC BY-SA 4.0 license (https://creativecommons.org/licenses/by-sa/4.0/) for multimedia URLs.


 * GFBio data and metadata created during submission
 * Metadata generated during the GFBio submission are stored in the GFBio JIRA ticket system. Additional parameters are assigned manually by the DSMZ data curator in close cooperation with the data provider.


 * GFBio IDs according to GFBio consensus documents
 * All GFBio IDs, as well as all other external IDs available so far at dataset and unit level (e.g. DOIs, GenBank accession numbers, BOLD numbers, MycoBank numbers, ORCID IDs, GFBio submission IDs, DSMZ strain numbers, IDs provided by other GFBio data centers for linked datasets), are stored in the appropriate tables of the MySQL Server installations. IDs that are covered by the GFBio consensus documents are published.
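
The rule "store every available ID, publish only those covered by the consensus documents" amounts to a simple filter over stored ID records. The sketch below is a hypothetical model and does not reflect the actual DSMZ table layout. `ExternalId` and `publishable` are illustrative names, and the set of publishable ID types is passed in by the caller, because this page does not list which types the consensus documents cover.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ExternalId:
    """One identifier row as it might be held in an ID table (hypothetical)."""
    id_type: str  # e.g. "DOI", "GenBank", "ORCID", "DSMZ strain number"
    value: str
    level: str    # "dataset" or "unit"


def publishable(ids: Iterable[ExternalId], consensus_types: Set[str]) -> List[ExternalId]:
    """Return only the IDs whose type appears in `consensus_types`.

    All IDs stay stored; this only selects the subset intended for publication.
    Input order is preserved.
    """
    return [record for record in ids if record.id_type in consensus_types]
```

Keeping `level` on each record lets the same filter serve both dataset-level and unit-level identifiers.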


 * Occurrence data according to GFBio WP5 consensus document
 * Occurrence data are stored at two levels of granularity: (a) dataset level and (b) unit level.


 * Other (meta)data
 * Multimedia data and other metadata are stored in our MySQL Server system.

Transformation of DIPs for the DSMZ archiving system

 * Archiving of DIPs
 * All DIPs are created as zipped ABCD 2.06 or ABCD 2.1 XML archives using a standard, manually triggered function of the BioCASe Provider Software. DSMZ IT backs up the stored DIPs daily, following the Technical documentation of long-term archiving solutions at the GFBio collection data centers.
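
An archived DIP can be inspected with the Python standard library, for example to check which units it contains before or after backup. The sketch below assumes that the archive is a ZIP file containing one or more `.xml` documents and that each `UnitID` is a direct child of a `Unit` element, as in the ABCD schema. Tags are matched by local name so that the same code handles both the ABCD 2.06 and ABCD 2.1 namespaces. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def unit_ids_in_archive(archive) -> List[str]:
    """Collect UnitID values from every XML member of a zipped ABCD archive.

    `archive` may be a file path or a binary file-like object.
    """
    ids: List[str] = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip any non-XML members
            with zf.open(name) as fh:
                root = ET.parse(fh).getroot()
            for elem in root.iter():
                if local_name(elem.tag) != "Unit":
                    continue
                for child in elem:
                    if local_name(child.tag) == "UnitID" and child.text:
                        ids.append(child.text.strip())
    return ids
```

For a stored archive, a call such as `unit_ids_in_archive("dataset_abcd.zip")` (hypothetical file name) returns the unit identifiers in document order.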

Transformation of DIPs for publication in the GFBio Data Portal and the VAT tool

 * Access via BioCASe Local Query Tool and landing pages
 * In general, the DSMZ datasources are accessible via the BioCASe Monitor Service, which also provides access to the landing pages and to the local DSMZ BioCASe query tools.


 * Access via BioCASe Monitor service (BMS)
 * see https://gfbio.biowikifarm.net/wiki/Data_Publishing/General_part:_GFBio_publication_of_type_1_data_via_BioCASe_data_pipelines


 * Citation of published dataset
 * The proposed citation string follows the scheme exemplified above. For details, see https://gfbio.biowikifarm.net/wiki/Data_Publishing/General_part:_GFBio_publication_of_type_1_data_via_BioCASe_data_pipelines


 * Indexing/harvesting by central GFBio indexing processes
 * see General part: GFBio publication of type 1 data via BioCASe data pipelines


 * Access via GFBio Data Portal
 * see General part: GFBio publication of type 1 data via BioCASe data pipelines