Publication of Type 1 Data via BioCASe Data Pipelines at MfN Data Center

From GFBio Public Wiki
Revision as of 09:46, 11 May 2020 by Dagmar Triebel

The MfN Data Center is one of the seven GFBio Collection Data Centers, which are components of the GFBio Submission, Repository and Archiving Infrastructure. Data management, archiving and publication are as described under Technical Documentations. The data are structured according to the ABCD conceptual schema.

The workflow is illustrated in figure 1 and described in the text below.
Figure 1: The MfN Workflow, BioCASe (Biological Collection Access Service) data pipelines.

Data pipeline

Export of GFBio DIP from in-house management system

The data package's relevant metadata is exported and joined with a static snapshot of the submitted data. A DOI and a version number are assigned to the data package. Multimedia files and documents are added as permanent URLs from the Digital Asset Management System. The data package thus consists of three components: raw data, linked data and metadata. The additional metadata regarding the version, citation recommendation, license and identifier complements the metadata set. This joined package is mapped in the BioCASe Provider Software (BPS) and exported as an ABCD archive.
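
The composition of the data package described above can be sketched as a small data structure. This is only an illustrative sketch, not the MfN implementation: the class name DataPackage, its field and method names, the file name, the URL and the DOI value are hypothetical placeholders, and the actual mapping is done in the BPS.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Metadata fields that the text says are added to complete the metadata set.
REQUIRED_FIELDS = ["identifier", "version", "citation", "license"]


@dataclass
class DataPackage:
    """Hypothetical model of a data package with its three components."""
    raw_data: List[str]      # static snapshot of the submitted data (file names)
    linked_data: List[str]   # permanent URLs of multimedia files and documents
    metadata: Dict[str, str] = field(default_factory=dict)

    def complete_metadata(self, doi: str, version: str,
                          citation: str, license_name: str) -> None:
        """Add the additional metadata (identifier, version, citation, license)."""
        self.metadata.update({
            "identifier": doi,
            "version": version,
            "citation": citation,
            "license": license_name,
        })

    def missing_fields(self) -> List[str]:
        """Return the required metadata fields that are still empty."""
        return [key for key in REQUIRED_FIELDS if not self.metadata.get(key)]


package = DataPackage(
    raw_data=["snapshot.csv"],                    # placeholder file name
    linked_data=["https://example.org/media/1"],  # placeholder permanent URL
    metadata={"title": "Example collection dataset"},
)
package.complete_metadata(
    doi="10.1234/example",  # placeholder, not a real DOI
    version="1.0",
    citation="Example citation recommendation",
    license_name="CC BY 4.0",
)
print(package.missing_fields())
```

A check such as missing_fields() could be run before the package is mapped, so that an incomplete metadata set is noticed before export.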

Transformation of DIP for publication in GFBio data portal

For each data package, a DOI landing page is automatically generated by the metadata component. It is available even if the dataset has an embargo date; in that case the landing page indicates the (estimated) date of data publication. The data package is checked internally for data quality. Because (public) access to the ABCD data package is possible only once the embargo date is due, the final quality check according to the GFBio requirements is conducted just before the GFBio publication. The ABCD archive then becomes accessible for harvesting by the central GFBio infrastructures so that the data can be accessed via the GFBio Data Portal. Other portals such as GBIF will be contacted so that they can harvest the data packages directly from the BPS.
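
The embargo rule described above can be illustrated with a short sketch. It is not the actual portal logic: the function names and notice texts are hypothetical, and it assumes that a package without an embargo date is public immediately and that access opens on the embargo date itself.

```python
from datetime import date
from typing import Optional


def archive_is_public(embargo_date: Optional[date], today: date) -> bool:
    """Return True once the embargo date is due (assumption: inclusive),
    or if no embargo date is set (assumption)."""
    return embargo_date is None or today >= embargo_date


def landing_page_notice(embargo_date: Optional[date], today: date) -> str:
    """Hypothetical notice text; the landing page itself is always available."""
    if archive_is_public(embargo_date, today):
        return "Data published"
    return f"Estimated date of data publication: {embargo_date.isoformat()}"


# Example: a package under embargo until 1 June 2020, viewed on 11 May 2020.
print(landing_page_notice(date(2020, 6, 1), date(2020, 5, 11)))
```

Keeping the access decision separate from the notice text reflects the distinction in the workflow: the landing page exists in both cases, while only the ABCD archive is withheld.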

Transformation of DIP for archiving system

The ABCD archive is the DIP and is added to the Digital Asset Management System for archiving. Thus, the AIP(s) consist of a static version of the DIP, the raw data / files and derived (open) formats of the data.

See also General part: GFBio publication of type 1 data via BioCASe data pipelines.