Publication of Type 1 Data via BioCASe Data Pipelines at SGN Data Center

From GFBio Public Wiki

The SGN Data Center is one of the seven GFBio Collection Data Centers, which are core components of the GFBio Submission, Repository and Archiving Infrastructure. Data archiving and publication at SGN are mainly based on the database systems SeSam and AQUiLA, both developed in-house.

The workflow built around these central components is shown in Figure 1 and described in the text below.
Figure 1: The SGN Workflow, BioCASe data pipelines for GFBio Type 1 Data.
ABCD - Access to Biological Collections Data schema (V2.06 within GFBio)
SIP - Submission Information Package
AIP - Archival Information Package
DIP - Dissemination Information Package
VAT - Visualization, Aggregation and Transformation


Data pipeline

Export of GFBio DIPs from the SGN in-house management system

  • Citation
The citation follows the GFBio citation pattern. If needed, it is edited in close collaboration with the data provider.
Example: Janssen, R. (2016). Digitalisierung und taxonomische Überarbeitung der Unionida-Sammlung des Senckenberg Forschungsinstituts und Naturmuseums Frankfurt am Main. Senckenbergianum Frankfurt. [Dataset]. Data Publisher: Senckenberg Gesellschaft für Naturforschung – Leibniz Institute, Frankfurt. http://www.senckenberg.de/root/index.php?page_id=297
  • Licenses
The licenses for the data packages are ingested into SeSam/AQUiLA during the submission/ingestion process.
The licenses for multimedia objects are stored in SeSam/AQUiLA together with the multimedia URLs. GFBio promotes Creative Commons (CC) licenses; SGN's preferred license is CC BY-SA 4.0.
  • GFBio data and metadata created during submission
The metadata generated during GFBio submission are processed via the JIRA ticket system and ingested into SeSam/AQUiLA (work in progress). Additional metadata and the original research data are imported into SeSam/AQUiLA. Additional parameters are assigned manually by the data producers in cooperation with the SGN data curator.
All GFBio IDs and, where available, other external IDs are stored in SeSam/AQUiLA.
The occurrence data are stored in SeSam/AQUiLA at two levels of granularity: (a) dataset level and (b) unit level.
  • Other (meta)data
Multimedia and morphological data are stored in SeSam/AQUiLA.
Other (meta)data recommended or mandatory for export are stored in SeSam/AQUiLA.
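
The two storage levels and the citation described above can be pictured with a small data model. This is only a sketch: the class and field names are hypothetical and do not reflect the actual SeSam/AQUiLA schema, the field order in `citation()` is inferred from the example citation rather than taken from a formal definition of the GFBio citation pattern, and all values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Unit:
    """Unit-level record, e.g. one specimen (hypothetical fields)."""
    unit_id: str
    scientific_name: str
    multimedia_urls: List[str] = field(default_factory=list)
    multimedia_license: str = ""  # stored together with the multimedia URLs


@dataclass
class Dataset:
    """Dataset-level record (hypothetical fields)."""
    creators: str
    year: int
    title: str
    publisher: str
    url: str
    license: str
    external_ids: Dict[str, str] = field(default_factory=dict)
    units: List[Unit] = field(default_factory=list)

    def citation(self) -> str:
        # Order of elements inferred from the example citation in the text;
        # this is an assumption, not the official pattern definition.
        return (f"{self.creators} ({self.year}). {self.title}. [Dataset]. "
                f"Data Publisher: {self.publisher}. {self.url}")


# Placeholder values for illustration only.
dataset = Dataset(
    creators="Doe, J.",
    year=2016,
    title="Example collection dataset",
    publisher="Example Data Center",
    url="http://example.org/dataset/1",
    license="CC BY-SA 4.0",
    external_ids={"gfbio_submission_id": "PLACEHOLDER"},
    units=[Unit(unit_id="EX-1", scientific_name="Unio pictorum")],
)
print(dataset.citation())
```

Keeping the license and external IDs on the dataset record while the multimedia license travels with each unit mirrors the split described in the list above.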

Transformation of DIPs for SGN archiving system

  • Archiving of DIPs
All DIPs are created as zipped ABCD 2.06 XML archives using a standard function of the BioCASe Provider Software that is triggered manually. Senckenberg IT backs up the stored DIPs daily, in accordance with the Technical documentation of long-term archiving solutions at the GFBio collection data centers.
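
To illustrate what a zipped XML archive involves, the following sketch packs XML files into a zip file and checks the archive's integrity, as one might do before a backup. It is not the archiving function of the BioCASe Provider Software; it uses only the Python standard library, and the function names, file names and XML content are hypothetical placeholders.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable, List


def archive_dip(xml_files: Iterable[Path], archive_path: Path) -> Path:
    """Write the given XML files into one compressed zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for xml_file in xml_files:
            # Store only the file name, not the full local path.
            zf.write(xml_file, arcname=xml_file.name)
    return archive_path


def verify_dip(archive_path: Path) -> List[str]:
    """Run a CRC check and return the member names; raise if a member is corrupt."""
    with zipfile.ZipFile(archive_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return zf.namelist()


# Usage with a throwaway directory and a placeholder XML document.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / "response.00001.xml"  # hypothetical file name
    xml_path.write_text("<DataSets/>", encoding="utf-8")
    archive = archive_dip([xml_path], Path(tmp) / "example_dip.zip")
    members = verify_dip(archive)
    print(members)
```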

Transformation of DIPs for publication in GFBio data portal and VAT tool

  • Using BioCASe Provider Software
Access to the SGN data sources is provided via the BioCASe Monitor Service (BMS).
This includes links to the BioCASe Local Query Tool and to the Consistency Check for the ABCD consensus elements as agreed within GFBio.
  • Indexing/harvesting for access via GFBio Data Portal and VAT-System
After all quality checks have been passed, the final data package is announced to the central GFBio indexing/harvesting process and can then be accessed via the GFBio Data Portal.
For georeferenced data, the data portal provides an import into the VAT-System.
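
Which units count as georeferenced can be read from the ABCD document itself. The sketch below is not part of the GFBio harvesting or VAT import code; it only shows, under stated assumptions, how units carrying decimal coordinates could be selected from an ABCD 2.06 document. The sample XML is hand-written and heavily simplified (it omits elements a valid ABCD document requires), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace of the ABCD 2.06 schema.
NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

# Hand-written, simplified sample; not a complete or valid ABCD document.
SAMPLE = """
<abcd:DataSets xmlns:abcd="http://www.tdwg.org/schemas/abcd/2.06">
  <abcd:DataSet>
    <abcd:Units>
      <abcd:Unit>
        <abcd:UnitID>EXAMPLE-1</abcd:UnitID>
        <abcd:Gathering>
          <abcd:SiteCoordinateSets>
            <abcd:SiteCoordinates>
              <abcd:CoordinatesLatLong>
                <abcd:LongitudeDecimal>8.65</abcd:LongitudeDecimal>
                <abcd:LatitudeDecimal>50.12</abcd:LatitudeDecimal>
              </abcd:CoordinatesLatLong>
            </abcd:SiteCoordinates>
          </abcd:SiteCoordinateSets>
        </abcd:Gathering>
      </abcd:Unit>
      <abcd:Unit>
        <abcd:UnitID>EXAMPLE-2</abcd:UnitID>
      </abcd:Unit>
    </abcd:Units>
  </abcd:DataSet>
</abcd:DataSets>
"""


def georeferenced_units(xml_text: str) -> List[Tuple[str, float, float]]:
    """Return (unit ID, latitude, longitude) for units with decimal coordinates."""
    root = ET.fromstring(xml_text)
    result = []
    for unit in root.iterfind(".//abcd:Unit", NS):
        unit_id = unit.findtext("abcd:UnitID", default="", namespaces=NS)
        coords = unit.find(".//abcd:CoordinatesLatLong", NS)
        if coords is None:
            continue  # unit has no coordinate block
        lat = coords.findtext("abcd:LatitudeDecimal", namespaces=NS)
        lon = coords.findtext("abcd:LongitudeDecimal", namespaces=NS)
        if lat is None or lon is None:
            continue  # coordinate block is incomplete
        result.append((unit_id, float(lat), float(lon)))
    return result


units = georeferenced_units(SAMPLE)
print(units)
```

In this sample only the first unit has both a latitude and a longitude, so the second unit is skipped.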

See also General part: GFBio publication of type 1 data via BioCASe data pipelines.