Publication of Type 1 Data via BioCASe Data Pipelines at SGN Data Center

The SGN Data Center is one of the seven GFBio Collection Data Centers, which are core components of the GFBio Submission, Repository and Archiving Infrastructure. Data archiving and publication at SGN are mainly based on the database systems SeSam and AQUiLA, both developed in-house.
 * The workflow built on these central components is described in Figure 1 and the text below.



Export of GFBio DIPs from SGN in-house-management system

 * Citation
 * The citation follows the GFBio citation pattern. If needed, it is edited in close collaboration with the data provider.
 * Example: Janssen, R. (2016). Digitalisierung und taxonomische Überarbeitung der Unionida-Sammlung des Senckenberg Forschungsinstituts und Naturmuseums Frankfurt am Main. Senckenbergianum Frankfurt. [Dataset]. Data Publisher: Senckenberg Gesellschaft für Naturforschung – Leibniz Institute, Frankfurt. http://www.senckenberg.de/root/index.php?page_id=297
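
The order of the citation parts can be read off the example above. The following is a minimal sketch of how such a string might be assembled; the class and field names are hypothetical and do not correspond to SeSam/AQUiLA fields, and the pattern is inferred from this single example rather than taken from the GFBio specification.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    # Hypothetical field names, chosen for illustration only.
    author: str
    year: int
    title: str
    collection: str
    publisher: str
    url: str


def format_citation(c: DatasetCitation) -> str:
    """Join the parts in the order seen in the example citation."""
    return (
        f"{c.author} ({c.year}). {c.title}. {c.collection}. [Dataset]. "
        f"Data Publisher: {c.publisher}. {c.url}"
    )


example = DatasetCitation(
    author="Janssen, R.",
    year=2016,
    title=(
        "Digitalisierung und taxonomische Überarbeitung der Unionida-Sammlung "
        "des Senckenberg Forschungsinstituts und Naturmuseums Frankfurt am Main"
    ),
    collection="Senckenbergianum Frankfurt",
    publisher="Senckenberg Gesellschaft für Naturforschung – Leibniz Institute, Frankfurt",
    url="http://www.senckenberg.de/root/index.php?page_id=297",
)
print(format_citation(example))
```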


 * Licenses
 * The licenses for the data packages are ingested in SeSam/AQUiLA during the submission/ingestion process.
 * The licenses for multimedia objects are stored in SeSam/AQUiLA together with the multimedia URLs. GFBio promotes CC licenses; SGN's preferred license is CC BY-SA 4.0.


 * GFBio data and metadata created during submission


 * The metadata generated during GFBio submission are processed via the JIRA ticket system and ingested into SeSam/AQUiLA (work in progress). Additional metadata and the original research data are imported into SeSam/AQUiLA. Additional parameters are assigned manually by the data producers in cooperation with the SGN data curator.


 * GFBio IDs according to GFBio consensus documents


 * All GFBio IDs, as well as other external IDs where available, are stored in SeSam/AQUiLA.


 * Occurrence data according to GFBio WP5 consensus document


 * The occurrence data are stored in SeSam/AQUiLA at two levels of granularity: (a) at dataset level and (b) at unit level.


 * Other (meta)data
 * Multimedia and morphological data are stored in SeSam/AQUiLA.
 * Other (meta)data recommended or mandatory for export are stored in SeSam/AQUiLA.

Transformation of DIPs for SGN archiving system

 * Archiving of DIPs
 * All DIPs are created as zipped ABCD 2.06 XML archives using a regular, manually run function of the BioCASe Provider Software. The stored DIPs are backed up daily by Senckenberg IT, as described in "Technical documentation of long-term archiving solutions at the GFBio collection data centers".
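
As an illustration of what such a zipped archive contains, the sketch below opens a zip file, checks that each XML member has an ABCD 2.06 DataSets root element, and counts its Unit elements. It is a hedged example using only the Python standard library, not part of the BioCASe Provider Software; the function name, the member file name, and the sample document are made up for the demonstration.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"


def summarize_abcd_archive(archive) -> dict:
    """Map each XML member of the zip to its number of ABCD Unit elements.

    Raises ValueError if a member's root is not an ABCD 2.06 DataSets element.
    """
    summary = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".xml"):
                continue  # ignore non-XML members
            root = ET.fromstring(zf.read(name))
            if root.tag != f"{{{ABCD_NS}}}DataSets":
                raise ValueError(f"{name}: unexpected root element {root.tag}")
            summary[name] = len(root.findall(f".//{{{ABCD_NS}}}Unit"))
    return summary


# Build a tiny in-memory archive; the file name and content are hypothetical.
sample = f"""<abcd:DataSets xmlns:abcd="{ABCD_NS}">
  <abcd:DataSet><abcd:Units>
    <abcd:Unit><abcd:UnitID>1</abcd:UnitID></abcd:Unit>
    <abcd:Unit><abcd:UnitID>2</abcd:UnitID></abcd:Unit>
  </abcd:Units></abcd:DataSet>
</abcd:DataSets>"""
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("units_0001.xml", sample)

print(summarize_abcd_archive(buffer))  # {'units_0001.xml': 2}
```

A check of this kind could be run on a stored DIP before or after backup to confirm that the archive is readable and non-empty.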

Transformation of DIPs for publication in GFBio data portal and VAT tool

 * Using BioCASe Provider Software
 * Access to the SGN data sources is provided via the BioCASe Monitor Service (BMS).
 * This includes links to the BioCASe Local Query Tool and to the Consistency Check for the ABCD Consensus elements as agreed in GFBio.


 * Indexing/harvesting for access via GFBio Data Portal and VAT-System
 * Once all quality checks have been passed, the final data package is announced to the central GFBio indexing/harvesting process and can then be accessed via the GFBio Data Portal.
 * For georeferenced data, an import to the VAT-System is provided by the data portal.
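
To illustrate which units count as georeferenced, the sketch below selects the units of an ABCD 2.06 document that carry decimal coordinates. It is an assumption-laden example, not the data portal's actual import logic: the function name and the sample unit IDs are hypothetical, and only the ABCD element path Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong is taken from the ABCD schema.

```python
import xml.etree.ElementTree as ET

NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}
LATLONG_PATH = (
    "abcd:Gathering/abcd:SiteCoordinateSets/"
    "abcd:SiteCoordinates/abcd:CoordinatesLatLong"
)


def georeferenced_units(abcd_xml: str) -> list:
    """Return (unit_id, latitude, longitude) for units with decimal coordinates."""
    root = ET.fromstring(abcd_xml)
    result = []
    for unit in root.iterfind(".//abcd:Unit", NS):
        coords = unit.find(LATLONG_PATH, NS)
        if coords is None:
            continue  # unit has no coordinates at all
        lat = coords.findtext("abcd:LatitudeDecimal", namespaces=NS)
        lon = coords.findtext("abcd:LongitudeDecimal", namespaces=NS)
        if lat is None or lon is None:
            continue  # incomplete coordinate pair
        unit_id = unit.findtext("abcd:UnitID", default="", namespaces=NS)
        result.append((unit_id, float(lat), float(lon)))
    return result


# Hypothetical sample: one unit with coordinates, one without.
sample = """<abcd:DataSets xmlns:abcd="http://www.tdwg.org/schemas/abcd/2.06">
  <abcd:DataSet><abcd:Units>
    <abcd:Unit>
      <abcd:UnitID>unit-1</abcd:UnitID>
      <abcd:Gathering><abcd:SiteCoordinateSets><abcd:SiteCoordinates>
        <abcd:CoordinatesLatLong>
          <abcd:LongitudeDecimal>8.5</abcd:LongitudeDecimal>
          <abcd:LatitudeDecimal>50.25</abcd:LatitudeDecimal>
        </abcd:CoordinatesLatLong>
      </abcd:SiteCoordinates></abcd:SiteCoordinateSets></abcd:Gathering>
    </abcd:Unit>
    <abcd:Unit><abcd:UnitID>unit-2</abcd:UnitID></abcd:Unit>
  </abcd:Units></abcd:DataSet>
</abcd:DataSets>"""

print(georeferenced_units(sample))  # [('unit-1', 50.25, 8.5)]
```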

See also General part: GFBio publication of type 1 data via BioCASe data pipelines.