Publication of Type 1 Data via BioCASe Data Pipelines at SGN Data Center
The SGN Data Center is one of the seven GFBio Collection Data Centers, which are core components of the GFBio Submission, Repository and Archiving Infrastructure. Data archiving and publication at SGN are based mainly on the database systems SeSam and AQUiLA, both developed in-house.
- The workflow involving these central components is shown in Figure 1 and described in the text below.
Export of GFBio DIPs from the SGN in-house management system
- Citations follow the GFBio citation pattern. If needed, they are edited in close collaboration with the data provider.
- Example: Janssen, R. (2016). Digitalisierung und taxonomische Überarbeitung der Unionida-Sammlung des Senckenberg Forschungsinstituts und Naturmuseums Frankfurt am Main. Senckenbergianum Frankfurt. [Dataset]. Data Publisher: Senckenberg Gesellschaft für Naturforschung – Leibniz Institute, Frankfurt. http://www.senckenberg.de/root/index.php?page_id=297
- Licenses for the data packages are ingested into SeSam/AQUiLA during the submission/ingestion process.
- Licenses for multimedia objects are stored in SeSam/AQUiLA together with the multimedia URLs. GFBio promotes Creative Commons (CC) licenses; SGN's preferred license is CC BY-SA 4.0.
- GFBio data and metadata created during submission
- Metadata generated during GFBio submission are processed via the JIRA ticket system and ingested into SeSam/AQUiLA (work in progress). Additional metadata and the original research data are imported into SeSam/AQUiLA. Further parameters are assigned manually by the data producers in cooperation with the SGN data curator.
- GFBio IDs according to GFBio consensus documents
- All GFBio IDs, as well as any other external IDs where available, are stored in SeSam/AQUiLA.
- Occurrence data according to GFBio consensus documents
- Occurrence data are stored in SeSam/AQUiLA at two levels of granularity: (a) the dataset level and (b) the unit level.
- Other (meta)data
- Multimedia and morphological data are stored in SeSam/AQUiLA.
- Other (meta)data recommended or mandatory for export are stored in SeSam/AQUiLA.
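The citation pattern illustrated by the Janssen example above can be sketched as a small Python helper. This is a minimal illustration only, assuming the element order of that single example (author, year, title, institution, "[Dataset]", data publisher, URL); the class and its field names are hypothetical and correspond neither to SeSam/AQUiLA fields nor to an official GFBio tool.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    # Hypothetical fields, inferred from the example citation only.
    authors: str
    year: int
    title: str
    institution: str
    publisher: str
    url: str

    def format(self) -> str:
        """Assemble the citation in the element order of the example."""
        return (
            f"{self.authors} ({self.year}). {self.title}. {self.institution}. "
            f"[Dataset]. Data Publisher: {self.publisher}. {self.url}"
        )


# The example citation from this page, rebuilt from its parts.
JANSSEN_EXAMPLE = DatasetCitation(
    authors="Janssen, R.",
    year=2016,
    title=(
        "Digitalisierung und taxonomische Überarbeitung der Unionida-Sammlung "
        "des Senckenberg Forschungsinstituts und Naturmuseums Frankfurt am Main"
    ),
    institution="Senckenbergianum Frankfurt",
    publisher="Senckenberg Gesellschaft für Naturforschung – Leibniz Institute, Frankfurt",
    url="http://www.senckenberg.de/root/index.php?page_id=297",
)

if __name__ == "__main__":
    print(JANSSEN_EXAMPLE.format())
```

Keeping the pattern in one method means that any citation edited together with the data provider only needs its field values changed, not its layout.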
Transformation of DIPs for the SGN archiving system
- Archiving of DIPs → AIPs
- All DIPs are created as zipped ABCD 2.06 XML archives using a standard function of the BioCASe Provider Software, triggered manually. Senckenberg IT backs up the stored DIPs daily, as described in the "Technical documentation of long-term archiving solutions at the GFBio collection data centers".
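A DIP exported this way is a zip file containing ABCD 2.06 XML. The Python sketch below opens such an archive and counts the ABCD Unit records in each XML member, which could serve as a quick plausibility check of an export. It assumes only that the members are ABCD 2.06 documents in the namespace http://www.tdwg.org/schemas/abcd/2.06; no BioCASe-specific file naming is assumed, and the function names and demo archive content are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace URI of the ABCD 2.06 schema.
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"


def count_units(archive: zipfile.ZipFile) -> dict[str, int]:
    """Count ABCD Unit elements in every XML member of an open archive."""
    counts = {}
    for name in archive.namelist():
        if not name.lower().endswith(".xml"):
            continue  # ignore non-XML members, if present
        with archive.open(name) as member:
            root = ET.parse(member).getroot()
        counts[name] = len(root.findall(f".//{{{ABCD_NS}}}Unit"))
    return counts


def summarize_dip(source) -> int:
    """Return the total number of Units in a zipped DIP.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return sum(count_units(archive).values())


def build_demo_archive() -> io.BytesIO:
    """Create a tiny in-memory archive with hypothetical content."""
    xml_doc = (
        f'<DataSets xmlns="{ABCD_NS}"><DataSet><Units>'
        "<Unit><UnitID>1</UnitID></Unit>"
        "<Unit><UnitID>2</UnitID></Unit>"
        "</Units></DataSet></DataSets>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("part1.xml", xml_doc)
        archive.writestr("readme.txt", "not ABCD")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(summarize_dip(build_demo_archive()))
```

For a real DIP, pass the path of the zip file to summarize_dip instead of the demo buffer.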
Transformation of DIPs for publication in the GFBio Data Portal and the VAT tool
- Using the BioCASe Provider Software
- Access to the SGN datasources is provided via the BioCASe Monitor Service (BMS).
- This includes links to the BioCASe Local Query Tool and to the Consistency Check of the ABCD consensus elements as agreed within GFBio.
- Indexing/harvesting for access via the GFBio Data Portal and the VAT-System
- Once all quality checks have been passed, the final data package is announced to the central GFBio indexing/harvesting process and can then be accessed via the GFBio Data Portal.
- For georeferenced data, the data portal provides an import into the VAT-System.
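The idea behind a consistency check on ABCD elements, and the selection of georeferenced records, can both be illustrated on a parsed ABCD 2.06 document. The Python sketch below is not the BioCASe Consistency Check itself: REQUIRED_PATHS is a hypothetical, illustrative subset of elements (the agreed list is defined in the GFBio consensus documents), and the coordinate path assumes decimal latitude/longitude under Gathering/SiteCoordinateSets, as in the ABCD 2.06 schema. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the ABCD 2.06 namespace, used in ElementTree paths.
NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

# Illustrative subset only (assumption): the authoritative list of ABCD
# consensus elements is defined in the GFBio consensus documents.
REQUIRED_PATHS = ("abcd:SourceInstitutionID", "abcd:SourceID", "abcd:UnitID")

# Decimal coordinates of a Unit's gathering site, relative to the Unit element.
_COORDS = ("abcd:Gathering/abcd:SiteCoordinateSets/abcd:SiteCoordinates/"
           "abcd:CoordinatesLatLong/")
LAT_PATH = _COORDS + "abcd:LatitudeDecimal"
LON_PATH = _COORDS + "abcd:LongitudeDecimal"


def missing_elements(unit: ET.Element) -> list[str]:
    """Return the required paths that are absent or empty in one Unit."""
    return [
        path for path in REQUIRED_PATHS
        if not unit.findtext(path, default="", namespaces=NS).strip()
    ]


def consistency_report(root: ET.Element) -> dict[str, list[str]]:
    """Map each incomplete Unit (by UnitID, else position) to its missing paths."""
    report = {}
    for index, unit in enumerate(root.iterfind(".//abcd:Unit", NS)):
        missing = missing_elements(unit)
        if missing:
            key = unit.findtext("abcd:UnitID", default="", namespaces=NS).strip()
            report[key or f"unit #{index}"] = missing
    return report


def georeferenced_units(root: ET.Element) -> list[tuple[str, float, float]]:
    """Return (UnitID, latitude, longitude) for Units with usable coordinates."""
    result = []
    for unit in root.iterfind(".//abcd:Unit", NS):
        lat = unit.findtext(LAT_PATH, namespaces=NS)
        lon = unit.findtext(LON_PATH, namespaces=NS)
        try:
            lat_f, lon_f = float(lat), float(lon)
        except (TypeError, ValueError):
            continue  # coordinates missing or not numeric
        if -90.0 <= lat_f <= 90.0 and -180.0 <= lon_f <= 180.0:
            unit_id = unit.findtext("abcd:UnitID", default="", namespaces=NS)
            result.append((unit_id, lat_f, lon_f))
    return result
```

To use it, parse one XML member of a DIP with ET.parse(...).getroot() and pass the root element to consistency_report or georeferenced_units.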