Publication of Type 1 Data via BioCASe Data Pipelines at GFBio Data Centers

The seven GFBio collection data centers have set up various solutions for publishing biodiversity data structured according to the ABCD standard. They differ in how they deliver landing pages at the dataset and data unit level, and in how they present dissemination packages and version datasets whose content changes over time ("dynamic" datasets). A data center may run a single data pipeline for type 1 data (type 1 data are defined here) or several; for example, separate pipelines for (meta-)data with and without stored specimen objects, or for data with and without assigned multimedia objects.
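
The ABCD documents delivered by these pipelines are deeply nested XML. As a rough illustration, the following minimal Python sketch extracts unit identifiers and scientific names from a locally saved ABCD 2.06 response document; the namespace URI, element paths and file name are assumptions and should be checked against the actual provider output.

```python
import xml.etree.ElementTree as ET

# Assumed ABCD 2.06 namespace; verify against the provider's response documents.
ABCD = "{http://www.tdwg.org/schemas/abcd/2.06}"

def iter_units(path):
    """Yield (unit_id, scientific_name) for every Unit in an ABCD document."""
    tree = ET.parse(path)
    for unit in tree.iter(f"{ABCD}Unit"):
        unit_id = unit.findtext(f"{ABCD}UnitID", default="")
        # FullScientificNameString sits several levels below Unit, hence the ".//" search.
        name = unit.findtext(f".//{ABCD}FullScientificNameString", default="")
        yield unit_id, name

if __name__ == "__main__":
    # "abcd_response.xml" is a hypothetical, locally saved provider response.
    for uid, name in iter_units("abcd_response.xml"):
        print(uid, name)
```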

Nevertheless, all data centers agree on the following points:

1. They deliver dissemination information packages (DIPs) using ABCD 2.06 standard elements and the latest version of the BioCASe Provider Software.

2. They use a set of recommended and mandatory ABCD consensus elements (page includes examples). The mandatory elements must not be empty! (A minimal validation sketch covering this point as well as points 5 and 8 follows the list.)

3. They follow a Citation convention (page includes examples).

4. They follow a convention on recommended licenses and their standard abbreviations.

5. They use predefined content lists following GBIF rules for one mandatory and one highly recommended ABCD element, namely RecordBasis and Kingdom. Both term lists and their definitions are included as controlled vocabularies in the GFBio TS (Terminology Service).

6. They use the GFBio instance of the BioCASe Monitor Service for consistency checks and a common, machine-readable presentation of their web services.

7. The data and metadata are transformed to PanSimpleDC (see https://github.com/gfbio/abcd-to-dublin-core and https://github.com/gfbio/ABCD_XSLT_Landingpages) and provided for indexing/harvesting by the central GFBio indexing processes (see the transformation sketch after this list).

8. Ideally, datasets contain georeferenced records (WGS 84 coordinates), as recommended by the ABCD standard for data publication, so that they can subsequently be harvested/ingested by the GFBio VAT-System for further analysis and modelling.
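
The following minimal sketch illustrates the kind of pre-publication check implied by points 2, 5 and 8: mandatory ABCD elements must not be empty, RecordBasis values should come from the agreed controlled vocabulary, and coordinates should be valid WGS 84 decimal degrees. The element paths, the set of mandatory elements and the RecordBasis terms shown here are illustrative placeholders; the binding definitions are the ABCD consensus element list and the controlled vocabularies in the GFBio Terminology Service.

```python
import xml.etree.ElementTree as ET

ABCD = "{http://www.tdwg.org/schemas/abcd/2.06}"  # assumed ABCD 2.06 namespace

# Illustrative subsets only -- the authoritative lists are the GFBio ABCD
# consensus elements and the controlled vocabularies in the Terminology Service.
MANDATORY = [f"{ABCD}UnitID", f"{ABCD}RecordBasis"]
RECORD_BASIS = {"PreservedSpecimen", "FossilSpecimen", "LivingSpecimen",
                "HumanObservation", "MachineObservation"}

def check_unit(unit):
    """Return a list of problems found in one ABCD Unit element."""
    problems = []
    for tag in MANDATORY:
        if not (unit.findtext(tag) or "").strip():
            problems.append(f"mandatory element {tag} is missing or empty")
    basis = (unit.findtext(f"{ABCD}RecordBasis") or "").strip()
    if basis and basis not in RECORD_BASIS:
        problems.append(f"RecordBasis '{basis}' is not in the controlled vocabulary")
    lat = unit.findtext(f".//{ABCD}LatitudeDecimal")
    lon = unit.findtext(f".//{ABCD}LongitudeDecimal")
    if lat is not None and lon is not None:
        try:
            if not (-90.0 <= float(lat) <= 90.0 and -180.0 <= float(lon) <= 180.0):
                problems.append("coordinates outside WGS 84 decimal degree ranges")
        except ValueError:
            problems.append("coordinates are not decimal numbers")
    return problems

def check_document(path):
    """Print all problems found in the Units of an ABCD 2.06 document."""
    tree = ET.parse(path)
    for unit in tree.iter(f"{ABCD}Unit"):
        uid = unit.findtext(f"{ABCD}UnitID") or "<no UnitID>"
        for problem in check_unit(unit):
            print(uid, "-", problem)

if __name__ == "__main__":
    check_document("abcd_response.xml")  # hypothetical file name
```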
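
Point 7 can be implemented with a standard XSLT transformation. The sketch below applies an ABCD-to-Dublin-Core stylesheet with lxml; the stylesheet and input file names are placeholders, and an actual stylesheet should be taken from the repositories linked above.

```python
from lxml import etree

def abcd_to_dc(abcd_path, xslt_path):
    """Apply an ABCD-to-Dublin-Core XSLT stylesheet and return the result tree."""
    transform = etree.XSLT(etree.parse(xslt_path))
    return transform(etree.parse(abcd_path))

if __name__ == "__main__":
    # Both file names are placeholders; use a stylesheet from the repositories above.
    result = abcd_to_dc("abcd_response.xml", "abcd_to_dublin_core.xslt")
    print(etree.tostring(result, pretty_print=True, encoding="unicode"))
```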

Notes:

The scientific data curators of the GFBio data centers (= data repositories) have read and write access to the GFBio submission system, read access to the GFBio Terminology Service and, in the case of SNSB, SMNS and ZFMK, read access to the DWB cloud services. They use these services for parameter assignment.

All GFBio collection data centers are GBIF data publishers. They may therefore offer data producers an additional service: supporting the parallel, dynamic publication of occurrence and checklist data via this international biodiversity data network (with GBIF DOI assignment).

All GFBio collection data centers run their own technical solutions and data pipelines for long-term data archiving sensu OAIS requirements.