Technical documentation of long-term archiving solutions at the GFBio collection data centers

From GFBio Public Wiki
Revision as of 16:42, 24 June 2021 by Anke Penzlin (URLs in table header adjusted to current gfbio homepage)



This documentation addresses the current practices of (a) backup/archival storage, (b) long-term archiving solutions and (c) implemented standard archiving procedures for the various information packages at the individual GFBio collection data centers.


The long-term archiving solutions at the GFBio data centers are in general designed according to the requirements and concepts of the reference model of the Open Archival Information System (OAIS) (see OAIS terminology), with the goal of becoming compliant with ISO Standard 14721.

Figure: OAIS Functional Entities of GFBio Collection Data Centers/Archives - Overview
Figure: OAIS Entities overlaid with Software for Data Management and Exchange Standards


Following this, we distinguish:

  • SIP = Submission Information Packages: the original incoming files and data objects (binary and text-based data) together with detailed documentation (e.g. xls, jpg) - archival storage with timestamp
  • AIP = Archival Information Packages: e.g. renamed and newly organised information packages (binary and text-based data) following the internal data structure (RDMS, multimedia data storage; see Relational database management systems under Technical Documentations) and the information organisation of the individual archives - periodical archival storage with timestamp
  • DIP = Dissemination Information Packages (binary and text-based data): deviate substantially from SIPs and AIPs in data structure, format and content. DIPs (e.g. ABCD XML archives; SDD XML archives) therefore have to be archived separately - (periodical) archival storage with timestamp
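The "archival storage with timestamp" step for a SIP can be illustrated with a minimal sketch. This is a hypothetical example, not the GFBio centers' actual tooling: the function name, package layout and the choice of a SHA-256 manifest are assumptions made for illustration.

```python
# Hypothetical sketch: bundle incoming files into a timestamped SIP
# (Submission Information Package) with a SHA-256 checksum manifest.
# Names and layout are illustrative, not GFBio's actual implementation.
import hashlib
import json
import tarfile
import time
from pathlib import Path


def build_sip(source_dir: str, archive_dir: str) -> Path:
    """Pack source_dir into a timestamped tarball plus a checksum manifest."""
    src = Path(source_dir)
    dst = Path(archive_dir)
    dst.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")

    # Record a SHA-256 checksum for every file so fixity can be re-checked later.
    manifest = {}
    for f in sorted(src.rglob("*")):
        if f.is_file():
            manifest[str(f.relative_to(src))] = hashlib.sha256(f.read_bytes()).hexdigest()

    # The timestamp in the file name identifies this submission version.
    sip_path = dst / f"sip_{src.name}_{stamp}.tar.gz"
    with tarfile.open(sip_path, "w:gz") as tar:
        tar.add(src, arcname=src.name)

    # Store the manifest next to the package.
    (dst / (sip_path.stem + ".manifest.json")).write_text(json.dumps(manifest, indent=2))
    return sip_path
```

The same pattern applies to AIPs and DIPs; only the packaging rules (renaming, internal structure, export format) differ.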



In parallel, the GFBio collection data centers have documented their installations of collection management systems, management systems not specialised in collection data, multimedia data management systems, and GFBio-related IT services, tools and databases at the data centers. Furthermore, please consult Concepts and Standards and ISO Standards for Digital Archives.


GFBio data center / data archive - the values in each of the following rows are given in the column order: BGBM, DSMZ, MfN, SGN, SMNS, SNSB, ZFMK
Contact persons Name Dominik Röpert Adam Podstawka Falko Glöckler Senckenberg IT Angela Jandl, Dieter Hagmann Stefan Seifert Peter Grobe
Phone 030 838 50172 0531 2616 380 +49 30 889140 8672 069 7542 1366 0711 8936 287 / 0711 8936 2979 089 17861 245 0228 9122 342
E-Mail it@bgbm.org adam.podstawka@dsmz.de ; netadmins@dsmz.de falko.gloeckler@mfn.berlin Lothar.Menner@senckenberg.de Angela.jandl@smns-bw.de; dieter.hagmann@smns-bw.de seifert@snsb.de p.grobe@leibniz-zfmk.de
Backup Name of backup system Acronis Backup Advanced & NetBackup Bareos Community Edition customized scripts, Hyper-V Backup CommVault Simpana Robocopy scripts, run by scheduler own; IBM Tivoli HP EML
Method of backup (incremental/differential/complete/virtual full) daily incremental, weekly full daily incremental, weekly differential, monthly full, partial "virtual full" depends on storage system: daily incremental or full (snapshot), weekly full daily incremental, weekly differential, monthly full, partial "virtual full" daily full; daily incremental & complete yearly hourly incremental, daily full daily incremental, full every 15 weeks
Time interval daily, weekly daily, weekly and monthly daily, weekly daily, weekly and monthly h, d, m, y h,d,m,Y d,w
Time schedule starting at 22:00 and 06:00 starting at 18:00 and 20:00 starting at 5:00 am between 21:00 and 24:00 h starting at 08:00 08:00-19:00 h; 0:00-4:00 d,m,Y 22:00-07:00
Disaster recovery possible yes yes yes yes yes yes yes
Disaster recovery tested yes yes partly yes yes yes yes
Recovery procedure is documented on paper, including necessary passwords partly partly no no partly partly yes
Maximum timespan not included in backups/till last backup 24h 24 48h 24h 24h 10h 20h
Multiple backup versions available yes, minimum 3 yes, minimum 3 yes yes, minimum 3 yes, minimum 10 yes, minimum 3 3 generations of full backups (~1 year)
Planned time for complete restore variable variable (minutes-days) variable (minutes-days) variable variable variable (minutes to days) depends
Archiving general information Revision safe (archive is not overwritable or a version control is used) yes manually written to WORM (yes) yes yes yes yes (Tivoli) yes (protection time)
ISO Standard 14721 compliant; see ISO Standards for Digital Archives not certified not certified, but based not certified not certified ? not ISO certified, but compliant with OAIS compliant with OAIS
Archiving concept applies Open Archival Information System (OAIS) recommendations (yes, no, partially complied) partially complied partially complied partially complied partially complied ? yes yes
Retention period (duration of usability of data) >10y >10y 10y >=10y >10y 10y (continuous rebuild on new media in tape-lib) >10y
Archiving of system internal data Database Database structure (format) sql-dump, xml-dump, Acronis backup-container sql-dump, xml-dump sql dumps and VM snapshots sql dump in documentation; xml-dump (automated) in documentation; sql-dump, xml-dump (automated) in documentation; sql-dump
Database data (format) sql-dump, xml-dump, Acronis backup-container sql-dump, xml-dump sql-dumps and VM snapshots sql-dump xml-dump (automated) sql-dump, xml-dump (automated) text dump, sql-dump
Database metadata (DB logics, conventions not guaranteed by db structure; sql-dump does not give this information) in documentation in documentation VM snapshot, documentation in documentation in documentation in documentation in documentation
Database documented and archived yes yes yes yes yes yes yes
Proprietary (closed source) tools are necessary for reconstruction (besides archiving software) open source, partial: for MS SQL Server open source yes open source yes (DB server, Windows, C#); migration to other systems is hard but possible yes (DB server, Windows, C#); migration to other systems is hard but possible partial: for MS SQL Server
New revisions every (year/day/hour) continuously continuously continuously year year
Reconstruction from archive was tested yes yes partly yes partly partly yes
Multimedia/binary-data Accepted formats various various various various various various 2d, 3d, 4d, sound, movie, text, etc.
Formats are completely documented standard formats are used standard formats are used depends standard formats are used yes, open source and free-ware tools for handling are available yes, open source and free-ware tools for handling are available standard formats are used
Metadata is archived yes, in db yes, in db yes yes, in db yes, in db yes, in db yes, in db
Reconstruction documentation is available and archived partly partly yes yes yes yes
Reconstruction from archive is tested yes yes yes yes yes yes
Archiving of RAW-data / ingest RAW-data (original data) is archived yes yes yes yes yes yes, depending on file size and only as far as well documented by the data provider; bit-stream preservation only yes
Accepted RAW-data formats all to be defined all formats, but depending on file size all all all, as far as well documented by the data provider all
Processing of RAW-data is documented and archived documentation is work in progress no not yet not yet not yet documentation is work in progress documentation is work in progress
Metadata of RAW-data is in archive partly, in db partly, in db yes yes partly, in db yes, in db yes, in db or accompanying textfiles
Data Hard-/Software encryption -/password none used none none none none none
Hard-/Software compression -/yes Hardware compression deduplication -/yes -/yes -/yes
Media Type of data storage devices LTO6 / LTO7 LTO5 & LTO6 Dell Storage and tape library LTO6 HDD mirror of live storage arrays, backup/archive to tape library LTO5/6
Auxiliary storage devices harddisc harddisc HDD partially HDD HDD - partially HDD
Duration of media keeping ~5y (depends on HP) >3 Month (>10y WORM) 10y 5y 8y 5y ~5y (depends on HP)
Place of media keeping two separated server rooms one copy in tape library, one copy in office distributed servers two separated server rooms, place for second copy in progress two separated server rooms two separated server rooms, tape library at LRZ backup provider tape library, place for second copy in progress
Supports WORM (write once read many) media yes yes yes yes no no yes
Count of drives 2 2 & 5 not applicable 2 3 not applicable 4
Capacity [TB/Slots] 150 TB >100 & >700 >100 TB 150 TB, will be upgraded to 0.5 PB 30 TB local some live TB, tape library some PB 0.5 TB
Documentation Available documentation about the backup/archiving system (e.g. URL to pdf- or html-files) Acronis Backup Advanced, NetBackup Bareos Manual - Simpana documentation admin archive Tivoli Storage Manager documentation, Multimedia processing documentation, admin documentation (html, internal) HP Data Protection Website
Computing center, external service provider name of the associated computing center(s), (commercial) service provider(s) and services provided Computing Service of Freie Universität Berlin (ZEDAT). Provision of servers for science and administration: backup and archiving, storage, databases ?? (future collaboration with a Berlin-based computing center is being planned) Computing Center is run by the Senckenberg-IT services. Additional facilities for Scientific Computing are provided by the Senckenberg Data and Modelling Center and (large scale) in cooperation with the Frankfurt University's Center for Scientific Computing (CSC). Cooperation with the BELWUE computing center for scientific institutions and universities in Baden-Württemberg with network services Leibniz-Rechenzentrum (LRZ) services, e.g., those of the Münchner Wissenschaftsnetz, network services, archiving and backup system (ABS). The LRZ is certified for IT service management (ISO/IEC 20000) and for information security (ISO/IEC 27001). Computing center is run by ZFMK. Cooperation exists with the computing center of the University of Bonn.
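The "revision safe" and "disaster recovery tested" rows above both presuppose some routine integrity check of the archived copies. A minimal sketch of such a fixity check against a stored checksum manifest could look as follows; the manifest format and function name are assumptions for illustration, not any center's actual procedure.

```python
# Hypothetical sketch: periodic fixity check of an archive copy against
# a stored SHA-256 manifest. Reports files that are missing or whose
# current checksum no longer matches the recorded one.
import hashlib
import json
from pathlib import Path


def verify_fixity(archive_root: str, manifest_file: str) -> list:
    """Return relative paths that are missing or fail the checksum comparison."""
    root = Path(archive_root)
    manifest = json.loads(Path(manifest_file).read_text())
    damaged = []
    for rel_path, expected in manifest.items():
        f = root / rel_path
        if not f.is_file() or hashlib.sha256(f.read_bytes()).hexdigest() != expected:
            damaged.append(rel_path)
    return damaged
```

Running such a check periodically against each backup generation is one way to turn "disaster recovery possible" into "disaster recovery tested".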

Status: November 2020


Back to Technical Documentations