Technical documentation of long-term archiving solutions at the GFBio collection data centers

From GFBio Public Wiki

This documentation addresses the current practice of (a) backup/archival storage, (b) long-term archiving solutions and (c) implemented standard archiving procedures for various information packages at the individual GFBio collection data centers.


The long-term archiving solutions at the GFBio data centers are in general designed according to the requirements of the reference model and concepts of the Open Archival Information System (OAIS) (see OAIS terminology), with the goal of becoming compliant with ISO Standard 14721.

[Figure: OAIS Functional Entities of GFBio Collection Data Centers/Archives - Overview]
[Figure: OAIS Entities overlaid with Software for Data Management and Exchange Standards]


Following this, we distinguish:

  • SIP = Submission information packages: archiving of original incoming files and data objects (binary and text-based data) with detailed documentation (e.g. xls, jpg) - archival storage with timestamp
  • AIP = Archival information packages, e.g. renamed and newly organised information packages (binary and text-based data) that follow the internal data structure (RDBMS, multimedia data storage; see Relational database management systems under Technical Documentations) and the information organisation of the individual archives - periodic archival storage with timestamp
  • DIP = Dissemination information packages (binary and text-based data): these deviate substantially from SIP and AIP in data structure, format and data content. DIPs (e.g. ABCD XML archives, SDD XML archives) have to be archived separately - (periodic) archival storage with timestamp
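
The "archival storage with timestamp" step for an SIP can be illustrated with a minimal sketch. It is not taken from any of the data centers: the function name `store_sip`, the directory layout and the `manifest.json` file are assumptions made for this example only. The sketch copies an incoming file unchanged into a timestamped directory and records a SHA-256 checksum so that later integrity checks are possible.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def store_sip(source: Path, archive_root: Path) -> Path:
    """Copy an incoming file unchanged into a timestamped SIP directory.

    Hypothetical layout: <archive_root>/sip/<UTC timestamp>/<original name>
    plus a manifest.json holding the checksum and the ingest time.
    """
    ingested_at = datetime.now(timezone.utc)
    sip_dir = archive_root / "sip" / ingested_at.strftime("%Y%m%dT%H%M%S%fZ")
    sip_dir.mkdir(parents=True, exist_ok=False)  # never reuse an existing SIP

    target = sip_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps

    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    manifest = {
        "package_type": "SIP",
        "original_name": source.name,
        "sha256": digest,
        "ingested_at": ingested_at.isoformat(),
    }
    (sip_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return sip_dir
```

An AIP or DIP could be stored the same way with a different `package_type`; in practice the renaming, restructuring and format conversion described above would happen before this step.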



In parallel, the GFBio collection data centers documented their installations of collection management systems, management systems not specialised in collection data, multimedia data management systems, and GFBio-related IT services, tools and databases at the data centers. Furthermore, please consult Concepts and Standards and ISO Standards for Digital Archives.


GFBio data center/data archive | BGBM | DSMZ | MfN | SGN | SMNS | SNSB | ZFMK

Contact persons
Name | Stefan Liesch | Adam Podstawka | MfN IT department | Senckenberg IT | Markus Grabert, Dieter Hagmann | Stefan Seifert | Peter Grobe
Phone | 030 838 55217 | 0531 2616 380 | | 069 7542 1366 | 0711 8936 284 / 0711 8936 2979 | 089 17861 245 | 0228 9122 342
E-Mail | it@bgbm.org | adam.podstawka@dsmz.de; netadmins@dsmz.de | it@mfn-berlin.de | Lothar.Menner@senckenberg.de | markus.grabert@smns-bw.de; dieter.hagmann@smns-bw.de | seifert@snsb.de | p.grobe@leibniz-zfmk.de

Backup
Name of backup system | Acronis Backup Advanced | Bareos Community Edition | customized scripts, Hyper-V Backup | CommVault Simpana | Robocopy scripts, run by scheduler | own; IBM Tivoli | HP EML
Method of backup (incremental/differential/complete/virtual full) | daily incremental, weekly full | daily incremental, weekly differential, monthly full, partial "virtual full" | depends on storage system: daily incremental or full (snapshot), weekly full | daily incremental, weekly differential, monthly full, partial "virtual full" | daily full; daily incremental & complete yearly | hourly incremental, daily full | daily incremental, full every 15 weeks
Time interval | daily, weekly | daily, weekly and monthly | daily, weekly | daily, weekly and monthly | h, d, m, y | h, d, m, y | d, w
Time schedule | starting at 22:00 and 06:00 | starting at 18:00 and 20:00 | starting at 05:00 | between 21:00 and 24:00 | starting at 08:00 | 08:00-19:00 h; 0:00-4:00 d, m, y | 22:00-07:00
Disaster recovery possible | yes | yes | yes | yes | yes | yes | yes
Disaster recovery tested | yes | yes | partly | yes | yes | yes | yes
Recovery procedure is documented on paper, including necessary passwords | partly | partly | no | no | partly | no | yes
Maximum timespan not included in backups/till last backup | 24h | 24 | 48h | 24h | 24h | 10h | 20h
Multiple backup versions available | yes, minimum 3 | yes, minimum 3 | yes | yes, minimum 3 | yes, minimum 10 | yes, minimum 3 | 3 generations of full backups (~1 year)
Planned time for complete restore | variable | variable (minutes-days) | variable (minutes-days) | variable | variable | from some minutes to some days | depends

Archiving general information
Revision safe (archive is not overwritable or a version control is used) | yes | manually written to WORM (yes) | yes | yes | yes | yes (Tivoli) | yes (protection time)
ISO Standard 14721 compliant; see ISO Standards for Digital Archives | not certified | not certified, but based | ? | not certified | ? | not certified | ?
Archiving concept applies Open Archival Information System (OAIS) recommendations (yes, no, partially complied) | no | partially complied | ? | partially complied | ? | partially complied | ?
Retention period (duration of usability of data) | >10y | >10y | 10y | >=10y | >10y | 10y (continuous rebuild on new media in tape-lib) | >10y

Archiving of system internal data
Database
Database structure (format) | sql-dump, xml-dump, Acronis backup-container | sql-dump, xml-dump | sql dumps and VM snapshots | sql dump | in documentation; xml-dump (automated) | in documentation; xml-dump (automated) | in documentation; sql-dump
Database data (format) | sql-dump, xml-dump, Acronis backup-container | sql-dump, xml-dump | sql dumps and VM snapshots | sql-dump | xml-dump (automated) | xml-dump (automated) | text dump, sql-dump
Database metadata (DB logics, conventions not guaranteed by db structure; sql-dump does not give this information) | in documentation | in documentation | VM snapshot, documentation | in documentation | in documentation | in documentation | in documentation
Database documented and archived | yes | yes | yes | yes | yes | yes | yes
Proprietary (closed source) tools are necessary for reconstruction (besides archiving software) | open source, partial: for MS SQL Server | open source | yes | open source | yes: DB server, Windows, C#. Migration to other systems is hard but possible. | yes: DB server, Windows, C#. Migration to other systems is hard but possible. | partial: for MS SQL Server
New revisions every (year/day/hour) | continuously | continuously | continuously | year | year
Reconstruction from archive was tested | yes | yes | partly | yes | partly | partly | yes

Multimedia/binary-data
Accepted formats | various | various | various | various | various | various | 2d, 3d, 4d, sound, movie, text, etc.
Formats are completely documented | standard formats are used | standard formats are used | depends | standard formats are used | yes, open source and freeware tools for handling are available | yes, open source and freeware tools for handling are available | standard formats are used
Metadata is archived | yes, in db | yes, in database | yes | yes, in db | yes, in db | yes, in database | yes, in db
Reconstruction documentation is available and archived | partly | partly | yes | yes | yes | yes
Reconstruction from archive is tested | yes | yes | yes | yes | yes | yes

Archiving of RAW-data / ingest
RAW-data (original data) is archived | yes | yes | yes | yes | yes | yes | yes
Accepted RAW-data formats | to be defined | to be defined | all formats, but depending on file size | all | all | to be defined | all
Processing of RAW-data is documented and archived | not yet | no | no | no | not yet | yes | documentation is in progress
Metadata of RAW-data is in archive | partly, in db | partly, in database | yes | yes | partly, in db | yes, in database | yes, in db or accompanying text files

Data
Hard-/Software encryption | -/password | none used | none | none | none | none | none
Hard-/Software compression | -/yes | Hardware compression | deduplication | -/yes | -/yes | -/yes

Media
Type of data storage devices | LTO5 | LTO5 & LTO6 | Dell Storage and tape library | LTO6 | HDD | mirror of live storage arrays, backup/archive to tape-lib | LTO5/6
Auxiliary storage devices | hard disk | hard disk | HDD | partially HDD | HDD | - | partially HDD
Duration of media keeping | 3y | >3 months (>10y WORM) | 10y | 5y | 8y | 5y | ~5y (depends on HP)
Place of media keeping | two separate server rooms | one copy in tape-lib, one copy in office | distributed servers | two separate server rooms, place for second copy in progress | two separate server rooms | two separate server rooms, tape-lib at backup provider | at tape library, place for second copy in progress
Supports WORM (write once read many) media | yes | yes | yes | yes | no | no | yes
Count of drives | 2 | 2 & 5 | not applicable | 2 | 3 | not applicable | 4
Capacity [TB/Slots] | 150 TB | >100 & >700 | 40 TB | 150 TB, will be upgraded to 0.5 PB | 30 TB | local some live TB, tape-lib some PB | 0.5 TB

Documentation
Available documentation about the backup/archiving system (e.g. URL to pdf or html files) | Acronis Backup Advanced | Bareos Manual | | Simpana documentation | admin archive | Tivoli Storage Manager documentation, Multimedia processing documentation, admin documentation (html, internal) | HP Data Protection Website

Computing center, external service provider
Name of the associated computing center(s), (commercial) service provider(s) and services provided | Computing Service of Freie Universität Berlin (ZEDAT). Provision of servers for science and administration: backup and archiving, storage, databases | ?? | (future collaboration with a Berlin-based computing center is being planned) | Computing center is run by the Senckenberg IT services. Additional facilities for scientific computing are provided by the Senckenberg Data and Modelling Center and (large scale) in cooperation with the Frankfurt University's Center for Scientific Computing (CSC). | ?? | Leibniz-Rechenzentrum (LRZ) with services of the Münchner Wissenschaftsnetz, network services, archiving and backup system (ABS) | Computing center is run by ZFMK; cooperations exist with Jülich Supercomputing Centre and University of Cologne regarding computing
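
Several data centers report a rotation of daily incremental, weekly differential and monthly full backups. The following minimal sketch shows how such a rotation could be expressed. The concrete rule (full backup on the 1st of the month, differential on Sundays) and the function name `backup_type` are assumptions for illustration; the table does not state which days the data centers actually use.

```python
from datetime import date


def backup_type(day: date) -> str:
    """Pick the backup type for a given calendar day.

    Assumed rule (not taken from any data center): full on the 1st of the
    month, differential on Sundays, incremental on all other days.
    """
    if day.day == 1:
        return "full"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "differential"
    return "incremental"


# Print the backup type for a few example days in October 2018.
for d in (date(2018, 10, 1), date(2018, 10, 7), date(2018, 10, 9)):
    print(d.isoformat(), backup_type(d))
```

With a rotation of this kind, a complete restore typically needs the last full backup, the last differential backup taken after it, and the incremental backups taken since then, which is one reason why the planned restore time in the table varies from minutes to days.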

Status: October 2018

