Technical documentation of long-term archiving solutions at the GFBio collection data centers
This documentation addresses the current practice of (a) backup/archival storage, (b) long-term archiving solutions and (c) implemented standard archiving procedures for various information packages at the individual GFBio collection data centers.
The long-term archiving solutions at the GFBio data centers are generally designed according to the requirements of the reference model and concepts of the Open Archival Information System (OAIS) (see OAIS terminology), with the goal of becoming compliant with ISO Standard 14721.
Following this, we distinguish:
- SIP = Submission information packages: archiving of original incoming files and data objects (binary and text-based data, e.g. xls, jpg) with detailed documentation; archival storage with timestamp
- AIP = Archival information packages: e.g. renamed and newly organised information packages (binary and text-based data) that follow the internal data structure (RDMS, multimedia data storage; see Relational database management systems under Technical Documentations) and the information organisation of the individual archives; periodic archival storage with timestamp
- DIP = Dissemination information packages (binary and text-based data): these deviate substantially from SIP and AIP in data structure, format and data content. DIPs (e.g. ABCD xml archives, SDD xml archives) have to be archived separately; (periodic) archival storage with timestamp
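The "archival storage with timestamp" shared by all three package types can be illustrated with a minimal sketch. The storage key layout, the `ArchivedPackage` class and the SHA-256 fixity value are assumptions made for illustration only; they do not describe the naming scheme or tooling of any GFBio data center.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# The three OAIS package types distinguished in the list above.
PACKAGE_TYPES = ("SIP", "AIP", "DIP")


@dataclass(frozen=True)
class ArchivedPackage:
    package_type: str      # "SIP", "AIP" or "DIP"
    name: str              # hypothetical package label
    archived_at: datetime  # timestamp of archival storage (UTC)
    sha256: str            # fixity value; an assumption, not from the source

    @property
    def storage_key(self) -> str:
        # Hypothetical layout: <type>/<name>_<UTC timestamp>
        stamp = self.archived_at.strftime("%Y%m%dT%H%M%SZ")
        return f"{self.package_type}/{self.name}_{stamp}"


def archive_package(package_type: str, name: str, payload: bytes,
                    archived_at: Optional[datetime] = None) -> ArchivedPackage:
    """Describe one timestamped archival copy of a package payload."""
    if package_type not in PACKAGE_TYPES:
        raise ValueError(f"unknown package type: {package_type!r}")
    # Normalise to UTC so that the trailing "Z" in the key is accurate.
    when = (archived_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    digest = hashlib.sha256(payload).hexdigest()
    return ArchivedPackage(package_type, name, when, digest)
```

Because each call yields a new key, a periodically archived AIP or DIP accumulates as a series of timestamped versions instead of overwriting the previous copy.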
In parallel, the GFBio collection data centers documented their installations of collection management systems, management systems not specialised for collection data, multimedia data management systems and of GFBio related IT services, tools and databases at the data centers. Furthermore, please consult Concepts and Standards and ISO Standards for Digital Archives.
GFBio data center/ data archive | BGBM | DSMZ | MfN | SGN | SMNS | SNSB | ZFMK | ||
---|---|---|---|---|---|---|---|---|---|
Contact persons | Name | Dominik Röpert | Adam Podstawka | Falko Glöckler | Senckenberg IT | Angela Jandl, Dieter Hagmann | Stefan Seifert | Peter Grobe | |
Phone | 030 838 50172 | 0531 2616 380 | +49 30 889140 8672 | 069 7542 1366 | 0711 8936 287 / 0711 8936 2979 | 089 17861 245 | 0228 9122 342 | ||
Email | it@bgbm.org | adam.podstawka@dsmz.de ; netadmins@dsmz.de | falko.gloeckler@mfn.berlin | Lothar.Menner@senckenberg.de | Angela.jandl@smns-bw.de; dieter.hagmann@smns-bw.de | seifert@snsb.de | p.grobe@leibniz-zfmk.de | ||
Backup | Name of backup system | Acronis Backup Advanced & NetBackup | Bareos Community Edition | customized scripts, Hyper-V Backup | CommVault Simpana | Robocopy scripts, run by scheduler | own; IBM Tivoli | HP EML | |
Method of backup (incremental/differential/complete/virtual full) | daily incremental, weekly full | daily incremental, weekly differential, monthly full, partial "virtual full" | depends on storage system: daily incremental or full (snapshot), weekly full | daily incremental, weekly differential, monthly full, partial "virtual full" | daily full; daily incremental & complete yearly | hourly incremental, daily full | daily incremental, full every 15 weeks | ||
Time interval | daily, weekly | daily, weekly and monthly | daily, weekly | daily, weekly and monthly | hourly, daily, monthly, yearly | hourly, daily, monthly, yearly | daily, weekly | ||
Time schedule | starting at 22:00 and 06:00 | starting at 18:00 and 20:00 | starting at 5:00 am | between 21:00 and 24:00 h | starting at 08:00 | 08:00-19:00 h; 0:00-4:00 d,m,Y | 22:00-07:00 | ||
Disaster recovery possible | yes | yes | yes | yes | yes | yes | yes | ||
Disaster recovery tested | yes | yes | partly | yes | yes | yes | yes | ||
Recovery procedure is documented on paper, including necessary passwords | partly | partly | no | no | partly | partly | yes | ||
Maximum timespan not included in backups/till last backup | 24h | 24h | 48h | 24h | 24h | 10h | 20h | ||
Multiple backup versions available | yes, minimum 3 | yes, minimum 3 | yes | yes, minimum 3 | yes, minimum 10 | yes, minimum 3 | 3 generations of full backups (~1 year) | ||
Planned time for complete restore | variable | variable (minutes-days) | variable (minutes-days) | variable | variable | from a few minutes to a few days | depends | ||
Archiving general information | Revision safe (archive is not overwritable or a version control is used) | yes | manually written to WORM (yes) | yes | yes | yes | yes (Tivoli) | yes (protection time) | |
ISO Standard 14721 compliant; see ISO Standards for Digital Archives | not certified | not certified, but based | not certified | not certified | ? | not ISO certified, but compliant with OAIS | compliant with OAIS | ||
Archiving concept applies Open Archival Information System (OAIS) recommendations (yes, no, partially complied) | partially complied | partially complied | partially complied | partially complied | ? | yes | yes | ||
Retention period (duration of usability of data) | >10y | >10y | 10y | >=10y | >10y | 10y (continuous rebuild on new media in tape-lib) | >10y | ||
Archiving of system internal data | Database | Database structure (format) | sql-dump, xml-dump, Acronis backup-container | sql-dump, xml-dump | sql dumps and VM snapshots | sql dump | in documentation; xml-dump (automated) | in documentation; sql-dump, xml-dump (automated) | in documentation; sql-dump |
Database data (format) | sql-dump, xml-dump, Acronis backup-container | sql-dump, xml-dump | sql-dumps and VM snapshots | sql-dump | xml-dump (automated) | sql-dump, xml-dump (automated) | text dump, sql-dump | ||
Database metadata (DB logics, conventions not guaranteed by db structure; sql-dump does not give this information) | in documentation | in documentation | VM snapshot, documentation | in documentation | in documentation | in documentation | in documentation | ||
Database documented and archived | yes | yes | yes | yes | yes | yes | yes | ||
Proprietary (closed source) tools are necessary for reconstruction (besides archiving software) | open source, partial: for MS SQL Server | open source | yes | open source | yes, DB-Server, Windows, C#. Migration to other systems is hard but possible. | yes, DB-Server, Windows, C#. Migration to other systems is hard but possible. | partial: for MS SQL Server | ||
New revisions every (year/day/hour) | continuously | continuously | continuously | year | year | ||||
Reconstruction from archive was tested | yes | yes | partly | yes | partly | partly | yes | ||
Multimedia/binary-data | Accepted formats | various | various | various | various | various | various | 2d, 3d, 4d, sound, movie, text, etc. | |
Formats are completely documented | standard formats are used | standard formats are used | depends | standard formats are used | yes, open source and free-ware tools for handling are available | yes, open source and free-ware tools for handling are available | standard formats are used | ||
Metadata is archived | yes, in db | yes, in db | yes | yes, in db | yes, in db | yes, in db | yes, in db | ||
Reconstruction documentation is available and archived | partly | partly | yes | yes | yes | yes | |||
Reconstruction from archive is tested | yes | yes | yes | yes | yes | yes | |||
Archiving of RAW-data / ingest | RAW-data (original data) is archived | yes | yes | yes | yes | yes | yes, depending on file size and provided the data are well documented by the data provider; bit-stream preservation only | yes | |
Accepted RAW-data formats | all | to be defined | all formats, but depending on file size | all | all | all, provided they are well documented by the data provider | all | ||
Processing of RAW-data is documented and archived | documentation is work in progress | no | not yet | not yet | not yet | documentation is work in progress | documentation is work in progress | ||
Metadata of RAW-data is in archive | partly, in db | partly, in db | yes | yes | partly, in db | yes, in db | yes, in db or accompanying textfiles | ||
Data | Hard-/Software encryption | -/password | none used | none | none | none | none | none | |
Hard-/Software compression | -/yes | Hardware compression | deduplication | -/yes | -/yes | -/yes | |||
Media | Type of data storage devices | LTO6 / LTO7 | LTO5 & LTO6 | Dell Storage and tape library | LTO6 | HDD | mirror of live storage arrays, backup/archive to tape library | LTO5/6 | |
Auxiliary storage devices | harddisc | harddisc | HDD | partially HDD | HDD | - | partially HDD | ||
Duration of media keeping | ~5y (depends on HP) | >3 months (>10y WORM) | 10y | 5y | 8y | 5y | ~5y (depends on HP) | ||
Place of media keeping | two separated server rooms | one copy in tape library, one copy in office | distributed servers | two separated server rooms, place for second copy in progress | two separated server rooms | two separated server rooms, tape library at LRZ backup provider | tape library, place for second copy in progress | ||
Supports WORM (write once read many) media | yes | yes | yes | yes | no | no | yes | ||
Count of drives | 2 | 2 & 5 | not applicable | 2 | 3 | not applicable | 4 | ||
Capacity [TB/Slots] | 150 TB | >100 & >700 | >100 TB | 150 TB, will be upgraded to 0.5 PB | 30 TB | locally several TB of live storage, tape library several PB | 0.5 TB | ||
Documentation | Available documentation about the backup/archiving system (e.g. URL to pdf- or html-files) | Acronis Backup Advanced, NetBackup | Bareos Manual | - | Simpana documentation | admin archive | Tivoli Storage Manager documentation, Multimedia processing documentation, admin documentation (html, internal) | HP Data Protection Website | |
Computing center, external service provider | name of the associated computing center(s), (commercial) service provider(s) and services provided | Computing Service of Freie Universität Berlin (ZEDAT). Provision of servers for science and administration: backup and archiving, storage, databases | ?? | (future collaboration with a Berlin based computing center is being planned) | Computing Center is run by the Senckenberg-IT services. Additional facilities for Scientific Computing are provided by the Senckenberg Data and Modelling Center and (large scale) in cooperation with the Frankfurt University's Center for Scientific Computing (CSC). | Cooperation with the BELWUE computing center for scientific institutions and universities in Baden-Württemberg with network services | Leibniz-Rechenzentrum (LRZ) services, e.g., those of the Münchner Wissenschaftsnetz, network services, archiving and backup system (ABS). The LRZ is certified for IT service management (ISO/IEC 20000) and for information security (ISO/IEC 27001). | Computing center is run by ZFMK. Cooperation exists with the computing center of the University of Bonn. |
Status: November 2020
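The "daily incremental, weekly differential, monthly full" scheme listed under "Method of backup" determines which backups are needed for a restore. The following is a minimal sketch of that logic, assuming full backups on the first day of each month and differentials on Sundays; these days, and the function names, are illustrative assumptions and are not taken from any data center's configuration. In practice the schedule is defined in the backup software itself.

```python
from datetime import date, timedelta
from typing import List, Tuple


def backup_level(day: date) -> str:
    """Backup level for a given day under the assumed schedule."""
    if day.day == 1:            # assumption: monthly full on the 1st
        return "full"
    if day.weekday() == 6:      # assumption: weekly differential on Sunday
        return "differential"
    return "incremental"


def restore_chain(target: date) -> List[Tuple[date, str]]:
    """Backups needed to restore the state of `target`, oldest first.

    A differential holds all changes since the last full backup; an
    incremental holds the changes since the previous backup of any level.
    """
    chain = []
    have_differential = False
    day = target
    while True:
        level = backup_level(day)
        if level == "full":
            chain.append((day, level))
            break
        if not have_differential:
            # Keep every incremental back to (and including) the most
            # recent differential; older backups are covered by it.
            chain.append((day, level))
            have_differential = level == "differential"
        day -= timedelta(days=1)
    return list(reversed(chain))
```

For a restore to Wednesday, 11 November 2020, this sketch selects the full backup of 1 November, the differential of 8 November and the three incrementals of 9 to 11 November, which illustrates why the planned restore time in the table varies with the point in the cycle.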