NARA digital preservation file format risk analysis and preservation plans

U.S. National Archives and Records Administration Digital Preservation Framework

The National Archives and Records Administration Digital Preservation Framework consists of a Risk and Prioritization Matrix and File Format Preservation Action Plans.

Background

The National Archives 2022–2026 Strategic Plan embraces the primacy of electronic records. Our vision is to ensure cutting-edge access to extraordinary volumes of government information and unprecedented engagement to bring greater meaning to the American experience. To do so, NARA must collaborate with other Federal agencies, the private sector, and the public to ensure records and archives thrive in a digital world.

Digital preservation is critical to this work. It becomes even more important because of the direction (M-23-07, Update to Transition to Electronic Records) to Federal agencies to transition business processes and recordkeeping to a fully electronic environment and to end the National Archives’ acceptance of paper records by June 30, 2024.

Our digital preservation subject matter experts, led by Director of Digital Preservation Leslie Johnston, have been hard at work to prepare the National Archives for this change. They have formalized a set of documents that describe how we identify risks to digital files and prioritize them for action, and created specific plans for the preservation of these many file formats.

NARA holds several billion files representing more than 700 file format versions. These files can be categorized into 16 general categories of electronic records. The vast majority of files are email messages, followed by JPEG and TIFF still images and plain ASCII text.

The NARA Risk and Prioritization Matrix

NARA uses the Risk and Prioritization Matrix to measure the preservation risk of digital file formats in our holdings and to assess formats we anticipate receiving in the future. By answering questions related to the ability to preserve and sustain a file format, we identify relative risk levels.

The sustainability factors each have a different level of impact (positive or negative) on a format’s risk level, with several high-impact factors having the greatest effect on the calculations.

High Impact Factors:

  • Positive:
    • A high level of adoption; the availability of format documentation; the ability for a file to document itself; a lack of software dependencies; and no requirement for technical protections (such as encryption) provide the most positive impact.
  • Negative:
    • Format age and required hardware and/or software dependencies have the most negative impact.

The answers to all the questions have been assigned numeric values, which are used to calculate an overall Risk Rating and a corresponding general risk level: Low Risk, Moderate Risk, or High Risk.
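The calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the factor names, the numeric values, and the thresholds are invented for the example and are not NARA's actual values, which are defined in the matrix itself. The sketch follows the matrix's convention that a lower total means greater risk.

```python
# Hypothetical answer values: positive numbers lower risk, negative numbers raise it.
# These factor names and weights are illustrative assumptions, not NARA's values.
ANSWER_VALUES = {
    "widely_adopted": 4,
    "documentation_available": 4,
    "self_documenting": 3,
    "requires_specific_software": -4,
    "requires_encryption": -3,
    "format_is_obsolete": -4,
}

# Hypothetical cut-offs; lower totals mean greater risk.
LOW_RISK_MIN = 6
MODERATE_RISK_MIN = 0


def risk_rating(answers: dict[str, bool]) -> int:
    """Sum the values of every factor answered 'yes'."""
    return sum(ANSWER_VALUES[name] for name, yes in answers.items() if yes)


def risk_level(rating: int) -> str:
    """Translate a numeric rating into one of the three general levels."""
    if rating >= LOW_RISK_MIN:
        return "Low Risk"
    if rating >= MODERATE_RISK_MIN:
        return "Moderate Risk"
    return "High Risk"


example = {
    "widely_adopted": True,
    "documentation_available": True,
    "self_documenting": False,
    "requires_specific_software": True,
    "requires_encryption": False,
    "format_is_obsolete": False,
}
rating = risk_rating(example)
print(rating, risk_level(rating))
```

Keeping the values in a single lookup table mirrors the way a spreadsheet-based matrix works: changing one factor's weight changes every format's rating without touching the calculation.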

Additionally, NARA prioritizes formats in our holdings for preservation actions. The Prioritization assessment is modeled on the traditional preservation model of Value/Use/Need. For our purposes, we use Need/Use/Feasibility to determine our preservation priorities. The Risk Rating represents the “Need” for a preservation action. “Use” is represented by evaluating Prevalence: how common the format is in our holdings at the time of assessment, therefore approximating the level of use of the format in the permanent records of the Federal Government. There is no way to map the “Value” of the holdings to individual file formats because record sets/series typically contain multiple file formats. Instead, we have replaced Value with “Feasibility,” or the capacity for NARA to process and convert formats. We assess Feasibility based on the general availability of tools for format migration that do not alter the content in unacceptable ways as well as our capacity to perform acceptable migrations.

For both the assessment of Risk and Prioritization, the lower the NARA Total number, the greater the risk or need. It is possible for numbers to be negative integers.
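As a rough sketch of how Need/Use/Feasibility could be combined, the Python below adds three scores per format and sorts so that the lowest total (greatest need) comes first. The class name, the field names, the sample scores, and the choice to combine the components by simple addition are all assumptions made for illustration; the actual formula is defined in the matrix.

```python
from dataclasses import dataclass


@dataclass
class FormatAssessment:
    name: str
    risk_rating: int   # "Need": lower means greater risk
    prevalence: int    # "Use": hypothetical score, lower means more common in holdings
    feasibility: int   # hypothetical score, lower means migration is more achievable

    @property
    def priority_total(self) -> int:
        # Assumption: the three components are simply added together.
        return self.risk_rating + self.prevalence + self.feasibility


def rank_for_action(assessments):
    """Return assessments ordered from greatest to least need (lowest total first)."""
    return sorted(assessments, key=lambda a: a.priority_total)


# Invented sample data; totals may be negative, as the text notes.
formats = [
    FormatAssessment("Format A", risk_rating=5, prevalence=2, feasibility=1),
    FormatAssessment("Format B", risk_rating=-6, prevalence=-2, feasibility=-1),
    FormatAssessment("Format C", risk_rating=0, prevalence=-3, feasibility=2),
]
for item in rank_for_action(formats):
    print(item.name, item.priority_total)
```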

We are sharing our current completed matrix as a template for use or adaptation by any interested organization.

For a more technical discussion of the development and use of the Risk and Prioritization Matrix, a conference paper presented at the 2018 iPRES International Digital Preservation meeting is available at: https://osf.io/ctw3g/. Note that this discusses an earlier iteration than what is current.

File Format Preservation Action Plans

We are also sharing our File Format Preservation Action Plans. These Plans are not an exhaustive or universally applicable set of proposed actions and recommended or endorsed tools; they reflect the file formats and variant versions in NARA holdings, the current NARA risk assessment, NARA's processing capabilities, and the tools in use at NARA. These Plans apply to files once they have been deemed permanent for NARA's holdings; the appraisal guidelines for when a record is permanent are different for Congressional, Federal, and Presidential records.

These plans consist of two types of documentation. The first is documentation for multiple categories of electronic records. The 16 categories are:

  • Digital Audio
  • Digital Design and Vector Graphics
  • Digital Still Image
  • Email
  • Geospatial
  • Moving Image: Digital Cinema
  • Moving Image: Digital Video
  • Navigational Charts
  • Presentation and Publishing
  • Software and Code
  • Structured Data: Calendars
  • Structured Data: Databases
  • Structured Data: Generic
  • Structured Data: Spreadsheets
  • Textual and Word Processing
  • Web Records

Each category has its own Plan that contains a list of “Significant Properties,” which identify the properties, or characteristics, of a record (its Appearance, Behavior, Context, and Structure) that should be retained, if possible, in any format migration. These characteristics are important to ensure the highest fidelity format record migrations.

The second resource is the File Format Preservation Action Plan Spreadsheet. The spreadsheet covers over 700 variant versions of file formats and identifies:

  • Categories of electronic records associated with the format
  • Specifications, standards, and documentation where possible; some have no specification or standard available
  • Proposed preservation migration actions to be taken by NARA, including no action when appropriate
  • Recommended tools for processing and preservation actions
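For readers who want to work with the spreadsheet programmatically, the fields listed above could be read from a CSV export roughly as follows. This is a sketch under stated assumptions: the column names, the sample rows, the semicolon-separated tools column, and the "Retain" label for a no-action plan are all hypothetical, and the real spreadsheet's headers and values may differ.

```python
import csv
import io

# Hypothetical column names and rows, for illustration only.
SAMPLE = """format_name,category,specification,proposed_action,recommended_tools
Example Format 1.0,Digital Still Image,Published standard,Retain,Tool A
Example Format 2.0,Digital Audio,,Transform to Example Format 3.0,Tool B; Tool C
"""


def load_plans(text):
    """Parse CSV text into a list of dicts, splitting the tools column into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        tools = row["recommended_tools"].split(";")
        row["recommended_tools"] = [t.strip() for t in tools if t.strip()]
        # Some formats have no specification or standard available.
        row["has_specification"] = bool(row["specification"])
        rows.append(row)
    return rows


def plans_needing_migration(rows):
    """Keep rows whose proposed action is anything other than retaining the file as-is."""
    return [r for r in rows if r["proposed_action"].lower() != "retain"]


plans = load_plans(SAMPLE)
for plan in plans_needing_migration(plans):
    print(plan["format_name"], "->", plan["proposed_action"], plan["recommended_tools"])
```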

Related Resources

There are several related resources available from NARA about file formats:

How Can You Participate?

We are sharing these documents to be transparent about our approach to digital preservation and to share our current practices with Federal agencies, records managers, archivists, digital preservation professionals, researchers, private industry, other stakeholders and allied professionals, and members of the public to help us identify ways we can improve them.

We always welcome feedback on the following topics:

  • What revisions can you suggest to the proposed processing and preservation actions for the formats?
    • Are the Significant Properties for each category comprehensive enough for digital preservation?
    • Are the proposed preservation actions for the formats technically appropriate?
  • Are there appropriate tools for processing and preservation migrations of specific formats that we do not have listed?
  • Are there other formats we haven’t identified that need plans?

Please use the issues feature on this site to leave a specific comment or question or to just start a discussion. You can read more about how to contribute here. NARA staff will respond as quickly as they can.

We update the matrix and plans on an ongoing basis in response to changing risks and new technologies and formats.
