  • Stars: 120
  • Rank: 294,265 (Top 6%)
  • Language: Java
  • License: Other
  • Created: over 9 years ago
  • Updated: over 1 year ago

Repository Details

The Bagger application packages data files according to the BagIt specification.

Bagger

The Library of Congress is reviewing next steps for the future of Bagger. Additional information is forthcoming.

Introduction

The Bagger application was created for the U.S. Library of Congress as a tool to produce a package of data files according to the BagIt specification (http://tools.ietf.org/html/draft-kunze-bagit).

The Bagger application is a graphical user interface to the BagIt specification. The latest Bagger release is available on GitHub at https://github.com/LibraryOfCongress/bagger/releases/latest

Bagger differs from the Java BagIt Library by providing a graphical user interface for file and data manipulation features, as well as a visual view of the bag contents, bag state, and options. In addition, Bagger provides a project profile capability: users can create customized bag-info.txt data with project-specific properties that they define.

These project profiles can be edited manually and shared with other users.

Support

  1. The Digital Curation Google Group (https://groups.google.com/d/forum/digital-curation) is an open discussion list that reaches many of the contributors to and users of this open-source project.
  2. If you have found a bug, please create a new issue on the issues page.
  3. If you would like to contribute, please submit a pull request.

Installing

  1. Install Java from https://java.com
  2. Download the latest release from https://github.com/LibraryOfCongress/bagger/releases/latest
  3. Unzip to a location. This will be known as <BAGGER_INSTALL_DIRECTORY> for the rest of the instructions (see the shell sketch below).
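
For Unix-like systems, the steps above might look like the following sketch; the release version and asset name are illustrative assumptions, so check the releases page for the current file name:

# Download a Bagger release zip (the version and file name below are
# illustrative; use the actual asset listed on the releases page).
curl -LO https://github.com/LibraryOfCongress/bagger/releases/download/v2.8.1/bagger-2.8.1.zip

# Unzip to a location of your choice; the unzipped folder becomes
# <BAGGER_INSTALL_DIRECTORY> in the instructions that follow.
unzip bagger-2.8.1.zip -d ~/tools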

Running Bagger on Windows

  1. Navigate to <BAGGER_INSTALL_DIRECTORY>\bin
  2. Double-click the bagger.bat file

Running Bagger in Mac OS X/Linux/Ubuntu

  1. Navigate to <BAGGER_INSTALL_DIRECTORY>/bin
  2. Double-click the file named bagger, or launch it from a terminal as sketched below
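
A minimal terminal sketch for macOS/Linux, assuming <BAGGER_INSTALL_DIRECTORY> from the installation steps above:

cd <BAGGER_INSTALL_DIRECTORY>/bin

# Ensure the launcher script is executable, then start Bagger.
chmod +x bagger
./bagger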

License

License and other related information are listed in the LICENSE.txt file included with Bagger.

Project Profile

Bag metadata is stored in a 'bag-info.txt' file, as defined in the BagIt specification. When using Bagger to manage bags for a project or collection, it can be helpful to have a template of bag-info.txt fields and values that are filled out similarly for each bag in that project or collection. Profiles let users define a collection of bag metadata fields and default field values to be used with each bag in a consistent way. Users can select a project profile when creating a bag, and that profile will determine the initial fields and values in the bag-info.txt file. The profile used is identified by the "Profile Name" field.

Creating custom project profiles

Users can create custom project profiles using a simple JSON-based format. When the Bagger application is first started, a bagger folder is created in the user's home folder and populated with some default profiles. Profile files should be named <profile name>-profile.json and stored in the bagger home directory described below.

On Windows, this is C:\Documents and Settings\<user>\bagger; on Unix-like operating systems, it is ~/bagger. The default profiles created there when Bagger starts can be used as a guide for creating custom profiles. Since pull request #12, you can change where Bagger looks for profiles by setting the system property BAGGER_PROFILES_HOME. This can be set through the BAGGER_OPTS environment variable, like this in bash:

export BAGGER_OPTS="-DBAGGER_PROFILES_HOME=/tmp"

Also, when upgrading to a new Bagger version, please remove the bagger folder created by the previous version from the user's home folder. This ensures that the new/updated profiles are created in the bagger folder when the new version is started.
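
A minimal cleanup sketch for Unix-like systems, assuming the default ~/bagger location. Moving the folder aside instead of deleting it keeps custom profiles recoverable:

# Move the old profile folder aside so custom profiles can be copied back later.
mv ~/bagger ~/bagger.bak

# Restart Bagger; it recreates ~/bagger with the updated default profiles.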

Profile format

To support the use of profiles for bag-info.txt editing in Bagger and in the various Transfer webapps, the following describes a JSON serialization of a profile:

{
   "<field name>" : {
       "fieldRequired" : <true/false, where false is default if not present>,
       "requiredValue" : "<some value>",
       "defaultValue"  : "<some value>",
       "valueList"     : ["<some value>", <list of other values...>]
   },
   <repeat for other fields...>
}

The meanings of some field properties are explained here:

  • "fieldRequired": true/false, where false is default if not present
  • "requiredValue": some value if fieldRequired is true
  • "defaultValue": default value
  • "valueList": field value or a list of field values that are stored in a drop down list, which is displayed in the Bag-Info tab form in Bagger

The Project Profile format is subject to change in future releases.

Ordering of fields

Since version 2.5, you can enforce the order in which fields are displayed. You MUST use the keyword ordered. An example:

{
	"ordered": [{
		"Send-To-Name": {
			"requiredValue": "John Doe"
		}
	}, {
		"Send-To-Phone": {
			"requiredValue": "+0.000.000.0000"
		}
	}, {
		"Send-To-Email": {
			"requiredValue": "[email protected]"
		}
	}]
}

For a full example, see ordered-other-project-profile.json.

Sample profile

Here is a sample profile. The // comments are only there to explain the fields; remove them when creating an actual JSON profile, since JSON does not support comments:

{
   //Source-organization is required and may have any value
   "Source-organization" : {
                             "fieldRequired" : true
                           },

   //Organization-address is not required and may have any value
   "Organization-address" : {},

   //Contact-name is not required and default is Justin Littman
   "Contact-name" : {
                      "defaultValue" : "Justin Littman"
                    },

   //Content-type is not required, but if a value is provided it must be selected from list
   "Content-type" : {
                      "valueList" :["audio","textual","web capture"]
                    },

   //Content-process is required, has a default value of born digital, and must be selected from list of field values in the Bag-Info tab form in Bagger
   "Content-process" : {
                         "fieldRequired" : true,
                         "defaultValue" : "born digital",
                         "valueList" : ["born digital","conversion","web capture"]
                        }
}

The file should be named <profile name>-profile.json, for example wdl-profile.json. The items in the profile file are listed in the Bag-Info tab of Bagger.
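
For convenience, here is a sketch that saves a comment-free (and therefore valid JSON) version of the sample profile above on a Unix-like system; the profile name myproject is a hypothetical example:

# Write a comment-free version of the sample profile to the bagger home folder.
# "myproject" is a hypothetical profile name used for illustration.
cat > ~/bagger/myproject-profile.json <<'EOF'
{
   "Source-organization" : { "fieldRequired" : true },
   "Organization-address" : {},
   "Contact-name" : { "defaultValue" : "Justin Littman" },
   "Content-type" : { "valueList" : ["audio","textual","web capture"] },
   "Content-process" : {
      "fieldRequired" : true,
      "defaultValue" : "born digital",
      "valueList" : ["born digital","conversion","web capture"]
   }
}
EOF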

Bagger Build Process

Bagger uses Gradle for its build system. Check out the Gradle documentation to learn more.

To build the Bagger application, execute the following steps from the top level folder of the distribution:

gradle distZip

After a successful build, the Bagger application is zipped and located at bagger/build/distributions/bagger.zip. Simply unzip it to install anywhere.
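
On a Unix-like system, a build-and-install sequence might look like this sketch; the destination directory ~/tools is an arbitrary assumption:

# From the top-level folder of the distribution, build the zipped application.
gradle distZip

# Unzip the result wherever you want Bagger installed.
unzip bagger/build/distributions/bagger.zip -d ~/tools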

Exceptions

There are a few common causes of Bagger failures:

  1. An incorrect version of the Java Runtime Environment, or no system path set for Java. The fix is to use the correct Java Runtime Environment (i.e. 1.7.xx on Windows and OpenJDK 7 on Linux/Ubuntu).

  2. The bagger folder in the user's home folder contains profile files using an older JSON format. The fix is to delete the old profiles in the bagger folder and rerun the Bagger application.

More Repositories

  1. api.congress.gov (Java, 624 stars): congress.gov API
  2. newspaper-navigator (Jupyter Notebook, 225 stars)
  3. bagit-python (Python, 210 stars): Work with BagIt packages from Python.
  4. data-exploration (Jupyter Notebook, 179 stars): Tutorials for working with Library of Congress collections data.
  5. concordia (Python, 154 stars): Crowdsourcing platform for full text transcription and tagging. https://crowd.loc.gov
  6. bagit-java (Java, 71 stars): Java library to support the BagIt specification.
  7. citizen-dj (JavaScript, 70 stars)
  8. chronam (Python, 70 stars): This software project is no longer being actively developed at the Library of Congress. Consider using the Open-ONI (https://github.com/open-oni) fork of the chronam software. Project mailing list: http://listserv.loc.gov/archives/chronam-users.html
  9. viewshare (JavaScript, 45 stars): A web application developed by Zepheira for the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) which allows users to create and share embeddable interfaces to digital cultural heritage collections. A project of the Library of Congress; the project was retired in March 2018. Note: project members may work on both official Library of Congress projects and non-LC projects.
  10. bagger-js (JavaScript, 32 stars): Upload BagIt-format deliveries to S3 entirely in the browser.
  11. coding-standards (Python, 27 stars): Library of Congress coding standards.
  12. labs-ai-framework (CSS, 24 stars): Planning framework used by LC Labs for planning AI experiments towards responsible implementation.
  13. gazetteer (Python, 23 stars): A historical gazetteer project of the Library of Congress. Note: project members may work on both official Library of Congress projects and non-LC projects.
  14. wdl-viewer (JavaScript, 22 stars): A fast, responsive HTML5 viewer for scanned items, developed for the World Digital Library. A project of the Library of Congress. Note: project members may work on both official Library of Congress projects and non-LC projects.
  15. speech-to-text-viewer (Python, 17 stars): AWS Transcribe evaluation pipeline: bulk-process audio files and view the results.
  16. django-tabular-export (Python, 15 stars): Utilities used to export data into spreadsheets from Django applications. Currently used internally at the Library of Congress in the WDL cataloging application.
  17. Exploring-ML-with-Project-Aida (Jupyter Notebook, 13 stars)
  18. bagit-conformance-suite (Python, 10 stars): Test cases for validating BagIt implementations.
  19. premis-v3-0 (10 stars): PREMIS schemas are written in XML. They are open source community tools that allow PREMIS users to validate PREMIS records against a version of the PREMIS schema.
  20. mods2bibframe (XSLT, 8 stars): mods2bibframe XSLT.
  21. MarcMods3.6xsl (XSLT, 7 stars): MARC>MODS--the mappings and corresponding XSLTs are open source community tools developed by NDMSO at LC.
  22. hitl (JavaScript, 7 stars): Code and documentation for Humans in the Loop (HITL), an LC Labs sponsored collaboration with metadata solutions provider AVP. The experiment explores a framework and considerations for integrating crowdsourcing and machine learning in ways that are ethical, engaging, and useful.
  23. embARC (HTML, 7 stars): embARC (“metadata embedded for archival content”) manages internal file metadata, including embedding and validation. Created by FADGI (Federal Agencies Digital Guidelines Initiative), in conjunction with AVP and PortalMedia, embARC enables users to audit and correct embedded metadata of a subset of MXF files, as well as individual DPX files or an entire DPX sequence, while not impacting the image data.
  24. speculative-annotation (JavaScript, 7 stars): Speculative Annotation is a web browser application written in JavaScript and built with React, FabricJS, IIIF, OpenSeadragon, and ChakraUI. Source images are hosted locally. The application uses the OpenSeadragon viewer to render images, so source images can be a combination of locally hosted images (within the application) and externally hosted images (for example, served from a IIIF image server). Application metadata is represented by a combination of local IIIF Presentation API 3.0 manifest files and Library of Congress hosted IIIF manifest files. The application allows users to annotate select free-to-use items from the Library of Congress and save them to the browser or download them locally.
  25. pimtoolbox (Ruby, 6 stars): The Library of Congress and the Florida Center for Library Automation developed the PREMIS in METS (PiM) Toolbox. The project provides PREMIS:METS conversion and validation tools that support the implementation of PREMIS in the METS container format.
  26. inside-baseball (Python, 6 stars): Explore baseball collections from the Library of Congress and the National Museum of African American History and Culture.
  27. iptables-gem (Ruby, 5 stars): A project of the Library of Congress. Note: project members may work on both official Library of Congress projects and non-LC projects.
  28. sanborn-navigator (Jupyter Notebook, 5 stars)
  29. ADCTest (C++, 5 stars): ADCTest is a desktop application, written in C++, that provides simple pass-fail reporting for the tests detailed in the FADGI Low Cost ADC Performance Testing Guidelines, as well as more detailed results.
  30. MarcMods3.5xsl (XSLT, 4 stars): MARC>MODS 3.5--the mapping and corresponding XSLT are open source community tools developed by NDMSO at LC.
  31. pairtree (CSS, 4 stars): A project of the Library of Congress. Note: project members may work on both official Library of Congress projects and non-LC projects.
  32. simple-artifact-uploader (Java, 3 stars): A plugin for the Gradle build management tool that allows us to automatically upload completed binaries to the Artifactory deployment server.
  33. a-search-for-the-heart (HTML, 3 stars)
  34. seeing-lost-enclaves (HTML, 2 stars): Seeing Lost Enclaves is an initiative by Jeffrey Yoo Warren as part of the 2023 Innovator in Residence Program at the Library of Congress.
  35. DVV (1 star): The Digital Viewer and Validator (DVV) tool is developed at the Library of Congress for use by National Digital Newspaper Program (NDNP) participants.
  36. LC_Labs (1 star)
  37. viewshare_site (Python, 1 star): Retired, site-specific Library of Congress instance of the Viewshare project.
  38. marc2mads20 (1 star): MARC>MADS--the mappings and corresponding XSLTs are open source community tools developed by NDMSO at LC.
  39. CCHC (1 star): Computing Cultural Heritage in the Cloud (CCHC) is our Andrew W. Mellon-funded experiment for piloting cloud solutions to enable research, including data analysis and reduction on large-scale digital collections. Three contracted researchers (non-LC staff) will analyze large collection datasets that are stored in and accessible from AWS, likely as JSON. The contracted research experts' code will demonstrate how the datasets are gathered, transformed, and manipulated to meet the needs of computational analysis. Languages used in this code may include Python and JavaScript. Code will undergo security review as it is submitted as deliverables during the contract window, with final versions to be made available in a GitHub repository by the end of Q2 FY 2022.
  40. btp-data (Jupyter Notebook, 1 star): This Python tutorial demonstrates how to process and visualize the Library of Congress' By the People transcription data using natural language processing.