The BagIt Library

The BagIt specification is a hierarchical file packaging format for the creation of standardised digital containers called ‘bags’, which are used for storing and transferring digital content. The format is derived from work by the Library of Congress and the California Digital Library. A bag consists of a ‘payload’ of digital content and ‘tags’ (metadata files) that document the storage and transfer of the bag.

BagIt libraries have been developed for the Java, Python, Ruby and PHP programming languages, along with a Drupal module and a desktop application. They all support the creation, manipulation, and validation of bags. The Java library developed by the Library of Congress is the reference implementation and is therefore known as the BagIt Library; it may be run as a standalone command-line tool or incorporated into applications. For example, the Bagger application (also developed by the Library of Congress) provides a graphical user interface to the BagIt Library.

A list of BagIt implementations is maintained by the California Digital Library.

Provider

The United States Library of Congress and the National Digital Information Infrastructure and Preservation Program (NDIIPP).

Licensing and cost

Both the BagIt Library and Bagger are public domain in the United States. In other jurisdictions, so far as copyright applies, they are released under the MIT (Expat) Licence. They are free to download and use.

Development activity

Version 4.9.0 of the BagIt Library was released in February 2014. Bagger version 2.1.3 was released in January 2013.

The source code repository indicates that development of the BagIt Library is ongoing.

Platform and interoperability

Both the BagIt Library and Bagger require Java 6.

Functional notes

Bags contain at minimum three elements: a ‘payload’ and at least two ‘tags’. The payload consists of the content being preserved. The first tag is a manifest itemising the files making up the content along with their checksums; the second is a bagit.txt file identifying the container as a bag and giving the version of the specification used and the character encoding of the tags. The specification additionally allows for several optional tags.
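The three required elements can be illustrated with a minimal sketch in Python. It uses only the standard library and does not call the BagIt Library or any other BagIt implementation. The function names make_bag and verify_bag are hypothetical, and several details are assumptions drawn from the specification rather than from this review: the payload sits in a data/ directory, the manifest is named after its checksum algorithm (SHA-256 is chosen here for illustration), and the version string 0.97 refers to one draft of the specification.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, files):
    """Write a minimal bag: payload, manifest tag, and bagit.txt tag.

    `files` maps payload-relative names to their byte content.
    (Hypothetical helper, not part of any BagIt library.)
    """
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)  # assumed payload directory
    manifest_lines = []
    for name, content in sorted(files.items()):
        target = bag / "data" / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: checksum, whitespace, path.
        manifest_lines.append(f"{digest}  data/{name}\n")
    (bag / "manifest-sha256.txt").write_text(
        "".join(manifest_lines), encoding="utf-8"
    )
    # bagit.txt declares the specification version and tag encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    return bag


def verify_bag(bag_dir):
    """Return manifest paths that are missing or whose checksum differs."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    failures = []
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        path = bag / rel_path
        if (not path.is_file()
                or hashlib.sha256(path.read_bytes()).hexdigest() != expected):
            failures.append(rel_path)
    return failures
```

Calling make_bag("example-bag", {"report.txt": b"content"}) would write the payload file, the manifest, and bagit.txt; verify_bag("example-bag") then returns an empty list until a payload file is altered or removed. A full validator would also check bagit.txt itself and any optional tags, which this sketch leaves out.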

Documentation and user support

Documentation is extremely sparse, primarily consisting of README files detailing release notes. Copies of the BagIt specification are available from the websites of the Internet Engineering Task Force, the Library of Congress, and the California Digital Library.

The main source of user support appears to be a mailing list hosted by SourceForge; however, the list archive shows only about ten messages per year.

Usability

The BagIt Library uses a command-line interface, while Bagger provides a graphical user interface. No installation is required; the tools can simply be downloaded and run, although it may not be immediately clear to users how to do so.

Expertise required

BagIt is designed to create a common language for users exchanging digital materials, essentially negating the need for expertise about others’ protocols. However, for configuration, familiarity with one’s own repository’s technical protocols is essential.

Standards compliance

The BagIt specification is an Internet Engineering Task Force (IETF) internet draft.

Influence and take-up

The BagIt specification has become widely accepted in the preservation community, and is used by the Library of Congress, Chronopolis, and the Stanford Digital Repository, among others. The BagIt Library has been downloaded over 7400 times from SourceForge, though since July 2013 and version 4.5 of the software, the official download site has been GitHub.

Last reviewed: 24 November 2014