By Maureen Pennock, University of Bath
Digital repositories play a vital role in the curation of digital materials, offering a convenient way to store, manage and reuse them. The term 'digital repository' can be applied to a number of different digital storage initiatives, which may also be referred to as 'institutional repositories' or 'digital archives'. A growing number of repository models and systems are available and used by a variety of communities. They can take many forms and carry out many different functions. This technology watch paper provides an introduction to the features and functionality of the DSpace digital repository system.
The DSpace digital repository system was designed to capture, store, index, preserve, and provide access to institutional digital research materials. It can accept all forms of digital materials, ranging from text, images and datasets, to websites, multimedia, video and audio files. DSpace can be used in a variety of ways, including as an institutional repository, e-learning objects or e-theses repository, an electronic records management system, a digital asset management system, and a digital preservation system.
DSpace is freely available as Open Source Software. It was originally developed by MIT (Massachusetts Institute of Technology) and Hewlett-Packard, and development is continued by the registered DSpace community of users (also known as the DSpace Federation). New versions, patches, and bug fixes are regularly issued via the SourceForge website. DSpace is written in Java and will run on any Linux or UNIX system and on Windows XP. Implementers must select an appropriate storage system and install a small number of supporting software packages (such as Jakarta Tomcat and Apache Ant). Installation instructions and operational support are available from the DSpace website and the several mailing lists associated with the software.
DSpace is available under the BSD open source license, which permits proprietary commercial use of the software and incorporation of the code into proprietary products.
DSpace is a web-accessible system. Virtually any modern web browser can be used to submit and access content. As development of DSpace is ongoing, only initial and base functionality is discussed here. This paper refers to version 1.3.2.
Items can only be submitted by registered users. Submissions can come directly from creators or from third parties (provided the necessary permissions have been obtained). Users must upload items and associated metadata via the web interface. The baseline metadata requested for each item is based upon the Dublin Core Metadata Schema, adapted by MIT Libraries to meet DSpace requirements, although it must be mapped to Dublin Core by the repository administration. Domain-specific metadata may also be entered, as required by a particular implementation. DSpace calculates and retains a checksum of each item uploaded so that the integrity of the item and metadata can be verified at a later date, and the validity of the file checksums is periodically checked. The system automatically identifies the file format of deposits wherever possible. The submitter must accept a license, which enables the institution to manage, preserve, and provide further access to deposited materials. Items must then undergo an approval process before finally being accepted into the repository.
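The checksum step described above can be illustrated with a minimal sketch. It is not DSpace code: the names `compute_checksum`, `DepositRecord`, `record_deposit` and `verify` are hypothetical, and SHA-256 is an assumed algorithm, since this paper does not say which one DSpace uses. The sketch covers file content only, not metadata.

```python
import hashlib
from dataclasses import dataclass


def compute_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of the given bytes (algorithm is an assumption)."""
    return hashlib.new(algorithm, data).hexdigest()


@dataclass(frozen=True)
class DepositRecord:
    """Hypothetical record kept alongside a deposited file."""
    name: str
    algorithm: str
    checksum: str


def record_deposit(name: str, data: bytes, algorithm: str = "sha256") -> DepositRecord:
    # Calculate the checksum once, at deposit time, and retain it.
    return DepositRecord(name, algorithm, compute_checksum(data, algorithm))


def verify(record: DepositRecord, data: bytes) -> bool:
    # Recompute the checksum later and compare it with the retained value.
    return compute_checksum(data, record.algorithm) == record.checksum
```

A periodic integrity check would then amount to calling `verify` for each stored file and flagging any record for which it returns `False`.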
DSpace pays particular attention to the content-input side of the process. Each DSpace implementation can tailor the workflow process to accommodate the needs of its varying user-types.
A Relational Database Management System (RDBMS), either PostgreSQL (preferred) or Oracle (supported), must be installed in conjunction with DSpace to store content items and related metadata. Collection items can comprise multiple files (e.g. research papers with supporting datasets), and METS (the Metadata Encoding and Transmission Standard) is used to maintain links between item components. By default, DSpace uses the CNRI (Corporation for National Research Initiatives) Handle System to provide unique and persistent identifiers for every item stored. Because citations refer to the identifier rather than to a storage location, internal item retrieval mechanisms can be changed without affecting reference citations and other links to the content.
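The value of a persistent identifier lies in the level of indirection it adds between a citation and the place an item is actually stored. The following sketch shows that idea only; it does not use the Handle System protocol or any DSpace API. The class `HandleResolver`, the handle `123456789/42` and the storage paths are all hypothetical.

```python
class HandleResolver:
    """Toy resolver mapping persistent identifiers to internal locations."""

    def __init__(self):
        # identifier (str) -> current internal location (str)
        self._locations = {}

    def register(self, handle: str, location: str) -> None:
        # Assign an identifier once; it is never reused for another item.
        if handle in self._locations:
            raise ValueError(f"handle already registered: {handle}")
        self._locations[handle] = location

    def relocate(self, handle: str, new_location: str) -> None:
        # Internal storage changes; the identifier itself stays the same.
        if handle not in self._locations:
            raise KeyError(handle)
        self._locations[handle] = new_location

    def resolve(self, handle: str) -> str:
        # Citations hold the handle, so they keep resolving after a move.
        return self._locations[handle]
```

For example, after `register("123456789/42", "/store-a/item-42")` and a later `relocate("123456789/42", "/store-b/item-42")`, a citation that quotes the handle still resolves, now to the new location.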
DSpace was one of the earliest repository systems to tackle the issue of preservation. It captures details of the specific file formats users submit and records a bitstream format for each bitstream in the system. System administrators can maintain a registry of known bitstream formats and the preservation service level available for each format type; however, if the format of a bitstream is unknown, the system will not be able to reliably support preservation and future access or re-use of the file contents. Most implementations maintain lists of 'supported' and 'unsupported' file formats. For example, the MIT implementation has a policy identifying 'supported' formats such as TIFF, PDF and HTML, 'known/unsupported' formats such as Microsoft Word and Lotus 1-2-3, and 'unknown/unsupported' formats that may be highly complex and rare.
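A format registry of this kind can be pictured as a lookup table from format to support level. The sketch below uses the MIT examples named above, but everything else is an assumption made for illustration: identifying formats by file extension, the particular extensions listed, and the names `SupportLevel`, `REGISTRY` and `support_level`. It does not reproduce DSpace's actual registry or identification method.

```python
from enum import Enum
from pathlib import PurePath


class SupportLevel(Enum):
    SUPPORTED = "supported"
    KNOWN = "known/unsupported"
    UNKNOWN = "unknown/unsupported"


# Illustrative registry; the extension mapping is an assumption.
REGISTRY = {
    ".tif": SupportLevel.SUPPORTED,
    ".tiff": SupportLevel.SUPPORTED,
    ".pdf": SupportLevel.SUPPORTED,
    ".html": SupportLevel.SUPPORTED,
    ".htm": SupportLevel.SUPPORTED,
    ".doc": SupportLevel.KNOWN,   # Microsoft Word
    ".wk1": SupportLevel.KNOWN,   # Lotus 1-2-3
}


def support_level(filename: str) -> SupportLevel:
    """Look up the preservation support level for a deposited file."""
    suffix = PurePath(filename).suffix.lower()
    # Anything not in the registry falls through to unknown/unsupported.
    return REGISTRY.get(suffix, SupportLevel.UNKNOWN)
```

An administrator extending the registry would add entries to the table; files whose format cannot be matched default to the lowest level, which mirrors the caution expressed in the text about unknown formats.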
An implementation can be searched for specific items, or users can browse through its contents. Authorised users are given item overviews that display the core metadata, identify the collection context, and provide links to the files that comprise the item. Users download files and render them with available and appropriate software. The rendering software is not generally stored in the system: in many instances the web browser may be capable of rendering the files; alternatively a specific software application may be required.
Many other institutions or collaborative groups around the world are also using or further developing DSpace; see the DSpace website for a more complete list of implementers.