Policies, Strategies and Guidelines

Harvesting

ShareGeo Open

ShareGeo Open is a data repository for open data. It holds many useful spatial datasets that users have deposited for others to download and re-use. This create – share – reuse philosophy is central to ShareGeo Open. All the data in the repository is open and can be re-used freely, making it ideal for students, researchers and teaching staff looking for data. In addition to downloading data, users can upload data to the repository. URL: www.sharegeo.ac.uk

WebCite

WebCite® is a project initiated by the Centre for Global eHealth Innovation at the University of Toronto, intended to digitally archive web material (web pages, PDF documents, and so on) that is cited in scholarly articles. The idea of WebCite is that authors of scholarly papers (as well as editors and publishers of scholarly work) are increasingly citing digital material which is in the public domain on the web, yet which is at risk of disappearing, i.e.

PARADIGM Online Workbook

Between 2005 and 2007, the Paradigm project of the Bodleian Library and John Rylands University Library explored the issues involved in the long-term preservation of born-digital private papers in the context of hybrid archives - those that are composed of traditional and born-digital formats. The project accessioned sample archives from contemporary UK politicians and used these to gain practical experience of combining archival and digital curation workflows, standards, tools and technologies. An Online Workbook was created during the project and a print edition has recently been produced.

Web Curator Tool

The Web Curator Tool (WCT) is a tool for managing the selective web harvesting process. It is designed for use in libraries and other collecting organisations, and supports collection by non-technical users while still allowing complete control of the web harvesting process. The WCT Project is a collaborative effort by the National Library of New Zealand and the British Library, initiated by the International Internet Preservation Consortium.

OJAX

OJAX provides a highly dynamic AJAX-based user interface to a federated search service for OAI-PMH compatible repository metadata. OJAX is simple, non-threatening but powerful.

RODA (Repositório de Objectos Digitais Autênticos)

The National Archive Institute of Portugal (IAN/TT) does not currently have the infrastructure needed to support the ingestion and management of digital objects produced by the public administration (PA). eGovernment initiatives establish the need to support its activity with information and communication technologies to improve the efficiency, productivity and quality of public services. In this scenario it is clear that the number of digital objects produced by these institutions will grow, and that their legal value and authenticity should be assured.

Grainger Engineering Library Information Center - Digital Library Research Projects

A collection of digital library projects at the University of Illinois at Urbana-Champaign. These include search tools and resources for digital library development. NOTE: To the best of our knowledge, this project is no longer active. Please let us know if you have more current information!

Appraisal and Selection

SURFSHARE Guidelines on Selection of Research Data

Report on appraising and selecting research data, prepared by Heiko Tjalsma of Netherlands-based DANS and Jeroen Rombouts of Delft Technical University, for the SURF Foundation. This study shows the latest situation in the area of selecting research data, based on a survey of the literature, interviews with important figures, and the experience gained by DANS and the 3TU Data Centre.

PARADIGM Online Workbook

Between 2005 and 2007, the Paradigm project of the Bodleian Library and John Rylands University Library explored the issues involved in the long-term preservation of born-digital private papers in the context of hybrid archives - those that are composed of traditional and born-digital formats. The project accessioned sample archives from contemporary UK politicians and used these to gain practical experience of combining archival and digital curation workflows, standards, tools and technologies. An Online Workbook was created during the project and a print edition has recently been produced.

RODA (Repositório de Objectos Digitais Autênticos)

The National Archive Institute of Portugal (IAN/TT) does not currently have the infrastructure needed to support the ingestion and management of digital objects produced by the public administration (PA). eGovernment initiatives establish the need to support its activity with information and communication technologies to improve the efficiency, productivity and quality of public services. In this scenario it is clear that the number of digital objects produced by these institutions will grow, and that their legal value and authenticity should be assured.

Emulation

CAMiLEON (Creative Archiving at Michigan and Leeds: Emulating the Old on the New)

The CAMiLEON Project is developing and evaluating a range of technical strategies for the long term preservation of digital materials. User evaluation studies and a preservation cost analysis are providing answers as to when and where these strategies will be used. The project is a joint undertaking between the Universities of Michigan (USA) and Leeds (UK) and is funded by JISC and NSF. CAMiLEON stands for Creative Archiving at Michigan and Leeds: Emulating the Old on the New.

Migration

iPres 2009: van Horik on MIXED framework for curation of file formats

Scholars in the Netherlands can deposit or search information in a repository system called DANS EASY, containing about 500,000 files in a wide diversity of formats. How do I deal with a file called cars.DBF, now an obsolete format? Their system can read such formats and convert them to the XML-based MIXED format, which identifies the data type and contains information on structure and content. So this was a smart conversion from the binary, obsolete dBase file to a reusable XML file. In the future it can be converted from this format to a current format of choice.

Open Office as a document migration on demand tool- again

We’ve seen suggestions in comments on this blog, and on other blogs, that code is better than specifications as representation information, and that well-used, running open source code is better than proprietary code.

Thoughts on conversion issues in an Institutional Repository

A few people from a commercial repository solution visited UKOLN last week to talk about their brand and the services they offer. This was a useful opportunity to explore the issues around using commercial repository solutions rather than developing a system in-house, which is where most of my experience with institutional repositories has lain to date.

A Question of Authenticity

I’m just on my way back from Wigan, having given a presentation this afternoon on the role of the records manager in digital preservation to a group of, you’ve guessed it, records managers. I was really encouraged to see that the group had decided to dedicate their entire meeting today to tackling the thorny issue of digital preservation.

Migration on Request: OpenOffice as a platform?

Following on from my previous post relating to legacy formats, I was thinking again about the problems of dealing with documents in those formats. For some, the answer lies in emulation and perpetual licences of those original software packages, but for me that just doesn't cut the mustard. I won't have access to those packages, but I might want access to the documents.

Question on approaches to curating textual material

Dave Thompson, Digital Curator at the Wellcome Library, asked a question on the Digital Preservation list (which is not well set up for discussion just now). I've replied, but we agreed I would adapt my reply for the blog for any further discussion that might emerge. "I'm looking for arguments for and against when, and if, digital material should be normalised. I'm thinking about the long term management of textual material in proprietary formats such as MS Word. I see three basic approaches on which I'm seeking the list's comments and thoughts.

Authenticity across migrations

I discovered a few days ago that I have 4 digital objects that are (I believe, but am not certain) in some strong senses “the same” (in their information content), but which are also completely different (in their bits). These objects are the result of a chain of “exports” and “imports”, and “save as…” operations, prompted partly by a change of technology (from a Windows PC running Mind Manager to a Macintosh running NovaMind), and partly from a need to make the content of the object more accessible to colleagues who do not use either software package.

Policies

Biosharing

Functionality: CATALOGUES The web-based BioSharing catalogues aim to centralize bioscience data policies, reporting standards and links to other related portals. 1. Providing a “one-stop shop” for those seeking data sharing policy documents and information about the standards and technologies that support them. 2. Exposing core information on well-constituted, community-driven standardization efforts and linking to their standards, documentation, training material, news and contact point. 3.

CODATA Directory of Scientific Data Policy Statements

This CODATA resource lists statements that express the policies of a number of organizations on data issues. Most of these are related in some way to the environmental sciences, where international sharing of data on a global scale is essential to progress in research.

Legal Strategies for Streamlining Collaboration in an e-Research World

A collection of papers that arose from discussions held at a Roundtable entitled: 'Streamlining Collaboration in an e-Research World' which was convened by the Legal Framework for e-Research Project and held on 12-13 June 2008 at the Queensland University of Technology.

ICT Guides Website

The ICT Guides website is designed to help arts and humanities researchers find out more about the use of Information and Communications Technology in their work by showing them:

RODA (Repositório de Objectos Digitais Autênticos)

The National Archive Institute of Portugal (IAN/TT) does not currently have the infrastructure needed to support the ingestion and management of digital objects produced by the public administration (PA). eGovernment initiatives establish the need to support its activity with information and communication technologies to improve the efficiency, productivity and quality of public services. In this scenario it is clear that the number of digital objects produced by these institutions will grow, and that their legal value and authenticity should be assured.

GovTalk (UK)

The purpose of this site is to enable the Public Sector, Industry and other interested participants to work together to develop and agree policies and standards for e-government.

Safeguarding European Photographic Images for Access (SEPIA)

In 1999 the European Commission on Preservation and Access (ECPA) initiated a project aimed at the long-term preservation of all kinds of photographic materials and at defining the role of new technology in collection management. This resulted in Safeguarding European Photographic Images for Access (SEPIA). The project was set up explicitly to bring together representatives from different types of institutions that hold photographs: libraries, archives and museums, as well as from research institutes.

Repositories

Biosharing

Functionality: CATALOGUES The web-based BioSharing catalogues aim to centralize bioscience data policies, reporting standards and links to other related portals. 1. Providing a “one-stop shop” for those seeking data sharing policy documents and information about the standards and technologies that support them. 2. Exposing core information on well-constituted, community-driven standardization efforts and linking to their standards, documentation, training material, news and contact point. 3.

Workshops prior to the International Digital Curation Conference

Pre-conference workshops can be very useful and interesting; they can be a good part of the justification for attending a conference, giving an extended opportunity to focus on a single topic, followed by a broader (but shallower) look at many topics, at the conference itself. This time it is quite frustrating, as I would very much like to go to all the workshops! There is still time to register for your choice, and for the IDCC conference itself.

Libraries of the Future: SourceForge as Repository?

In his talk (which he pre-announced on his resumed blog), Peter Murray-Rust (PMR) suggested (as he has done previously) that we might like to think of SourceForge as an alternative model for a scholarly repository (“Sourceforge. A true repository where I store all my code, versioned, preserved, sharable”).

More posts on International repositories workshop

Just to note a post by Jeremy Frumkin from Arizona, who like me was in the Organisation breakout, and another by Maurice Vanderfeesten from the SURFfoundation, who was in the Identifiers workshop, and gives a very clear summary of what resulted there.

International Repositories Infrastructure workshop

Amsterdam in Spring, who could turn down the offer? Perhaps it would be irresistible a little later in Spring than early March (brrr), but when the sun did come out, and the workshop was done, it was lovely. I was in Amsterdam for a curious International workshop on repository infrastructure, funded by JISC and SURF, with the DRIVER project. It turns out I had no idea what repository infrastructure meant before I went, and I guess I know only a little more now.

Repository preservation revisited

Are institutional repositories set up and resourced to preserve their contents over the long term? Potentially contradictory evidence has emerged from my various questions related to this topic.

Repositories and preservation

I have a question about how repository managers view their role in relation to long term preservation.

Gibbons, next generation academics, and ir-plus

Merrilee on the hangingtogether blog, in a catch-up post about the fall CNI meeting, drew our attention to a presentation by Susan Gibbons of Rochester on studying next generation academics (i.e. graduate students) as preparation for enhancements to their repository system, ir-plus.

Project data life course

This blog post is an attempt to explore the “life course” of an arbitrary small to medium research project with respect to data resources involved in the project. (I want to avoid the term life cycle, since we use this in relation to the actual data.)

DCC White Rose day

Fresh from a fascinating day in Sheffield, organised by the DCC and the White Rose e-Science folk. Objectives for the day included building closer relationships between White Rose and DCC, helping us learn more about their approach to e-Science, identify data issues, and influence the DCC agenda!

Some interesting posts elsewhere

I’m sorry for the gap in posting; I’ve been taking a couple of weeks of leave at the end of my trip to Australia. Since return I’ve been catching up on my blog reading, and there are some interesting posts around.

ARROW Repositories day: 3

Dr Alex Cook from the Australian Research Council (a money man! Important!) talking on the Excellence in Research Framework (ERA), the Access Framework and ASHER. ERA appears to be like the UK’s erstwhile RAE, and will use existing HE Research Data Collection rules for publication and research income information where possible. 8 clusters of disciplines have been identified. Currently looking at the bibliometric and other indicators which will be discipline-specific (principles, methodologies and a matrix showing which are used where).

ARROW Repositories day: 1

I’ve been giving a talk about the Research Repository System ideas at the ARROW repository day in Brisbane, Australia (which is partly why there has been a gap in posting recently). Here are some notes on the other talks.

Data as major component of national research collaboration

This is perhaps the last of my posts resulting from conversations and presentations at the UK e-Science All Hands meeting in Edinburgh. This one relates to Andrew Treloar’s presentation on the Australian National Data Service (ANDS), and its over-arching programme, Platforms for Collaboration, part of the National Collaborative Research Infrastructure Strategy.

ARCHER: a component of Australian e-Research infrastructure?

At the e-Science All Hands meeting, David Groenewegen from Monash spoke (PPT, also from Nick Nicholas and Anthony Beitz) about the outputs of the ARCHER project, almost finished, intended to provide tools for e-Research infrastructure. They see these e-Research challenges:

How to make repositories a killer app for scientists

Cameron Neylon wrote a nice post indirectly addressing the question of how Nature might make Connotea more useful. It's well worth reading for its own merits, but I was so taken by his questions, I thought they might be re-purposed to apply to repositories. As Cameron says "These are framed around the idea of reference management but the principles I think are sufficiently general to apply to most web services".

Adding value to data: eScience conference session

If you nearly had a paper ready for the International Digital Curation Conference in December (the closing date was the end of July), you may still have time to get one in to the special session "Adding value to data – Digital Repositories in the e-Science world" at the 4th IEEE International Conference on eScience, whose deadline has just been extended to 31 August. From the call for papers on the DReSNet blog:

Repositories and the CRIS

As I mentioned in the previous post, there has been some discussion in the JISC Repositories task force about the relationship between repositories and Current Research Information Systems (CRIS). Stuart Lewis asserted, for example, that “Examples of well-populated repositories such as TCD (Dublin) and Imperial College are backed by CRISs.” So it seems worth while to look at the CRIS with repositories in mind.

Comments on Negative Click Research Repository System

You may remember that I wrote a series of posts about a Research Repository System, aiming to improve deposits by getting repositories to do more that’s useful to the researcher. I had suggested it should contain these elements:

Morning at the Repository Fringe

I only managed the first couple of hours of the Edinburgh Repository Fringe event [two days ago... but Google went bonkers and decided this was a spam blog meanwhile!] but it was already great fun, and I can see that those who stay the course will have a great time, and learn heaps. Note to self: more conferences should be like this!

Load testing repository platforms

Very interesting post on Stuart Lewis's blog:

Negative Click, Positive Value Research Repository Systems

I promised to be more specific about what I would like to see in repositories that presented more value for less work overall, by offering facilities that allow it to become part of the researcher’s workflow. I’m going to refer to this as “the Research Repository System (RRS)” for convenience.

Responses to RAW versus TIFF: compression, error and cost-related

This is the second post summarising responses to the “RAW versus TIFF” post made originally by Dave Thompson of the Wellcome Library. The key elements of Dave’s post were whether we should be archiving using RAW or TIFF (image-related responses to this question are summarized in a separate post). A subsidiary question on whether we should archive both is greatly affected by cost, which is dependent on issues of compression, errors etc. Responses on these topics are covered in this post.

Repository Fringe

Robin Rice passed on this announcement about Edinburgh's newest festival, the Repository Fringe. She writes: "the event is being jointly planned by Edinburgh and Southampton to coincide with the 'preview week' before the opening of the Edinburgh Festival Fringe. In the spirit of the Fringe, it's a kind of an 'unconference' and we're encouraging people to sign up to participate in various ways on the wiki, e.g.

Do we really want repositories to be more Web2.0-like?

I’m spending some time thinking through what a negative click, positive value repository system should be like. Thanks to everyone for their comments on this idea. Various people suggested we should be more Web 2.0-like. Good idea, that’s following success. Or is it?

Reaction to Negative Click Repositories

I was very pleased with reactions to the Negative Click Repositories post. I’ll come back to the idea itself in a later post, but this just attempts to gather some of the comments together (the first few are comments to the original post, but I know my blogreader doesn’t expose those easily).

Negative click repositories

I wanted to write a bit more about this idea of a negative click repository (negative cost was a bad name, as there is a real positive $ and £ cost to the repository, rather than the depositor). First some ancient history...

Adding Value through SNEEPing

I'm really quite taken by the SNEEP plug-in for ePrints that was showcased at OR08. It enables users to add their comments/annotations to an item stored in an eprints repository. These annotations are then made public and items can have numerous annotations added by different users. I don't know if there's a limit...? This is a great example of one way that value can be added to data collections in the context of digital curation - though admittedly the value of the added comments and annotations will be debatable!

PLATTER

You may already have had a chance to look through the recent offering from DPE - the PLATTER tool. If you haven't already done so, then it's well worth a look. At fifty-odd pages long you'll need a bit more than five minutes, but it's a really interesting proposal for approaching planning for a new repository.

UK Repositories claiming to hold data

The OpenDOAR and ROAR services both present self-reported claims by repositories across the world about their contents, backed up by some harvested facts. I’m interested in those UK repositories that claim to hold data.

Repositories for scientists

Nico Adams, in a post to Staudinger's Semantic Molecules, has added to the scenarios for repositories for scientists: "Now today it struck me: a repository should be a place where information is (a) collected, preserved and disseminated (b) sema

Data, repositories and Google

In a post last year, Peter Murray-Rust criticised DSpace as a place to keep data: "The search engines locate content. Try searching for NSC383501 (the entry for a molecule from the NCI) and you’ll find: DSpace at Cambridge: NSC383501" But the actual data itself (some of which is textual metadata) is not accessible to search engines so isn’t indexed. So if you know how to look for it through the ID, fine. If you don’t you won’t. [...]

Repositories for the people?

I have been doing some thinking over the past couple of months about the role of repositories in digital curation, and it appears that others have as well. Dorothea Salo (Digital Repository Services Librarian at George Mason University) wrote a series of fascinating posts on her Caveat Lector blog, illustrating through fictional personae the dilemmas faced by many of the players in science, in academic and non-academic management in the fictional University of Achaea. The players are:

Archiving Service

The other day I had an interesting discussion with a group of IT facilities management guys here at Edinburgh, who have been asked to write the requirements for a new version of their Archive Service, capable of handling modern requirements for huge amounts of data. It was an interesting discussion, and I hope to remain involved in what they are planning (which is a long way from getting funded, I guess).

Question on approaches to curating textual material

Dave Thompson, Digital Curator at the Wellcome Library, asked a question on the Digital Preservation list (which is not well set up for discussion just now). I've replied, but we agreed I would adapt my reply for the blog for any further discussion that might emerge. "I'm looking for arguments for and against when, and if, digital material should be normalised. I'm thinking about the long term management of textual material in proprietary formats such as MS Word. I see three basic approaches on which I'm seeking the list's comments and thoughts.

Subject "versus" institutional repositories

There's a concept in maths called "closed but unbounded". I'm not sure it's exactly to the point (I hope that's a pun), but "subjects" seem a bit like that. You can be pretty sure about most of the stuff that's not in a subject (or "domain"), and most of the stuff that is in it, but you can be very puzzled about some of the edges, and can find yourself in some extremely surprising discussions at times about parts of subjects that challenge most of the ideas you had. So subjects turn out to be very un-bounded.

European e-Science Digital Repository Consultation

Philip Lord wrote to tell me that he and Alison Macdonald are conducting a study for the European Commission, “Towards a European e-Infrastructure for e-Science Digital Repositories” (e-SciDR); see www.e-SciDR.eu. This is a short study to summarize the situation regarding repositories in Europe and to propose policies for the Commission for repository development in Europe. As part of the study process the Commission is hosting a public consultation through a questionnaire... The letter inviting participation follows:

JISC Repositories conference day 2

This is a rather late posting about day 2 of the JISC Repositories Conference, a week or so ago now. I mainly attended the data-oriented stream. I was very interested in the presentations from the StORe and CLADDIER projects, both of which touched on data citation. They go about this in different ways.

JISC Repositories conference day 1

Yesterday and today I am at the JISC Repositories Conference in Manchester. This turns out to be (at least in my small section, and the two plenaries so far) a much more interesting event than I expected. There has been a useful focus on the fringes of the repository movement, such as the role of data, and as well an interesting re-exploration of what repositories are, and what they are for.

The DCC is funded by

Joint Information Systems Committee