Because good research needs good data

Archiving images

Chris Rusbridge | 28 May 2008

One of our Associates recently posted a query on the DCC forum about image archiving formats:

'For long term preservation of digital images should we be archiving the proprietary .RAW files that come out of the digital SLRs we use, or should we be converting these to the more open uncompressed .TIFF format and archiving those? Or indeed should we be archiving both?

Whilst they are proprietary, uncompressed .RAW files contain much useful information that both helps us manage the image and may help us preserve the image in the future. They are no larger than typical TIFF files; well, they needn't be. However, future version migration seems inevitable.

TIFF files offer flexibility of working, and many tools are able to manipulate and format shift TIFF to produce a range of dissemination manifestations. However, if we work on the do it once, do it right principle then long term storage becomes an issue of both capacity and cost.

Storing both TIFF and RAW only exacerbates the storage issue, plus maybe creates long term management issues.

What would you do? Store both, store the more open and flexible format, or boldly go with the richness of RAW?

Your thoughts and comments would be appreciated.'

As image preservation is relevant to those curating scientific data, I thought I'd reproduce his post (with permission of course) here to try and stimulate a bit more debate.

First off - does it really have to be an 'either or' option? Why not store both? The cost of storage seems to just keep on coming down, and most architectures are more than capable of catering for multiple representations of an object. 'Do it once, do it right' is great if you can. But the 'lots of copies keeps stuff safe' principle has obvious benefits too. Storing multiple representations gives you more options in the future, and whilst it's true that managing additional files may require additional effort, I'm not sure that the potentially minimal costs of this would negate the potential benefits of storing both. (Reminder to self - read up on costs!)

It's probably an opportune moment to go back to your user and preservation requirements too, to determine what you really want from your image archive and to what extent each format meets these requirements.

In terms of potential migrations, it's not inconceivable that we'll eventually come up with a better format than TIFF, so migration is potentially an issue regardless of which format you go for. More immediate - for me anyway - would be the length of time between migrations, and the ease of migration from one format to another. These issues would need assessing too if going for one or the other (and it wouldn't be a bad thing to be aware of them even if you choose to store both versions).

Finally, Adobe recently submitted their 'DNG Universal RAW format' to ISO, so the issue of this one (because it seems there are several RAW formats) being a proprietary format may not be a concern for long. I'm not that familiar with DNG RAW so I don't know how much extra information it may contain when compared to TIFF. Another thing to add to my 'to-do' list...

I'm sure our Associate would be grateful for any more input, so do feel free to leave comments and I'll make sure he gets them.