Migration on Request: OpenOffice as a platform?
Following on from my previous post relating to legacy formats, I was thinking again about the problems of dealing with documents in those formats. For some, the answer lies in emulation and perpetual licences for the original software packages, but for me that just doesn't cut the mustard. I won't have access to those packages, but I might want access to the documents. Some of them, for example, might be PowerPoint 4 presentations created on a predecessor of the Macintosh I use now, which are unreadable with my current PowerPoint software. (I CAN get at them by copying them to a colleague's Windows machine; her version of PowerPoint has input filters unavailable on my Mac.)
So I want some form of migration. In the example above, this is known as "Save as"!
However, I know that every time I migrate I introduce some sort of error. So if I migrate those PowerPoint 4 files to today's PowerPoint, then from today's to tomorrow's PowerPoint, and then from tomorrow's to the next great thing, I will introduce cumulative errors whose impact I will only be able to assess at some horribly cringe-making moment, like in the middle of a presentation on a host's machine. So the best way to do migration is to start from the original file and migrate to today's version. Always. From this point of view, it's nuts for Microsoft to drop support for old file formats from its software.
This approach of migrating from the original version to today's version is called Migration on Request, and was described in a paper by Mellor, Wheatley and Sergeant back in 2002 (I referred to it earlier), but the idea hasn't caught on much.
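To make the cumulative-error argument concrete, here is a toy sketch in Python. It rests on a deliberately crude assumption: a document is modelled as nothing more than the set of features it uses, and each converter simply drops whatever its target format cannot represent (real converters can also introduce errors, which this model ignores). The names `make_converter`, `chained_migration` and `migration_on_request` are hypothetical illustrations, not anything taken from the Mellor, Wheatley and Sergeant paper.

```python
from functools import reduce
from typing import Callable, FrozenSet, Iterable, Sequence

# Toy model (an assumption for illustration only): a document is the set of
# features it uses, and a converter keeps only the features its target
# format can represent, silently dropping the rest.
Document = FrozenSet[str]
Converter = Callable[[Document], Document]


def make_converter(supported: Iterable[str]) -> Converter:
    """Build a hypothetical lossy converter for a format supporting `supported`."""
    keep = frozenset(supported)

    def convert(doc: Document) -> Document:
        # Intersection: anything the target format lacks is lost.
        return doc & keep

    return convert


def chained_migration(original: Document, steps: Sequence[Converter]) -> Document:
    """Migrate version to version; each step's losses are carried forward."""
    return reduce(lambda doc, step: step(doc), steps, original)


def migration_on_request(original: Document, current: Converter) -> Document:
    """Always convert the preserved original straight to today's format."""
    return current(original)


original = frozenset({"text", "speaker_notes", "chart"})
tomorrow = make_converter({"text", "chart"})  # hypothetically drops speaker notes
next_great_thing = make_converter({"text", "speaker_notes", "chart"})

print(sorted(chained_migration(original, [tomorrow, next_great_thing])))
print(sorted(migration_on_request(original, next_great_thing)))
```

Even in this simplified model the chained route prints `['chart', 'text']`: the speaker notes lost in the intermediate step never come back, although the final format could hold them. Starting again from the original yields `['chart', 'speaker_notes', 'text']`. The price, of course, is that today's software must still be able to read the original format.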
The authors had some other great ideas, like writing the migration tool in a specially portable version of C with all the nasty bits removed, called C--.
I have wondered from time to time, however, whether for that class of documents we call Office Documents (word processing, spreadsheets, presentations), tacking onto an open source project with a strong developer community might be a better approach. Something like OpenOffice. I'm not sure how many file formats it already supports (always growing, I guess), but Chapter 3 of the "Getting Started" documentation lists the following:
- Microsoft Word 6.0/95/97/2000/XP (.doc and .dot)
- Microsoft Word 2003 XML (.xml)
- Microsoft WinWord 5 (.doc)
- StarWriter formats (.sdw, .sgl, and .vor)
- AportisDoc (Palm) (.pdb)
- Pocket Word (.psw)
- WordPerfect Document (.wpd)
- WPS 2000/Office 1.0 (.wps)
- DocBook (.xml)
- Ichitaro 8/9/10/11 (.jtd and .jtt)
- Hangul WP 97 (.hwp)
- .rtf, .txt, and .csv
... which is not a bad list (and that's just the word processing part), and it may be extended in more up-to-date versions. For interest, their FAQ has a question: "Why does OpenOffice not support the file format my application uses?"
"There may be several reasons, for example:
- The file formats may not be open and available.
- There may not be enough developers available to do the work (either paid or volunteer).
- There may not be enough interest in it.
- There may be reasonable, available workarounds."
Making legacy file formats more open was the subject of my previous post, and I guess we have to wait and see. But there are plenty of legacy word processing formats not on that list: Samna, for example (later to evolve into Lotus Word Pro), as well as formats for obsolete computers like the Atari, such as the German word processor SIGNUM, supposedly very good for mathematical formulae. What about earlier versions of MS Word? Wikipedia lists a bunch of word processors; there must be many documents in these formats sitting in obscure locations.
With a concerted effort, we could gradually build OpenOffice input filters for these obsolete document types, thus bringing them into the preservable digital world. And this is an effort that could draw in that extraordinary community of enthusiasts who do so much to build document converters and other kinds of software, and who are so often ignored by the digital preservation community!