Open Source Software and Open Standards

What does open source mean?

Put in simple terms, open source is a transparent way of developing software and making it freely available for others to use.

The source code of a program is made fully available for individuals to access, alter and reuse, unlike proprietary software.

What about open standards?

Open standards are related to open source, but the two are not synonymous. By virtue of their transparency and level of community acceptance, open standards offer a degree of protection against obsolescence and inaccessibility. They must be free to view and implement, must not lock users in to a particular vendor, and must carry no associated royalty or fee.

Examples of open standard file formats include the OASIS Open Document Format and the World Wide Web Consortium's XHTML. These can be contrasted with opaque proprietary alternatives, which are often not fully supported by other software and force users to rely on a single vendor's products to retrieve and update their information.
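To illustrate what "free to view and implement" means in practice, the following is a minimal sketch in Python. It assumes a small, made-up XHTML document (the title and paragraph text are hypothetical sample content) and reads it using only the general-purpose XML parser in Python's standard library, with no vendor-specific software involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; XHTML is ordinary XML in a published namespace.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Survey results</title></head>
  <body><p>Sample paragraph.</p></body>
</html>"""

# The namespace URI is part of the public XHTML specification.
ns = {"x": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(xhtml)

# Any XML-aware tool can locate elements by their documented names.
title = root.find("x:head/x:title", ns).text
paragraphs = [p.text for p in root.findall(".//x:p", ns)]

print(title)
print(paragraphs)
```

Because the format is openly documented, the same content could equally be read by any other XML tool; nothing here depends on the program that originally wrote the file.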

Is open source really free?

Yes. This freedom is expressed through the open source software's legal status: the license makes the software's free status, and its associated transparency, inseparable from the code itself. Unlike most proprietary licenses, open source licenses are conceived in favour of the end user and permit most kinds of use and redistribution.

What does proprietary software mean?

Normally, software creators produce a program and license it for commercial gain; for a variety of business-oriented reasons they do not make its source code openly available to the public. Most proprietary software is released for use only within the strict terms described in its End User License Agreement. These are likely to limit things like numbers of users, places where software can be executed, operations that can be performed and the rights of users to reverse engineer or emulate functionality.

What is source code?

When a program is initially conceived it's written in a human-readable language (a programming language such as C, C++ or Java). For a computer to execute the program efficiently, this source code often undergoes a process called compiling, which produces one or more binary files. Composed of 1s and 0s, these make no sense to human eyes and are difficult to reuse in other programs or environments. To really understand what a program is doing and how it behaves, and to facilitate its reuse and enhancement, source code is essential.
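The difference between source code and its compiled form can be sketched in a few lines of Python. This is an illustration by analogy only: Python compiles to bytecode for its own interpreter rather than to a native binary file, and the `area` function is a hypothetical example, but the contrast between readable text and opaque bytes is the same.

```python
# Hypothetical example: a tiny program held as human-readable source text.
source = "def area(width, height):\n    return width * height\n"

# Compile the source text into Python's internal compiled form.
module_code = compile(source, "<example>", "exec")

# Execute the compiled module so that the function becomes available.
namespace = {}
exec(module_code, namespace)
area = namespace["area"]

# The compiled instructions for the function are raw bytes, not text.
compiled_bytes = area.__code__.co_code

print(source)          # readable: anyone can see what the function does
print(compiled_bytes)  # a sequence of bytes that means little to a human
print(area(3, 4))      # prints 12
```

Both forms produce the same result when run, but only the source text tells a reader what the program is meant to do.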

Can you explain what source code is more fully?

Without access to source code, a user will have very limited knowledge of how the program actually works and will be technically unable to alter the program to suit their individual needs or to use the code as the foundation of subsequent programs. Open source differs from conventional proprietary development in that it is created and distributed by groups of people who are happy (indeed enthusiastic) for it to be used, modified and redeployed.

If source code is so valuable why not just convert proprietary binary files back?

Although the effects of compiling can sometimes be reversed using a decompilation tool, compilation is complex enough that it is usually one-way in practice (comparable with trying to turn a baked cake back into its original ingredients). In any case, whatever technical means one might have to recreate source code from a binary file, many proprietary software licenses expressly forbid it.
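One reason the process is one-way is that compiling discards information that only humans need. The sketch below uses Python bytecode as a stand-in for a compiled binary (an assumption made for illustration; the two source snippets are hypothetical). It compiles the same statement with and without an explanatory comment and checks whether the compiled instructions differ.

```python
# Two versions of the same hypothetical program: one documented, one not.
commented = (
    "# Convert a Celsius reading to Fahrenheit\n"
    "result = c * 9 / 5 + 32\n"
)
bare = "result = c * 9 / 5 + 32\n"

# Compile both versions (they are only compiled here, never executed).
code_commented = compile(commented, "<commented>", "exec")
code_bare = compile(bare, "<bare>", "exec")

# The compiled instructions are identical: the comment left no trace.
same_instructions = code_commented.co_code == code_bare.co_code

# The comment text is not stored among the compiled constants either.
comment_survives = "Celsius" in repr(code_commented.co_consts)

print(same_instructions)
print(comment_survives)
```

A decompiler working from the compiled form could at best reconstruct the arithmetic; the explanation of what the calculation is for has already been lost.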

What's in it for the open source people?

The open source community holds that greater transparency in software is a good thing. Developers and distributors are motivated by the belief that free software enables users to add enhancements, resulting in better programs.

I'm a data creator; how will open source and open standards affect me?

Choosing an open source tool tailored to your requirements will greatly enhance the chances of a digital object's longevity. As so much of data curation involves thinking about the overall life cycle of a digital object, it is crucial to think first about which format to create a document in and which tools to rely upon. Choosing a proprietary format might well hamper any long-term curation of the object, whereas an open format enables the introduction of preservation procedures into the data creation process. Furthermore, the absence of legal restrictions on the ways in which digital assets can be stored and manipulated enables a digital curator to emulate, migrate or reuse software or data with far fewer complications.

I'm a data curator; how will open source affect me?

Once materials have been created in open formats with open source tools, it will be easier to tailor the curation process to one's specific needs. A clear understanding can be established of how a digital object was created and structured. In the long term, if digital objects are obtained with no available rendering software, it should be fairly easy to recreate or reuse them, provided that the source code has been archived and is sufficiently well documented.

I'm a data re-user; how will open source affect me?

Most immediately, the transparency that characterises the open source software (OSS) model is of great benefit. Reusing or accessing a digital resource in the future is much easier if one understands the program that produced it, and that understanding is far easier to gain with full access to the source code. In the long term, open file formats will greatly enhance accessibility to data. Dependence on commercial or proprietary software may well restrict access, as one will need to obtain exactly the same software in order to read particular file formats, and the software, or the company that created it, may no longer be around.

Does open source come with any licensing at all?

The open source community recognises a range of individual off-the-shelf open source licenses. In order to qualify, a license must satisfy the open source definition and offer users the right to freely obtain, use, reuse and distribute licensed code. Some go further, adding what's known as a copyleft clause, which stipulates that any subsequent, derivative code must be released under the same open license. Nothing in the open source definition compels distributors to release software for no fee, but anyone who acquires the software may redistribute it as widely as they wish without remuneration.

Can I be sure that the open source software will always be maintained and have full functionality?

Provided that the software is both open source and available, it will be possible for any other party to step in and take over the maintenance and development of a particular software project. Proprietary software, on the other hand, offers few assurances of ongoing maintenance, and commercial licenses usually absolve distributors of all responsibility if the software goes wrong.

Does the open source approach have any drawbacks?

The problems associated with open source are well documented by its opponents. Some of the most common include.

What is the future of open source?

Currently, adoption of open source software and open formats remains rather low in comparison with proprietary alternatives. The follow-the-crowd mentality is a reasonable strategy for ensuring the longevity of one's resources: if there are enough people with a vested interest in continued support for a particular software product, then there are attractive profits to be made by commercial companies from ensuring it remains available.