Interview with Bob Mann, Wide Field Astronomy Unit, University of Edinburgh

Bob Mann is a lecturer in the School of Physics, University of Edinburgh. He is a member of the Wide Field Astronomy Unit (WFAU), which is a part of the School's Institute for Astronomy. WFAU is one of the UK's main astronomical data centres, specialising in the curation of optical and near-infrared sky surveys. It is also part of AstroGrid, which is the UK's contribution to the international initiative to create a global 'Virtual Observatory', federating the world's astronomical data resources.

  1. What does Digital Curation mean for you?
  2. How do you do it for your data?
  3. Have you considered the OAIS model, and, if so, has it been useful for you?
  4. How long is "long-term" preservation for your research data?
  5. How will your digital curation be funded?
  6. How long is your funding horizon?
  7. What tools are in use now to store data?
  8. What tools are in use now to import data?
  9. What tools are in use now to locate data?
  10. What tools are in use now to retrieve data?
  11. What tools are needed for digital curation in addition to current tools? Or instead of current tools? Which are most important, most needed?
  12. How long before current hardware/software will be replaced?
  13. What standards are in use or needed? You mentioned that the Reference Model for an Open Archival Information System (OAIS) is being discussed in the VO community, and that this model may play a role in your work in the near future. Are there any other standards you are considering? Are any guidelines (e.g., for data creators) needed? What about metadata schemas? Of all these things, which are most important, most needed?
  14. What file formats are used in astronomy? Could you list them and highlight any issues such as: which file formats are the most important and why, are there any proprietary formats, and what changes do you foresee?
  15. You mentioned that 'the volume of astronomical data available online doubles every twelve months or so'. Could you provide an estimate (however rough) of the quantities of data being managed (at any convenient point in time over the last year or so) in terms of total volume of data, total number of files?
  16. Does FITS (or do other file formats) present challenges now? You've talked about data curation extending to include the provision of a data analysis environment at the data centre, the need to provide 'computationally safe environments within which users' uploaded analysis code can run' and the concomitant need for provenance tracking. You also champion the idea that curation needs to start on the scientist's desktop, another area where provenance tracking is important. Are there any other challenges you would like to highlight? If so, what are the challenges? How is the astronomy community addressing these challenges? What challenges do you foresee in future?
  17. It seems to me that astronomy is a prime example of a field where observational data is routinely made available for reuse, or for shared use. On the other hand, you said data created through analysis of observational data is not routinely available except in conjunction with journal publication and suggested that it might be made available through the new pre-print or in other ways.
  18. Could you say something about how that data is or might be reused? Would it be used in new studies or simply to validate work that produced it?
  19. Would I be correct in thinking that astronomical data is as likely, or perhaps more likely, to be read by programs as by people?
  20. What will be needed in 50 years to re-use astronomical data?
  21. What do you need that you have not got to do digital curation for your data, and that DCC might help to provide? You mentioned earlier that you want the DCC to research the provision of computationally safe environments within which users' uploaded analysis code could be run.
  22. Are other areas of research needed?
  23. How else might DCC support your work? Services, e.g. advice or others or other services?
  24. Registries?
  25. Professional development opportunities?

Q1. What does Digital Curation mean for you?

To my mind, the term "digital curation" denotes looking after digital data in a manner that facilitates their discovery and use by interested parties. This encompasses a number of activities, from the physical preservation of the data in a form which remains readily accessible using current technologies, to the preparation and maintenance of metadata describing data holdings and their provenance, to the development and deployment of search tools which enable users to find relevant data and delivery mechanisms for making those data available to them in a convenient form.

In an astronomical context, the curation of data may involve their reprocessing or otherwise amending and updating them in the light of developing knowledge of the instrument which produced them. This often requires a detailed understanding of the data, which is why curation is best performed by specialist data centres, with established expertise related to the particular type of data, and which therefore precludes the simple dumping of data into a general repository.

Q2. How do you do it for your data?

WFAU's activities centre on the curation of a series of sky survey archives; in astronomy, the word "archive" is often applied to a new dataset, rather than denoting the final resting place of no longer active data, as is the case in some other disciplines.

The sky surveys WFAU curates are mainly produced by instruments working in the optical or near-infrared regions of the electromagnetic spectrum. The basic data product in the optical/near-infrared is an image, from which discrete sources (stars, galaxies, etc) are extracted. Most astronomical research is conducted using the catalogues of attributes describing these extracted sources, but, in some cases, the astronomer wants to go back to the original image, so we store both images and catalogues, and serve both to users.

These data are stored on RAID arrays, with back-up copies on removable media for further security. As sky survey catalogues have become larger (the current ones are several Terabytes in size), we have started using relational databases to store them and their metadata. The image files remain stored in flat file systems on disk, but with their metadata stored in a database.

Users access our data via WWW-based tools, but, with the coming of the VO, access will increasingly be made by analysis programmes and software agents, rather than by astronomers sitting at a computer. To aid the discovery and proper use of our data we provide quite a lot of metadata, including provenance information which traces the lineage of data products back to their parent observations, and which records the operations applied along the way.

Q3. Have you considered the OAIS model, and, if so, has it been useful for you?

The OAIS model has not played a role in our work to date, but may do so in the near future. Until recently, while we have just been serving our own data to users via our own WWW site, such a model has not seemed necessary to us, but, with the advent of the Virtual Observatory (VO), we see the value of abstractions which can form the basis for practical standards, and it may well be that we come to use the OAIS reference model in earnest soon; it is certainly being discussed within the VO community.

Q4. How long is "long-term" preservation for your research data?

"Long-term" certainly means "for several decades" in astronomy. Whilst technological advances mean that better data are being produced by new instruments all the time, some astrophysical phenomena are time-dependent on timescales of decades — nearby stars exhibit appreciable motions across the sky, transient objects appear and/or disappear — so that even old data taken with an outdated instrument can still be useful, and record an unrepeatable observation. That is the reason why we in WFAU have bothered to digitise thousands of photographic sky survey plates, even though new digital detectors are producing superior data over the same regions of the sky now — our data complement these newer data, rather than being simply superseded by them.

Q5. How will your digital curation be funded?

WFAU is supported by grant funding from the Particle Physics and Astronomy Research Council (PPARC). Most of this is connected with specific sky survey projects, rather than ongoing core funding for long-term digital curation. PPARC is currently conducting a review of data curation, and it has been pointed out that such activities must be supported if the long-term scientific exploitation of the data holdings from the projects it funds is to be possible.

Q6. How long is your funding horizon?

Currently WFAU's grant is renewed every two years or so. The archive development projects we are currently starting or planning are in support of sky surveys which will operate for the next 10-15 years, so that is the timescale that frames our work, although we do not have secure funding extending so far into the future.

Q7. What tools are in use now to store data?

As I mentioned in answers to the questions in Interview 1, the data comprising the optical/near-infrared sky surveys with which I am most familiar are of two types: images and catalogues of attributes describing discrete objects (stars, galaxies, etc) detected in those images. These two types of data are stored in different ways. The images tend to be left as files (in FITS format) on disk, while the catalogue attributes tend to be loaded into a database management system — nowadays usually a Relational Database Management System (RDBMS).

Q8. What tools are in use now to import data?

Most RDBMSs have their own data import tools, so importing the catalogue data into the database is simply a matter of converting the data into whatever format is required by the RDBMS' data import tool and running it on the data.
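As a rough illustration of that convert-then-load step, the sketch below uses Python's built-in sqlite3 module as a stand-in for a production RDBMS and its bulk loader. The column names (source_id, ra, dec, mag), the CSV layout and the sample values are assumptions made up for the example; they are not WFAU's schema or data.

```python
import csv
import io
import sqlite3

# Hypothetical catalogue extract; a real survey catalogue has many more attributes.
CATALOGUE_CSV = """source_id,ra,dec,mag
1,10.6847,41.2690,17.3
2,10.6901,41.2702,19.8
3,83.8221,-5.3911,15.1
"""


def load_catalogue(conn, text):
    """Convert CSV rows to typed tuples and bulk-insert them; returns the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sources ("
        "source_id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)"
    )
    rows = [
        (int(r["source_id"]), float(r["ra"]), float(r["dec"]), float(r["mag"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO sources VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n = load_catalogue(conn, CATALOGUE_CSV)
# Once loaded, the catalogue can be queried by attribute.
bright = conn.execute(
    "SELECT source_id FROM sources WHERE mag < 18 ORDER BY source_id"
).fetchall()
```

For multi-terabyte catalogues a vendor's native bulk-load utility would normally replace the row-by-row insert shown here; the conversion step is the part that carries over.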

Q9. What tools are in use now to locate data?

This is one area where a major shift is underway with the advent of the Virtual Observatory (VO). In the pre-VO world one really needed to know which data centre held the data one was after, and then would expect to have to use some web-based tool to locate the data within the holdings of that data centre, on the basis of some relevant metadata. In practice, this worked pretty well, since astronomy is a relatively small discipline and there are, therefore, a relatively small number of major data centres, so one usually knew where to look.

One of the things the VO is supposed to do is to make it easier for researchers to publish data, so it is expected that the VO will contain many more data sources than are currently available; many of these may be very small in comparison to the established data centres — a data set published by a single astronomer or small research group to accompany publication of a paper — but the plan from the start was to aid the automated location of relevant data. So, the VO will feature a registry (probably, in fact, a distributed network of registries, which keep each other updated) which records metadata describing all data published to the VO. There is some debate as to how detailed these metadata should be, i.e. should they be sufficiently detailed that I know how to use these data on the basis of the registry entry, or do I expect to have a second round of interaction with the data source itself to figure that out? But the registry will aid the location of relevant data.

Q10. What tools are in use now to retrieve data?

The term retrieval could mean selection or transport of data; I'm assuming you mean the latter here.

One thing to note is that a maxim in the VO community is "ship the results, not the data". In the pre-VO world, an astronomer would extract data products from an archive and analyse them on a local workstation. As our sky surveys are getting larger and larger, it is becoming increasingly less desirable for the astronomer to download the data to a local machine, and, instead, the intention is that analysis programs should be run at the data centre, and it is only the eventual result set which is sent to the user, rather than the (presumably much larger) data set from which it was derived. This introduces interesting issues for the data curators, since their role has become expanded: not only are they looking after data, but they have to do so in a way that allows people to analyse them in the data centre, e.g. by providing computationally safe environments within which users' uploaded analysis code can be run. This is difficult, and is one of the topics that I want the DCC to research.

So, to return to the question: we are trying to minimise the retrieval of data (by volume). Where we do have to transport data, it will typically be done using standard data transfer protocols, such as FTP, GridFTP, and so on.

What tools are in use now to package data to send out?

I don't think we do too much of that. Much astronomical data is stored as FITS files. FITS stands for Flexible Image Transport System, and was originally designed as a format for exchanging image data between observatories. It has evolved into a general standard data format, and there are flavours of it for storing images, spectra, tabular data, and so on: pretty much all classes of astronomical data. The packaging of data is often little more than gzipping the appropriate FITS file or, if there are multiple files, tarring the directory containing them and then gzipping it.
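The packaging step described here can be sketched with Python's standard library alone. The function name `package` and the file names used with it are hypothetical; the sketch only shows the two cases mentioned in the answer (one file gzipped, or a directory tarred and gzipped).

```python
import gzip
import os
import shutil
import tarfile
import tempfile


def package(path, out_dir):
    """Gzip a single file, or tar and gzip a directory; returns the package path."""
    name = os.path.basename(os.path.normpath(path))
    if os.path.isdir(path):
        out = os.path.join(out_dir, name + ".tar.gz")
        # "w:gz" writes the tar archive through gzip compression in one pass.
        with tarfile.open(out, "w:gz") as tar:
            tar.add(path, arcname=name)
        return out
    out = os.path.join(out_dir, name + ".gz")
    with open(path, "rb") as src, gzip.open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out
```

Calling `package("image.fits", "/tmp")` would produce `/tmp/image.fits.gz`, while passing a directory produces a `.tar.gz` bundle of its contents.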

The VO community has been experimenting with an XML format for tabular data — called VOTable — and many prototype VO web services exchange data by having a VOTable put into the payload of a SOAP message.
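To give a feel for the format, the following sketch builds a minimal VOTable-style document with Python's standard XML library. It keeps only the basic element nesting (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) and deliberately omits the namespace, version attribute, units and the SOAP envelope, so it should be read as an illustration rather than a schema-valid file. The function name and sample values are assumptions.

```python
import xml.etree.ElementTree as ET


def make_votable(fields, rows):
    """Build a minimal VOTable-style document: FIELD declarations plus TABLEDATA rows."""
    root = ET.Element("VOTABLE")
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE")
    for name, datatype in fields:
        ET.SubElement(table, "FIELD", name=name, datatype=datatype)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            # Every cell is wrapped in its own TD element, hence the verbosity.
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = make_votable(
    [("ra", "double"), ("dec", "double")],
    [(10.68, 41.27), (83.82, -5.39)],
)
```

Each numeric value ends up surrounded by several times its own length in markup, which is the overhead that becomes a concern for large tables.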

Q11. What tools are needed for digital curation in addition to current tools? Or instead of current tools? Which are most important, most needed?

As noted above, a VO registry of data sources will be needed and is in hand — at least in a preliminary form. It is likely that a more sophisticated registry will be necessary in the longer run, presumably one based on a more sophisticated data model, e.g. an astronomical ontology, relating the different quantities we deal with.

If the definition of curation is taken as extending to include what I described about the provision of a data analysis environment at the data centre, then there is a definite need for the tools to implement that.

We could probably also benefit from better provenance tracking tools, especially if the notion of data curation is taken (as I think it should be) beyond the data centre and to the scientist's desktop. It is really desirable to see the tracking of provenance continue through the analysis process undertaken by the scientist, so that whenever the scientist comes up with some result, its lineage is known, thereby implying that the result is reproducible. In the VO world, it is anticipated that astronomers will often undertake data analysis via the specification of workflows, so we would like to have provenance tracking added into workflow tools.

Q12. How long before current hardware/software will be replaced?

The volume of astronomical data available online doubles every twelve months or so, so hardware is usually not simply replaced but supplemented by new hardware to keep up with the ever-expanding volume of data. This process is continual, but I guess that we assume that hardware needs replacing every 3-5 years, although, as I say, the ever-increasing data volumes involved tend to obscure the simple hardware replacement cycle. Similarly software tends to be amended, rather than replaced, but, if you think that projects have, typically, five year timescales, that's probably a reasonable figure for the software replacement cycle, as each project has slightly different requirements.
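The interaction between annual doubling and a 3-5 year hardware cycle can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that a single data centre's holdings follow the same twelve-month doubling quoted for astronomical data as a whole; the function name is hypothetical.

```python
def projected_volume_tb(current_tb, years, doubling_time_years=1.0):
    """Volume after `years` if it doubles every `doubling_time_years`."""
    return current_tb * 2 ** (years / doubling_time_years)


# How much of the total the original hardware still holds at replacement time.
for years in (3, 5):
    factor = projected_volume_tb(1.0, years)
    print(f"after {years} years: x{factor:.0f}, original hardware holds {1 / factor:.1%}")
```

Under that assumption the holdings grow 8-fold in three years and 32-fold in five, so by the time the original disks are due for replacement they hold only a small fraction of the total, which is why supplementing dominates replacing.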

Q13. What standards are in use or needed? You mentioned that the Reference Model for an Open Archival Information System (OAIS) is being discussed in the VO community, and that this model may play a role in your work in the near future. Are there any other standards you are considering? Are any guidelines (e.g., for data creators) needed? What about metadata schemas? Of all these things, which are most important, most needed?

The VO has its own standards agency, called the International Virtual Observatory Alliance (IVOA). As its name suggests, the IVOA brings together all the national and regional VO projects, and provides the forum within which the specifications for standards required within the VO can be developed, and ratified. The IVOA tries to encourage re-use of existing standards where they are appropriate, so, for example, the IVOA resource metadata specification uses concepts from Dublin Core where it can, and only supplements them with astronomy-specific ones if needed, e.g. as well as needing to be able to specify the owner and author of a data resource, we also want to describe its spatial and spectral coverage, since identification of a relevant data set is usually based on its location in the sky and the region of the spectrum that it covers.
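A resource description of this kind, and the coverage-based lookup it supports, might be sketched as follows. The general descriptors borrow Dublin Core element names (title, creator, publisher); the coverage field names, the sample records and the `search` function are assumptions for the example and are not taken from the IVOA specification. Right-ascension wrap-around at 0/360 degrees is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    # General-purpose descriptors in the spirit of Dublin Core.
    title: str
    creator: str
    publisher: str
    # Astronomy-specific coverage (hypothetical field names), as (min, max) pairs.
    ra_deg: tuple
    dec_deg: tuple
    wavelength_m: tuple

    def covers(self, ra, dec, wavelength):
        """True if the sky position and wavelength fall inside this resource's coverage."""
        return (
            self.ra_deg[0] <= ra <= self.ra_deg[1]
            and self.dec_deg[0] <= dec <= self.dec_deg[1]
            and self.wavelength_m[0] <= wavelength <= self.wavelength_m[1]
        )


def search(registry, ra, dec, wavelength):
    """Return the titles of all resources covering the requested position and wavelength."""
    return [r.title for r in registry if r.covers(ra, dec, wavelength)]


registry = [
    ResourceRecord("Example optical survey", "A. Astronomer", "Example Data Centre",
                   (0.0, 360.0), (-90.0, 0.0), (4.0e-7, 7.0e-7)),
    ResourceRecord("Example near-infrared survey", "B. Astronomer", "Example Data Centre",
                   (0.0, 360.0), (-5.0, 60.0), (1.0e-6, 2.4e-6)),
]
hits = search(registry, ra=150.0, dec=-2.0, wavelength=2.2e-6)
```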

The IVOA should eventually define all the metadata schemas required for the VO, as well as standards for data formats, the authentication of users, etc. It's doing this fairly well, but it's early days.

Q14. What file formats are used in astronomy? Could you list them and highlight any issues such as: which file formats are the most important and why, are there any proprietary formats, and what changes do you foresee?

Within celestial astronomy — or "the dark side", as the solar physicists like to call it — Flexible Image Transport System (FITS) really is the major data format, with VOTable developing through its use in Virtual Observatory projects.

FITS has become the most important through gradual adoption by the major players in the field: US national observatories, NASA data centres, and so on. It is also quite a good standard, in that it is fairly well specified and has a well defined (if glacially slow) procedure for its own evolution. A community has grown up around FITS which has been prepared to put a lot of effort into the FITS specification, which means that it meets most user needs. It is good that it contains metadata, and it was quite early in doing so, but the way that the metadata is specified is a bit of a problem, in that there is no fixed schema. While there are conventions for the names of keywords, it is formally up to the creator of the FITS file to choose the name of each keyword. This means that the metadata records can be somewhat impenetrable to anyone bar the creators and associated experts, and it is difficult to use FITS files from different sources interoperably.

VOTable is important because it is used in the VO community, but only in prototype/testbed activities. I don't think people are really using it in anger at the moment. Many people feel that, when it is, it will be found wanting, due to the verbosity of XML, which is clearly a problem in domains like astronomy, where datasets can be large. The creators of VOTable understood that, so they made allowance in the specification for either (i) a link to an external binary format data file or (ii) the encoding of binary data within the XML file, but neither of these has been used seriously yet.

I think one thing that will have to happen is the development of a more compact data representation, and one possibility would be the use of BinX from the eScience Data Information and Knowledge Transformation (eDIKT) project based at the National e-Science Centre (NeSC). BinX is a language for describing binary data files using XML, together with a toolkit for manipulating such files. So, BinX enables one to take a large data file and store the bulk of it in a compact binary form, while maintaining an XML description of it, which makes it much easier to manipulate.

The Centre de Données astronomiques de Strasbourg (CDS) led the development of a standard format, called the ReadMe (see Standard for Documentation of Astronomical Catalogues), for describing catalogue data files, but it's not used too widely, since there aren't many other data centres beside CDS themselves who collect large numbers of catalogues. Proprietary formats are not commonly used in astronomy.

Q15. You mentioned that 'the volume of astronomical data available online doubles every twelve months or so'. Could you provide an estimate (however rough) of the quantities of data being managed (at any convenient point in time over the last year or so) in terms of total volume of data, total number of files?

My guess would be that the total amount of astronomical data being managed around the world is several hundred terabytes, or maybe even a Petabyte. In terms of numbers of files, that's probably a few hundred thousand.

Q16. Does FITS (or do other file formats) present challenges now? You've talked about data curation extending to include the provision of a data analysis environment at the data centre, the need to provide 'computationally safe environments within which users' uploaded analysis code can run' and the concomitant need for provenance tracking. You also champion the idea that curation needs to start on the scientist's desktop, another area where provenance tracking is important. Are there any other challenges you would like to highlight? If so, what are the challenges? How is the astronomy community addressing these challenges? What challenges do you foresee in future?

I think the cultural challenges are likely to be harder to address than the technical ones, in general. The first cultural challenge will be for astronomers to get used to doing research by means other than downloading data products to their workstation and analysing them locally.

I think there will also be a cultural challenge related to data publication. Astronomers are still used to viewing the refereed journal paper as what "publication" means. That view is changing slightly, since the advent of the astro-ph preprint server (mirrored at uk.arxiv.org) means that people read preprints, not the final, published versions of papers, but there's no cultural norm for the publication of data per se: typically a data release by a project team is accompanied by a journal paper. Maybe that is still the correct way to go, but there will certainly be many opportunities for more informal publication of data in the era of the Virtual Observatory, and it's not clear how they will be received.

Another cultural challenge which is being met is the challenge of making astronomical databases interoperable. Most of them were designed and developed individually, with no expectation that they would be used in concert with any others, so the interoperability has had to be retro-fitted, but this procedure is actually going pretty well, under the aegis of the IVOA.

Amongst technical challenges, there is definitely one of scale. A number of the groups archiving the current generation of sky surveys have had a baptism of fire when suddenly faced with multi-TB datasets to curate and serve to users. Luckily, we've been learning from each other's mistakes, which simplifies the process overall, but this rapid expansion in data volumes has meant that data centres have had to rapidly develop expertise in hardware and software issues to which they had had no previous exposure.

This has had serious consequences for the attraction and retention of staff with the correct skills. People with the necessary level of IT expertise are not going to work for the kind of salaries that research councils offer in grant funding. A related point is that the data centres are becoming more specialised as they become more expert, and I think there is a need for research councils to accept that there really is a role for professional data curation in their domains, and that that can't be done on the cheap by postdoctoral researchers who are amateurs in the game — it really does need dedicated teams of professionals to be built up and maintained.

Q17. It seems to me that astronomy is a prime example of a field where observational data is routinely made available for reuse, or for shared use. On the other hand, you said data created through analysis of observational data is not routinely available except in conjunction with journal publication and suggested that it might be made available through the new pre-print or in other ways.

Yes, that's right. Astronomy has long been good about making raw observational data public; most astronomical data are proprietary to the people who proposed the observations for a period of a year or two and then are made available to the world. This used to be less useful than it might sound, since there is often quite a bit of standard data processing that has to be performed to get from the raw data to the "science-ready" data products with which one actually does research, but there's a trend that archives should make available basic data products as well as the raw data, which makes re-use much easier.

It is, as you say, unusual for derived data generated during scientific analyses to be published — they tend just to be described in papers or preprints — but I think that may change as the Virtual Observatory makes data publication easier.

Q18. Could you say something about how that data is or might be reused? Would it be used in new studies or simply to validate work that produced it?

Astronomers do little in the way of validation of other people's results by simple reanalysis of their data. In some cases — highly controversial results and/or high-profile datasets — this is done, but not often. Maybe we should do more, but it's difficult to get that funded or published.

So, it is largely reuse in new studies. Most reuse is likely to be that of data products, rather than derived data products, since typically the derived data are more specific to particular analyses, but some reuse of such data is possible.

One good example of data which are often reused — and should be reused — are collations of multi-wavelength properties of samples of objects. One of the main science drivers behind the Virtual Observatory is the need to do multi-wavelength astronomy. Historically, astronomy as a discipline was divided by wavelength — one was either an optical astronomer, an X-ray astronomer, a radio astronomer — but the trend is for less specialisation, as people realise that they need data from many regions of the spectrum to understand certain types of object properly.

So, much of my own research time in recent years has been spent collating data about samples of objects from different archives. This collation activity is not straightforward, since often it is not easy to work out, for example, which entry in an optical catalogue matches which entry in an X-ray catalogue, so there is value in caching those associations when they have been made, since they are likely to be useful in future analyses. This is recognised by the fact that one can write journal papers presenting and discussing associations between different catalogues, but it would be good if these associations were made available more systematically, and in a machine-readable form. Some biologists developed a system called the Distributed Annotation Service, which is an infrastructure for making available third-party annotations to gene sequences held in databases, and I think there is a place for a Distributed Association Service in astronomy, which caches associations made between entries in different databases in a way that enables their reuse.
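The idea of making an association once and then reusing it can be sketched as a positional cross-match with a cache in front of it. This is a deliberately naive illustration: it matches on nearest sky position within a fixed radius using a brute-force search, whereas real cross-identification must cope with differing positional uncertainties and source densities. The catalogue contents, names and the in-memory dictionary cache are all assumptions; no Distributed Association Service of this kind is described in the interview beyond the concept.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(cat_a, cat_b, radius_arcsec):
    """For each cat_b entry, keep the nearest cat_a entry if it lies within the radius."""
    matches = []
    for id_b, (ra_b, dec_b) in cat_b.items():
        best_id, best_sep = None, None
        for id_a, (ra_a, dec_a) in cat_a.items():
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if best_sep is None or sep < best_sep:
                best_id, best_sep = id_a, sep
        if best_sep is not None and best_sep <= radius_arcsec:
            matches.append((best_id, id_b, best_sep))
    return matches


_association_cache = {}


def cached_cross_match(name_a, cat_a, name_b, cat_b, radius_arcsec):
    """Compute associations once per catalogue pair and radius, then reuse them."""
    key = (name_a, name_b, radius_arcsec)
    if key not in _association_cache:
        _association_cache[key] = cross_match(cat_a, cat_b, radius_arcsec)
    return _association_cache[key]


# Hypothetical catalogues: identifier -> (RA, Dec) in degrees.
optical = {"opt1": (150.0000, 2.0000), "opt2": (150.0100, 2.0100)}
xray = {"x1": (150.0005, 2.0003), "x2": (151.0, 3.0)}
associations = cached_cross_match("optical", optical, "xray", xray, radius_arcsec=5.0)
```

A shared service would replace the dictionary with persistent, queryable storage, and would also need to record who made each association and how.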

Q19. Would I be correct in thinking that astronomical data is as likely, or perhaps more likely, to be read by programs as by people?

Increasingly so, yes; many tasks which would once have been done by visual inspection are now being automated, as the data volumes increase.

Q20. What will be needed in 50 years to re-use astronomical data?

Aside from the physical preservation of the data on readable media, the key will be the availability of metadata describing the provenance of the data. Clearly, the instruments that take data have a finite lifetime, and I think there is an issue about how you preserve enough of the knowledge of the quirks of each instrument to enable its data to be used when all the people who built it and used it are dead.

As I mentioned above, there's usually a fair amount of data processing required to get from raw observational data to what I called science-ready data products. Essentially this is removing the instrumental signature from the data, i.e. the artefacts introduced into the data because the instrument doesn't record the radiation incident on it perfectly. One always hopes that the science-ready data products really have had all the instrumental signature removed, so one can start with them, but occasionally, and especially when one is trying to squeeze information out of data at the limit of their capabilities, one has to go back to the raw data and/or assess in great detail whether there are any artefacts remaining in the data products. This suggests that it will not be sufficient to keep only the data products: the raw data must be kept too, along with some record of the knowledge of the instrumental signatures found in them and how they arise.

Q21. What do you need that you have not got to do digital curation for your data, and that DCC might help to provide? You mentioned earlier that you want the DCC to research the provision of computationally safe environments within which users' uploaded analysis code could be run.

That looks like the hardest problem to me, as I look ahead to how I think WFAU data will be used in the coming 5–10 years. Associated with that are issues like how to allow users to set up temporary tables of intermediate results in databases at data centres without risking the integrity of the data centre, etc.
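One small piece of this problem, letting a user build tables of intermediate results without being able to alter the archive itself, can be sketched with SQLite as a stand-in for a data centre's RDBMS. Here the archive is attached read-only to a private in-memory database that serves as the user's scratch space. The table layout and function names are assumptions; a production system would rely on the database's own permission model, plus quotas and sandboxing for uploaded code, none of which is shown.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_archive(path):
    """Create a tiny stand-in archive database with a hypothetical source table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)")
    conn.executemany(
        "INSERT INTO sources VALUES (?, ?, ?, ?)",
        [(1, 10.0, -5.0, 17.2), (2, 10.1, -5.1, 21.4), (3, 200.0, 30.0, 15.9)],
    )
    conn.commit()
    conn.close()


def open_user_session(archive_path):
    """Private in-memory scratch database with the archive attached read-only."""
    uri = Path(archive_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(":memory:", uri=True)  # uri=True lets ATTACH accept URI filenames
    conn.execute("ATTACH DATABASE ? AS archive", (uri,))
    return conn


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "archive.db"
        build_archive(path)
        session = open_user_session(path)
        # Intermediate results land in the user's scratch space, not in the archive.
        session.execute("CREATE TABLE bright AS SELECT * FROM archive.sources WHERE mag < 18")
        n_bright = session.execute("SELECT COUNT(*) FROM bright").fetchone()[0]
        try:
            session.execute("DELETE FROM archive.sources")
            write_blocked = False
        except sqlite3.OperationalError:
            write_blocked = True  # the read-only attachment rejects the write
        n_archive = session.execute("SELECT COUNT(*) FROM archive.sources").fetchone()[0]
        session.close()
        return n_bright, write_blocked, n_archive
```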

Q22. Are other areas of research needed?

As I mentioned in one of the topics, I would also like to see the development of annotation services which aid data integration. I gave the example of a Distributed Association Service, which would cache matches made between entries in different astronomical databases, and, hence, make it easier for them to be queried together, but I assume that analogous services would be useful in other areas where data integration is not trivial. For example, I believe that there are major problems in many people-centred databases (whether it be medical records or commercial customer analysis) due to the fact that people move, use alternative forms of their names in different situations (e.g. Bob v Robert) and have their names spelled wrongly, and such cached annotation services to aid data integration might be generally useful.

Another area where I see a need for research is in the kinds of information that can be captured in provenance records. Astronomers are reasonably good at capturing provenance within the pipeline processing systems that data centres use to generate science-ready data products from raw observational data. For example, the Flexible Image Transport System (FITS) file format has a "HISTORY" metadata keyword which allows the storage of provenance metadata about a data file in its header, and many data centres enter comprehensive sets of HISTORY records in the headers of the FITS files they produce. There are, however, several problems with this. Firstly, the HISTORY records are in free-text format, so they're not really machine-readable. Secondly, the information that data centres tend to record is simply a series of entries composed of the name of a data processing program and the input values it was run with. This is fine for use within the data centre, by people who know what each program does, but it appears very cryptic to end-users.
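The free-text nature of HISTORY records is easy to see by reading them out of a header. The sketch below relies only on the basic FITS header layout (80-character cards, with the keyword in the first eight columns) and builds a small header in memory. The program names, versions and parameters in the HISTORY cards are invented for illustration.

```python
CARD_LEN = 80  # FITS header cards are fixed-width, 80 characters each


def value_card(key, value):
    """Keyword in columns 1-8, '= ' in columns 9-10, value right-justified to column 30."""
    return f"{key:<8}= {value:>20}".ljust(CARD_LEN)


def history_card(text):
    """HISTORY cards carry free text after the keyword, with no value indicator."""
    return f"HISTORY {text}".ljust(CARD_LEN)[:CARD_LEN]


# Hypothetical header: the processing programs and parameters are made up.
HEADER = "".join([
    value_card("SIMPLE", "T"),
    value_card("BITPIX", "-32"),
    value_card("NAXIS", "0"),
    history_card("debias v2.1 level=312.5"),
    history_card("flatfield v1.4 flat=ff_0421.fits"),
    "END".ljust(CARD_LEN),
])


def history_records(header):
    """Collect the free-text bodies of all HISTORY cards up to the END card."""
    records = []
    for i in range(0, len(header), CARD_LEN):
        card = header[i:i + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if keyword == "HISTORY":
            records.append(card[8:].strip())
    return records


records = history_records(HEADER)
```

What comes back is a list of strings such as a program name followed by its arguments; nothing in the format says which token is the program, which are parameters, or what the step did to the data.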

So, what would be ideal would be a way of recording provenance information which was machine-readable and more readily understood by people other than those intimately connected with its creation. The solution to the latter issue could lie in the use of ontologies, if, instead of just recording the name of the program that performed a particular step in a data processing chain, it was possible to record the semantic types of the inputs and outputs of that step. That in itself would not be sufficient to understand fully what that program did, since, clearly, a given set of inputs could be converted into a given set of outputs in an infinite number of ways, but it would be a start.
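To make the idea concrete, here is a minimal Python sketch of a machine-readable provenance record that stores semantic types alongside the program name. It is only an illustration under stated assumptions: the program names, parameter values, and type terms such as "RawImage" are invented, and no existing ontology or FITS convention is implied.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ProvenanceStep:
    program: str        # name of the processing program, as recorded today
    parameters: dict    # input values the program was run with
    input_types: list   # semantic types of the inputs (hypothetical terms)
    output_types: list  # semantic types of the outputs (hypothetical terms)


def encode_chain(steps):
    # Serialise the chain as JSON so that software, not just people, can read it.
    return json.dumps([asdict(step) for step in steps], sort_keys=True)


def decode_chain(text):
    return [ProvenanceStep(**fields) for fields in json.loads(text)]


def chain_is_consistent(steps):
    # Every input type of a step must appear among the previous step's outputs.
    return all(
        set(later.input_types) <= set(earlier.output_types)
        for earlier, later in zip(steps, steps[1:])
    )


chain = [
    ProvenanceStep("debias", {"bias_frame": "bias01"},
                   ["RawImage"], ["BiasCorrectedImage"]),
    ProvenanceStep("flatfield", {"flat_frame": "flat07"},
                   ["BiasCorrectedImage"], ["CalibratedImage"]),
]
```

Even without knowing what "debias" does internally, an end-user or a program can see what kind of data went into each step and what kind came out, and can check that the chain hangs together.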

I understand that the development of an ontology for a given domain is a difficult and time-consuming business, but it would be useful if the DCC could provide some sort of introduction, or framework that could help people do that, since I'm sure that the enabling of data integration will become an increasingly important part of data curation, and ontologies must surely have a role in that, by aiding the development of a global schema across multiple data sources.

Another area of research that would be of great benefit to astronomy is in data format description to aid data conversion. I've been working with the eDIKT team at the National e-Science Centre (NeSC) on an application of their BinX language for the description of binary data files, and that looks very promising, as does the work within the Global Grid Forum to draw up a generic Data Format Description Language (DFDL). I would like to see the DCC aiding the development of DFDL if possible, feeding requirements from its user community, and so on.
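The general approach, describing a binary layout as data and driving conversion from that description, can be sketched in a few lines of Python. This is not BinX or DFDL syntax: the dictionary-based description, the field names, and the helper functions are all assumptions made for illustration, using only Python's standard struct module.

```python
import struct

# Hypothetical description of one fixed-length catalogue record:
# a byte order plus an ordered list of (field name, struct type code).
LITTLE_ENDIAN = {
    "byte_order": "<",
    "fields": [("object_id", "i"), ("ra_deg", "d"),
               ("dec_deg", "d"), ("magnitude", "f")],
}
# Same fields, different byte order.
BIG_ENDIAN = dict(LITTLE_ENDIAN, byte_order=">")


def record_struct(desc):
    # Build a struct.Struct from the declarative description.
    codes = "".join(code for _, code in desc["fields"])
    return struct.Struct(desc["byte_order"] + codes)


def read_records(data, desc):
    # Decode a byte string into a list of dicts keyed by field name.
    names = [name for name, _ in desc["fields"]]
    return [dict(zip(names, values))
            for values in record_struct(desc).iter_unpack(data)]


def convert(data, source, target):
    # Re-encode records from the source layout into the target layout.
    names = [name for name, _ in target["fields"]]
    packer = record_struct(target)
    return b"".join(packer.pack(*(rec[name] for name in names))
                    for rec in read_records(data, source))


raw = record_struct(LITTLE_ENDIAN).pack(42, 150.25, -2.5, 17.5)
converted = convert(raw, LITTLE_ENDIAN, BIG_ENDIAN)
```

Because the layout lives in the description rather than in the reading code, supporting a new format means writing a new description, not a new parser.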

Q23. How else might the DCC support your work? Services, e.g. advice or other services?

Our experience in WFAU in the past few years is that we've been hit quite rapidly by greatly increasing curation responsibilities, both from the rapid growth of the volume of data we look after and the expanding range of services we shall have to offer to help people exploit our data effectively. In many cases, we've come up against problems which are new to us, but which must have been solved many times over by other people (whether it be by the digital libraries community, or within other academic disciplines), and we've not known how to find out what has already been done. If the DCC can provide a route to accessing previous solutions for generic problems, that would be a great help to many people, I think. I guess that requires a combination of a repository of best practice and the kind of forum proposed for the Associates Network, which would facilitate peer-to-peer communication and enable us to reach people who have relevant expertise.

One thing that would be helpful would be advice on technical data management issues, e.g. the correct way to configure hardware for large databases. That's a rather specialised topic, which is probably beyond the experience of many of the system administrators who are starting to have to support large-scale data curation activities in research groups, so it would be good to have some expert advice to call on, to prevent wholesale wheel-reinvention in too many places.

Q24. Registries?

There has been quite a lot of work on registries for astronomical data resources within the Virtual Observatory community, but it would be good if the DCC and its international analogues could lead some sort of standardisation effort in that area, so that different disciplines can use standard tools and implementations, rather than have to develop their own from scratch. Clearly, the information which has to be stored in registries will vary greatly between disciplines, but some standard built on Dublin Core, and with a well defined mechanism for extending its schema for use in particular domains would be generally useful, I'm sure.
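One way to picture such a standard is a registry record whose common fields are the fifteen Dublin Core elements and whose discipline-specific fields live under a declared prefix. The Python sketch below is illustrative only: the "astro" prefix, its field names, and the validation function are hypothetical and do not describe any existing Virtual Observatory registry schema.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical domain extension: prefix -> field names it allows.
EXTENSIONS = {"astro": {"waveband", "sky_coverage_sq_deg"}}


def validate_record(record, extensions):
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    for key in record:
        prefix, sep, name = key.partition(":")
        if not sep:
            # Unprefixed keys must be core Dublin Core elements.
            if key not in DC_ELEMENTS:
                problems.append(f"unknown core element: {key}")
        elif prefix not in extensions:
            problems.append(f"undeclared extension: {prefix}")
        elif name not in extensions[prefix]:
            problems.append(f"unknown field in extension {prefix}: {name}")
    return problems


record = {
    "title": "Example sky survey catalogue",   # made-up resource
    "publisher": "Example data centre",
    "astro:waveband": "near-infrared",
}
```

Generic tools would only need to understand the unprefixed fields, while each discipline registers its own extension without changing the shared core.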

Q25. Professional development opportunities?

That's not something I think we have thought about much to date, but we probably should do, as the data curation side of our work becomes increasingly important. I think professional development goes hand in hand with securing continued funding, since, clearly, nobody will want to develop their expertise as data curators if they don't think they're going to have a job in that area for much longer. I think it would be very good for the DCC to argue the case for the importance of data curation with the research councils and other funding agencies. Maybe this requires some studies of the economic and scholarly benefits of data curation, and the DCC should become the best placed body to guide such research, even if it does not undertake it itself.

The DCC is funded by

Joint Information Systems Committee