ARROW Repositories day: 2
Lynda Cheshire speaking as part of “the researcher’s view”, talking about the view from a qualitative researcher working with the Australian Social Science Data Archive (ASSDA), based at ANU, established 1981, with about 3,000 datasets. Most notable studies are election studies, opinion polls and social attitudes surveys, mostly from government sources. Not much qualitative data yet, but they have grants to expand this, including new nodes, and the qualitative archive (AQuA) to be at UQ. Not just the data but tools as well, based on existing UK and US qualitative equivalents.

Important because much qualitative data is held by researchers on disk, in filing cabinets, lofts, garages, plastic bags! Archiving can support re-use for research, but also for teaching purposes. Underlying issues (she says) are epistemological and philosophical. Eg quantitative is about objective measurements, but qualitative is about how people construct meaning. Many cases (breadth) vs few cases (depth). Reliability vs authenticity. Detached vs involved.

Recent consultation through focus groups: key findings included epistemological opposition to qualitative archiving (or perhaps re-use), because of loss of context; data are personal and not to be shared (the researcher-subject implied contract); some virtues of archiving were recognised; concerns about ethical/confidentiality challenges; challenges of informed consent (difficult as archiving might make it harder to gather extremely sensitive data, but re-use might avoid having to interview more people about traumatic events); whose data is it (the subject potentially has ownership rights in transcripts, while the researcher’s field notes potentially include personal commentary); access control and condition issues; additional burden of preparing the data for deposit.

The task ahead: develop preservation aspects (focus on near retirees?), and data sharing/analysis under certain conditions. Establish protocols for data access, IPR, ethics etc.
Refine ethical guidelines. Assist with project development to integrate this work.

Ashley Buckle from Monash gave a personal account of challenges for data-driven biomedical research. Explosion in amount of data available. Raw (experimental) data must be archived (to reproduce the experiment). Need for standardised data formats for exchange. Need online data analysis tools to go alongside the data repositories. In this field, there’s high throughput data, but also reliable annotation on a low volume basis by expert humans. Federated solutions as possible approaches for petabyte scale data sets.

Structural Biology pipeline metaphor; many complex steps involved in the processes, maybe involving different labs. Interested in the refolding phase; complex and rate-limiting. They built their own database (REFOLD), with a simple interface for others to add data. Well-cited, but few deposits from outside (<1%). Spotted that the database was in some ways similar to a lab notebook, so started building tools for experimentalists, capturing the data as a sideline (way to go, Ashley!). Getting the data out of journals is inadequate. So maybe the journal IS the database? Many of the processes are the same.

Second issue: crystallography of proteins. Who holds the data? On the one hand, the lab… but labs handle it pretty badly (CDs, individuals’ filestore, etc). Maybe the Protein Data Bank? But they want the refined rather than the raw data. Maybe institutional libraries? TARDIS project providing tools for data deposit and discovery, working with ARCHER, ARROW and Monash library... This field does benefit from standards such as MIAME, MIAPE etc, which are quite important in making stuff interoperable. Ashley's working with Simon Coles etc in the UK (who's mostly at the small molecule end).

So how to go forward? Maybe turning these databases into data-oriented journals, with peer review built in etc would be a way to go?
Certainly it's a worry to me that the Nucleic Acids field in general lists >1,000 databases; there has to be a better way than turning everything into Yet Another Database...