Wikiproteins...
28 May, 2008
Genome Biology has an article by Barend Mons, Michael Ashburner et al: "Calling on a million minds for community annotation in WikiProteins". From the abstract:
"WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery."
I'll say just a bit more on the Wikiproteins effort below, but I was also interested in this from the introduction:
"The exploding number of papers abstracted in PubMed [...] has prompted many attempts to capture information automatically from the literature and from primary data into a computer readable, unambiguous format. When done manually and by dedicated experts, this process is frequently referred to as 'curation'. The automated computational approach is broadly referred to as text mining."
I've been increasingly concerned recently to understand better the use of the word curation in this sense, which dates back to at least 1993, preceding our use of the term by a decade (eg 'curated databases' in genomics, etc). We try to cover this sense through the 'adding value' part of our definition ("Digital curation is maintaining and adding value to a trusted body of digital information for current and future use"), although I'm not sure it captures it fully.
Back at Wikiproteins, the idea is to combine the two approaches (manual curation by experts and sophisticated text mining). Jimmy Wales of Wikimedia Foundation is one of the authors of the paper, which adds an interesting dimension. The approach is based on "a software component called Knowlets™. [...] Scientific publications contain many re-iterations of factual statements. The Knowlet records relationships between two concepts only once. The attributes and values of the relationships change based on multiple instances of factual statements (...), increasing co-occurrence (...) or associations (...). This approach results in a minimal growth of the 'concept space' as compared to the text space..."
This is extraordinarily interesting, and I'm sure we'll hear much more about it in the near future. I particularly like the approach to expert-based quality control. There must be questions about long-term sustainability, both organisationally and technically, but sceptics continue to be amazed at the sustainability of other kinds of Open activities!
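To make the quoted Knowlet idea a little more concrete, here is a rough sketch in Python of storing each concept pair once and updating its attributes as more sentences arrive. It is only an illustration of the principle as I read it from the quotation, not the actual Knowlet software, whose internals the paper excerpt does not describe. The names `Relation`, `ConceptSpace`, `add_sentence` and `add_fact`, and the sample concepts, are all my own assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Relation:
    """Attributes of one concept pair (hypothetical fields, not from the paper)."""
    facts: int = 0          # explicit factual statements seen for this pair
    cooccurrences: int = 0  # sentences in which both concepts appear


class ConceptSpace:
    """Illustrative store that keeps one record per unordered concept pair."""

    def __init__(self):
        # frozenset keys mean (a, b) and (b, a) share the same record.
        self.relations = defaultdict(Relation)

    def add_sentence(self, concepts):
        """Count one co-occurrence for every pair of distinct concepts."""
        for a, b in combinations(sorted(set(concepts)), 2):
            self.relations[frozenset((a, b))].cooccurrences += 1

    def add_fact(self, a, b):
        """Count one explicit factual statement linking a and b."""
        self.relations[frozenset((a, b))].facts += 1


# Sample sentences, already reduced to concept labels (illustrative only).
sentences = [
    ["BRCA1", "breast cancer"],
    ["breast cancer", "BRCA1"],
    ["BRCA1", "breast cancer", "DNA repair"],
]

space = ConceptSpace()
for sentence in sentences:
    space.add_sentence(sentence)

pair = space.relations[frozenset(("BRCA1", "breast cancer"))]
print(len(space.relations), pair.cooccurrences)  # prints "3 3"
```

Three sentences mention the same pair, yet the pair is stored once and only its count changes, which is the sense in which the 'concept space' grows more slowly than the text space.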

Comments
Chris, Back in the mid-1990s JISC's eLib programme...