What are data?
5 May, 2009
Another nice blog post from Peter Murray-Rust, in his "thinking out loud before a presentation" series, from which I quote:
"Different people would have cutoffs at different points on this hierarchy [CR: Data → Information → Knowledge → Wisdom] but I think the following are fairly common attributes of data:
- it is [sic] distinct from most prose (although some prose would be better recast as data)
- it is generally a component of a larger information or knowledge structure
- facts and data are closely related
- many data are potentially reproducible or unique observations, are not opinions (though different people may produce different data)
- data, as facts, are not copyrightable.
- Collections of data and annotated data (data + metadata) may have considerably enhanced value over the individual items.
- Data can be processed by machine
Here are some statements which provide data:
- 36 26 38
- Melting Point: 300 K
- The reaction product was red
- my blog page is http://wwmm.ch.cam.ac.uk/blogs/murrayrust
and here are some which are not data
- her work is well respected
- we thank Dr. XYZZY for the crystals
- we find this reaction very difficult to perform"
I think I mostly agree with that, although it made me think quite hard, and of course at the extreme anything is data for someone (these words are mere data for Blogger or our RSS aggregator or Google, for example). This can get difficult when your mission is to support research data! Our advice from JISC is to recognise the potential ambiguity, be flexible in accepting others, but take our own view in what we create.
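The quoted points that "data can be processed by machine" and that some prose "would be better recast as data" can be made concrete with a small sketch. The Python below is only an illustration, not anything from the original post: the `Measurement` record, the `parse_measurement` function and the "property: number unit" pattern are all assumptions chosen to fit the "Melting Point: 300 K" example.

```python
import re
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    """Hypothetical property/value/unit record recovered from a short statement."""
    prop: str
    value: float
    unit: str


# Assumed shape: "<property>: <number> <unit>", e.g. "Melting Point: 300 K".
PATTERN = re.compile(
    r"^\s*(?P<prop>[^:]+):\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>\S+)\s*$"
)


def parse_measurement(statement: str) -> Optional[Measurement]:
    """Return a Measurement if the statement fits the assumed shape, else None."""
    match = PATTERN.match(statement)
    if match is None:
        return None
    return Measurement(match["prop"].strip(), float(match["value"]), match["unit"])


for statement in ["Melting Point: 300 K", "her work is well respected"]:
    print(statement, "->", parse_measurement(statement))
```

This recognises only one narrow shape. A statement such as "36 26 38" counts as data under the quoted criteria but is not matched here, so a `None` result says nothing about whether a statement is data, only that this particular pattern could not structure it.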

Comments
I agree; while the DIKW framework sounds simple at...
I don't know if you read the recent article in the Journal of Information Science on this topic, The Knowledge Pyramid: A critique of the DIKW hierarchy (Frické, 2009), but it's a nicely related and thought-provoking piece (if not one I quite agree with).
You make a good point about how the categorization of data/info/etc is inherently subjective, and I think you draw the right moral: the system should work for the application, not the other way around.