Semantic web on the Today programme
9 July, 2008
Turning on the radio this morning, I was surprised to hear someone discussing data on the Today programme on BBC Radio 4. It turned out to be Sir Tim Berners-Lee talking about the semantic web; he even managed to mention RDF and HTML without confusing the interviewers too much. The interesting eight-and-a-half-minute discussion is available via the BBC iPlayer.
I'm quite keen to understand better how the semantic web might relate to science data. When people talk about data in relation to the semantic web, they often seem to be thinking of the sort of relatively unitary facts or simple relationships that RDF triples are quite good at expressing, such as "the population of China is 1,330,044,605". It's certainly not clear to me how this generalises to express the changing population, let alone how it could express the data from a spectrometer, a crystal structure determination or remote sensing. If anyone can point me to a good resource discussing this, I would be grateful!

Comments
This doesn't quite answer your question, but I hav...
He expands on your point about the challenges to the semantic web vision of a world not bounded in simple, atomic relationships.
You can represent pretty much anything in RDF, tho...
For something like your China population example, obviously that number keeps changing, so what you are really talking about is some kind of observation or measurement of the population, taken at a particular time using a particular method.
So you could have some RDF statements like (subject : property : value)
measurement1 : population of china : 1,000,000,000
measurement1 : date : 2008-07-10
measurement1 : method : national census
measurement1 : carried out by : Chinese government
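As a rough sketch, the four statements above can be held as plain (subject, property, value) tuples in Python. No RDF library is used here, the helper name `properties_of` is made up for illustration, and real RDF would use URIs rather than bare strings.

```python
# Each statement is a (subject, property, value) tuple, mirroring the list above.
# Plain strings stand in for the URIs a real RDF store would use (an assumption).
triples = [
    ("measurement1", "population of china", 1_000_000_000),
    ("measurement1", "date", "2008-07-10"),
    ("measurement1", "method", "national census"),
    ("measurement1", "carried out by", "Chinese government"),
]


def properties_of(subject, statements):
    """Collect every property/value pair attached to one subject (hypothetical helper)."""
    return {prop: value for subj, prop, value in statements if subj == subject}


measurement = properties_of("measurement1", triples)
print(measurement["date"])   # 2008-07-10
print(len(measurement))      # 4
```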
In graph terms, this is a single item, "measurement1", with a number of 'spokes' coming out from it for the different properties. Sometimes you need a more complicated structure, using blank or linking nodes. There are always lots of different ways to do it, as with any kind of data modelling, and the way you choose depends on the purpose. Suppose you were looking at a table of population info, along the lines of:
country    population     date
China      1,300,000,000  2006
Australia  20,000,000     2007
China      1,000,000,000  1997
In this case the population is a property of China, but the date is a property of the measurement of the population. You need some kind of intermediate node in your graph, so you can have a set of RDF statements like this. (Shame you can't easily draw diagrams in blog comments).
China : population : measurement1
measurement1 : value : 1,300,000,000
measurement1 : date of observation : 2006
Australia : population : measurement2
measurement2 : value : 20,000,000
measurement2 : date of observation : 2007
You could also be a bit more specific about what these observations are:
measurement2 : type_of : measurement
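To make the intermediate-node idea a bit more concrete, here is a minimal Python sketch, again using plain tuples rather than an RDF library. The values follow the rows of the table above, a third measurement node is added for the 1997 China row (it is not spelled out in the statements above), and the function name `population_history` is hypothetical.

```python
# Statements as (subject, property, value) tuples; values follow the table rows.
# "measurement3" is an assumed extra node covering the 1997 China row.
triples = [
    ("China", "population", "measurement1"),
    ("measurement1", "value", 1_300_000_000),
    ("measurement1", "date of observation", 2006),
    ("Australia", "population", "measurement2"),
    ("measurement2", "value", 20_000_000),
    ("measurement2", "date of observation", 2007),
    ("China", "population", "measurement3"),
    ("measurement3", "value", 1_000_000_000),
    ("measurement3", "date of observation", 1997),
]


def population_history(country, statements):
    """Follow country -> measurement node -> (date, value), sorted by date."""
    nodes = [v for s, p, v in statements if s == country and p == "population"]
    history = []
    for node in nodes:
        # Gather the 'spokes' of this intermediate measurement node.
        props = {p: v for s, p, v in statements if s == node}
        history.append((props["date of observation"], props["value"]))
    return sorted(history)


print(population_history("China", triples))
# [(1997, 1000000000), (2006, 1300000000)]
```

Because each measurement is its own node, a changing population is just more measurement nodes hanging off the same country, each carrying its own date.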
I set up a company six months or so ago to try to tackle some of the issues around collaborating more effectively on data. There's some info about it at http://www.swirrl.com if you are interested. We're aiming initially more at business applications than science, but many of the principles are the same, and because of my background I am always interested in the potential scientific applications. I certainly think that you could use RDF effectively for storing science data.
Sorry, this comment ended up being rather long! But hopefully useful.