Because good research needs good data

Linked data and staff contact pages

Chris Rusbridge | 21 January 2010

You may remember that I am interested in the extent to which we should use Semantic Web (or Linked Data) on the DCC web site. After some discussions, I reached the conclusion that we should do so, but the tools were not ready yet (this isn’t quite an Augustinian “Oh Lord, make me good but not yet”; specifically, we are moving our web site to Drupal 6, the Linked Data stuff will not be native until Drupal 7, and our consultants are not yet up to speed with Linked Data). I have to say that not all our staff are convinced of the benefits of using RDF etc on the web site, and I have had a mental note to write more about this, real soon now.

I was reminded of this recently. I wanted to phone a colleague who worked at UKOLN, one of our partners, and I didn’t have his details in my address book. So I looked on their web site and navigated to his contacts page. Once there I copied his details into the address book, before lifting the phone to give him a ring. After the call (he wasn’t there; the snow had closed the office), I thought about that process. I had to copy all those details! Wouldn’t it be great if I could just import them somehow? How could that be? UKOLN have expertise in such matters, so I tweeted Paul Walk (now Deputy Director, previously technical manager) asking whether they had considered making the details accessible as Linked Data using something like FOAF. You can guess I’m not fully up to speed with this stuff, but I’m certainly trying to learn!

Paul replied that they had considered putting microformats into the page (I guess this is the hCard microformat), and then asked me whether my address book understood RDF, or whether I was going to script something. I was pretty sure the answer to the second part was “no”, as I suspect such scripting is currently beyond me, and told Paul that I was using the Mac OS X 10.6 Address Book; it says nothing about RDF, but will import a vCard. I was thinking that if there was appropriate stuff (either hCard microformat or RDFa with FOAF) on the page, I might find an app somewhere that would scrape it off and make a vCard I could import.

Paul’s final tweet was: “@cardcc see the use-case, not sure it's a 'linked data' problem though. What are the links that matter if you're scraping a single contact?”

Well, I couldn’t think of a 140-character answer to that question, which seemed to raise issues I had not thought about properly. What are the links that matter? Was it linked data, or just coded data that I wanted? Is this really a semantic web question rather than linked data? Or is it an RDF question? Or a vocabulary question? Gulp!

After some thought, I suspect Paul was as constrained by his 140 characters as I was. Surely a contacts page contains both facts and links within itself. See the Wikipedia page on FOAF for an example of a FOAF file in Turtle for Jimmy Wales; the coverage is pretty much like a contacts page.

So Paul’s contact page says he works for UKOLN at the University of Bath, and gives the latter’s address (I guess formally speaking he works in UKOLN, an administrative unit, and is employed by the University); that his position in UKOLN is Deputy Director; and that his phone number, fax number and email address are x, y and z. All of these are relationships between facts, expressible in the FOAF vocabulary. With RDFa, that information could be explicitly encoded in the HTML of the page and understood by machines, rather than inferred from the co-location of some characters on the page (the human eye is much better at such inferences). So there’s RDF, right there. Is that Linked Data? Is it Semantic Web? I’m not really sure.
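
To make the RDFa idea a little more concrete, here is a minimal Python sketch of what "understood by machines" might look like. Everything in it is an assumption for illustration: the markup is hypothetical (it is not UKOLN's actual page), the phone and email values are just the placeholders x and z from the paragraph above, and the names `RDFaCollector` and `extract_triples` are invented here. It handles only a tiny subset of RDFa (`about`, `property`, and `rel` with `href`) and is nowhere near a full RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT the real UKOLN contact page. The phone and
# email values are the placeholders "x" and "z" used in the text.
SAMPLE_HTML = """
<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#paul" typeof="foaf:Person">
  <span property="foaf:name">Paul Walk</span>,
  <a rel="foaf:phone" href="tel:x">phone</a>
  <a rel="foaf:mbox" href="mailto:z">email</a>
</div>
"""


class RDFaCollector(HTMLParser):
    """Collects (subject, predicate, object) triples from a small RDFa subset."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None   # most recent about="..." value (never reset: a simplification)
        self._pending = None   # property whose value is the element's text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self._subject = a["about"]
        # rel + href: the object of the statement is another resource (a link)
        if "rel" in a and "href" in a:
            self.triples.append((self._subject, a["rel"], a["href"]))
        # property: the object is a literal, taken from content="" or the text
        if "property" in a:
            if "content" in a:
                self.triples.append((self._subject, a["property"], a["content"]))
            else:
                self._pending = a["property"]

    def handle_data(self, data):
        if self._pending and data.strip():
            self.triples.append((self._subject, self._pending, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None


def extract_triples(html):
    collector = RDFaCollector()
    collector.feed(html)
    return collector.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE_HTML):
        print(triple)
```

The point of the sketch is only that the statements come out as explicit subject, predicate and object, rather than having to be guessed from where the characters sit on the page.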

More to the point, would it have been of any greater use to me if it had been so encoded? A FOAF-hunting spider could traverse the web and build up a network of people, and I might be able to query that network, and even get the results downloaded in the form of a vCard that I could import into my Mac Address Book. That sounds quite possible, and the tools may already exist. Or, there may exist an app (what we used to call a Small Matter Of Programming, or a SMOP) that I could point at a web page with FOAF RDFa on it. Perhaps that’s what Paul was after in relation to scripting. Maybe the upcoming Dev8D might find this an interesting task to look at?

What other things could be done with such a page? Well, Paul or others might use it to disambiguate the many Paul Walk alter egos out there. You’ll see I have a simple link to Paul’s contact page above, but if this blog were RDF-enabled, perhaps we could have a more formal link to the assertions on the page, e.g. to that Paul Walk’s phone number, that Paul Walk’s email address, etc.

Well, I’m not sure if this makes sense, and it does feel like one of those “first fax machine” situations. However, FOAF has been around for a long while now. Does that mean that folk don’t perceive an advantage in such formal encodings to balance their costs, or is this an absence of value because of a lack of exploitable tools? If it is the latter, anyone going to Dev8D want to make an app for me?

(It’s also possible of course that Paul doesn’t want his details to be spidered up in this way, but I guess none of us should put contact details on the web if that’s our position.)

By the way, I found a web page called FOAF-a-matic that will create FOAF RDF for you. Here's an extract from what it created for me, in RDF:

<foaf:Person rdf:ID="me">
  <foaf:name>Chris Rusbridge</foaf:name>
  <foaf:title>Mr</foaf:title>
  <foaf:givenname>Chris</foaf:givenname>
  <foaf:family_name>Rusbridge</foaf:family_name>
  <foaf:mbox rdf:resource="mailto:c.rusbridge@xxxxx"/>
  <foaf:workplaceHomepage rdf:resource="/"/>
</foaf:Person>

What could I do with that now?
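
One possible answer, offered as a minimal sketch and not a finished tool: turn it into a vCard that an address book could import. The Python below uses only the standard library. It assumes the extract is wrapped in an `rdf:RDF` element with the usual RDF and FOAF namespace declarations (the extract above omits them), and the function name `foaf_to_vcard` is made up for this example. It reads only the name, title and mailbox, skips the workplace homepage because the value in the extract is not a full URL, and does no escaping of special characters, so treat it as an illustration of the idea only.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# The extract from the post, wrapped in rdf:RDF with namespace
# declarations (the wrapper is an assumption; the extract omits it).
FOAF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person rdf:ID="me">
  <foaf:name>Chris Rusbridge</foaf:name>
  <foaf:title>Mr</foaf:title>
  <foaf:givenname>Chris</foaf:givenname>
  <foaf:family_name>Rusbridge</foaf:family_name>
  <foaf:mbox rdf:resource="mailto:c.rusbridge@xxxxx"/>
  <foaf:workplaceHomepage rdf:resource="/"/>
</foaf:Person>
</rdf:RDF>"""


def foaf_to_vcard(rdf_xml):
    """Build a vCard 3.0 string from the first foaf:Person in the RDF/XML."""
    root = ET.fromstring(rdf_xml)
    person = root.find(f"{{{FOAF}}}Person")
    if person is None:
        raise ValueError("no foaf:Person element found")

    def text(name):
        # Literal value of a FOAF property, or "" if it is absent
        el = person.find(f"{{{FOAF}}}{name}")
        return (el.text or "").strip() if el is not None else ""

    def resource(name):
        # rdf:resource value of a FOAF property, or "" if it is absent
        el = person.find(f"{{{FOAF}}}{name}")
        return el.get(f"{{{RDF}}}resource", "") if el is not None else ""

    email = resource("mbox")
    if email.startswith("mailto:"):
        email = email[len("mailto:"):]

    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        # N is family;given;additional;prefix;suffix
        f"N:{text('family_name')};{text('givenname')};;{text('title')};",
        f"FN:{text('name')}",
    ]
    if email:
        lines.append(f"EMAIL;TYPE=INTERNET:{email}")
    lines.append("END:VCARD")
    # vCard lines are terminated with CRLF
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(foaf_to_vcard(FOAF_XML))
```

Saving the output to a file ending in `.vcf` should give something an address book can try to import, though I would not promise that every address book accepts so sparse a card.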