I was at VoCamp Galway 2008 last week, which was my first VoCamp. I have to say it was a fantastic experience. It seems like the VoCamp formula just works out of the box: you put a bunch of vocabulary geeks in a closed room and they immediately get down to fixing some of the key outstanding questions surrounding vocabularies on the Semantic Web (and in their spare time, they create some new vocabs).
The primary issue I set out to discuss was the state of RDF vocabularies for microformats. Why is this important? A growing number of tools exist that convert microformats into RDF, but if everyone chooses a different representation, it will be significantly more difficult to interoperate. So in a small group we set out to document what we believe are the best-practice mappings from microformats to RDF.
We started by collecting the existing RDF vocabularies for stable microformats and documented them on the SemanticWeb.org wiki. This alone has been an incredibly useful exercise, since we uncovered some pretty immediate targets for fixing:
- hCard. There are two VCard ontologies lingering on W3C’s website [1,2]. It’s clear to the trained eye that one is deprecating the other, but unfortunately it doesn’t say so anywhere… I’ve written an email about this to the friendly folks at W3C.
- hCalendar… Same here. It seems the RDF Calendar group couldn’t agree at some point on the design, forking the ontology into two… We have now taken this discussion to the RDF Calendar mailing list, with the intention of consolidating things.
- rel-license… we have three terms for saying ‘this is the license for this doc’: xhtml:license (which is the default namespace in RDFa), dc:license and cc:license.
- hAtom… Again, there seem to be multiple choices… we only got as far as listing the ones we knew about.
- hResume… In this case we haven’t found a direct mapping: there are various resume ontologies, but they all do either more than hResume or something slightly different.
This is about as far as we got, but I’ve seen that over the past week people have already added some pretty useful info to the Wiki at semanticweb.org. If you have something to contribute as well, please do so!
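To make the rel-license point concrete, here is a minimal sketch of what a consumer has to do when three terms mean the same thing: rewrite them all to one before querying. The full URIs are my assumed expansions of the prefixes (the list above only gives the prefixed forms), and picking cc:license as the canonical term is an arbitrary choice for illustration, not a recommendation.

```python
# Assumed expansions of the three prefixed terms.
XHTML_LICENSE = "http://www.w3.org/1999/xhtml/vocab#license"
DC_LICENSE = "http://purl.org/dc/terms/license"
CC_LICENSE = "http://creativecommons.org/ns#license"

# Hypothetical choice of canonical term; any of the three would work.
CANONICAL = {
    XHTML_LICENSE: CC_LICENSE,
    DC_LICENSE: CC_LICENSE,
    CC_LICENSE: CC_LICENSE,
}


def normalize(triples):
    """Rewrite known license predicates to the canonical one; keep the rest."""
    return [(s, CANONICAL.get(p, p), o) for (s, p, o) in triples]


def licenses_of(triples, doc):
    """Return the set of license URIs stated for doc, whichever term was used."""
    return {
        o
        for (s, p, o) in normalize(triples)
        if s == doc and p == CC_LICENSE
    }


# Two converters describing the same (made-up) page with different terms.
by_sa = "http://creativecommons.org/licenses/by-sa/3.0/"
page = "http://example.org/post"
triples = [
    (page, XHTML_LICENSE, by_sa),
    (page, DC_LICENSE, by_sa),
    (page, "http://purl.org/dc/terms/title", "A post"),
]
print(licenses_of(triples, page))
```

Every consumer needs a table like `CANONICAL` for every microformat, which is exactly the overhead that agreeing on one mapping would remove.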
We are only two days away from the first VoCamp. Although I cannot make it (I’m attending the Dagstuhl seminar on the Social Web), I’m incredibly excited that we in the Semantic Web community are finally starting to talk about the situation around vocabularies. For this reason, I decided to share some thoughts on the topic, as an input to the discussion. This is my personal opinion, but it is based on what I see around me at work.
So what’s wrong with vocabularies on the Semantic Web? Well, anno 2008 we can finally state with confidence that the approach we have followed so far (of having no particular approach) has not produced the results we expected. Here is a list of things that need urgent fixing, in no particular order:
- Centralize. Unlike with microformats, there is no single place on the Semantic Web to go to for vocabularies, to discuss them, or to share use cases and examples. Decentralization left us with no way to tell what is out there and how trustworthy or useful it is. The lack of options leads to abuse. If a user doesn’t find the movie vocabulary, he will still create a movie class in the rdfs namespace. And yes, there is plenty of rdfs:movie out there.
- Release your vocabularies. There are many talented people in our community who have created the first vocabulary for topic X only to abandon it a few months later when moving on to the next project. I have nothing against personomies, but plan for succession in case the time comes when you cannot maintain it any more. If you’ve done your job well, there will be plenty of people willing to carry on your work. However, we cannot maintain your vocabulary for you without your cooperation. I like PURLs.
- Embrace microformats. Microformats are just another way of providing metadata. We need good RDF/OWL vocabularies for at least the most widely used microformats. These should be straightforward representations with a one-to-one mapping between elements in the microformat and in the vocabulary, and not a grand redesign. I know it’s tempting to redesign a microformat but it’s important to resist that temptation.
- Design vocabularies for annotations. Now that RDFa is out, most of us are facing the mild inconvenience of using existing ontologies for marking up web pages. Existing ontologies will need some adaptation because they were designed from a consumer perspective, not from a producer perspective. This will likely mean that we will have to compromise, on at least some occasions. For example, on social sites it’s common to expose only the user’s age and number of friends (for which FOAF has no properties), not the birthday and the list of friends (for which it does). Principles of ontology engineering may dictate that we shouldn’t introduce time-bound properties or properties for counts of things, or multiple properties for saying the same things in different ways. But we will just have to do it, or else it will be rdfs:age.
- Clarify the role of W3C in the process. As people find vocabularies by searching, they often run into ontologies that appear in the form of W3C member submissions. These are easily findable (good pagerank) and the formatting and placement give them an official look. The problem is, anyone (who is a member, that is…) can author such a submission, which, as the evidence shows, will be neither updated nor removed over time. This is how we got to have two ontologies for VCard with documentation and URIs in the W3C domain (1, 2). In my personal opinion this is doing more harm than good.
This is certainly not the complete list, but I hope that at least some of these points will come up in Oxford. And I’m sorry, guys, that I cannot make it. Have fun!
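As a rough illustration of the "Embrace microformats" point, a one-to-one mapping can be as plain as reusing each microformat property name as the local name of a vocabulary term. The sketch below assumes the hCard has already been parsed into a dictionary; the namespace is a made-up example.org URI and the function name is hypothetical. Only the property names (fn, nickname, email, url, tel, org) come from hCard itself.

```python
# Hypothetical namespace, used only for illustration.
HCARD_NS = "http://example.org/hcard#"

# A handful of hCard property names; each maps to a term of the same name.
HCARD_PROPERTIES = ("fn", "nickname", "email", "url", "tel", "org")


def hcard_to_triples(subject, hcard):
    """Emit one triple per value of each known hCard property.

    Unknown keys are skipped rather than guessed at, so nothing is
    redesigned or invented along the way.
    """
    triples = []
    for name in HCARD_PROPERTIES:
        for value in hcard.get(name, []):
            triples.append((subject, HCARD_NS + name, value))
    return triples


# A made-up parsed hCard.
card = {
    "fn": ["Jane Doe"],
    "email": ["jane@example.org"],
    "x-unknown": ["ignored"],
}
for triple in hcard_to_triples("http://example.org/jane#me", card):
    print(triple)
```

The point of keeping it this dull is that any two converters following the same table produce the same triples, with no design decisions left to disagree on.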
Submissions are now open for the Semantic Web Challenge 2008! As already announced, this year we are inviting submissions for both the Open Track (the ‘classic’ Challenge) and the Billion Triples Challenge, a newly created track with the focus on scalability. For more details, see the call for the Challenge.
Submit your application through the online submission system by October 1, 2008.
I begin with some arguments as to why we need semantics in search. I talk about what I see as an interesting complementarity between recent advances in automated semantic tagging of natural text and other approaches to bringing metadata to the Web (such as exposing databases and APIs as RDF). Lastly, I go into a bit of detail as to how Yahoo’s SearchMonkey has been benefiting from Semantic Web technology in what we’ve been doing up to now, and the ways in which the widening adoption of semantic technology is driving us towards simplification.
p.s. Also on DevX: an introduction to OpenCalais and SearchMonkey by James Leigh.
If you are arriving at SemTech early this year or you happen to live in the Bay Area, come by the Yahoo! campus for the SearchMonkey launch party on May 15 (Thursday): you will have a chance to learn about the Monkey, talk to developers and product managers, and enjoy some free food. Be there or be a… banana.
I’ll be there in my semi-official position as Data Architect for SearchMonkey, and you will also be able to catch me at SemTech, where I’ll be both presenting under the title Making the Web Searchable and participating in a panel on Giving Web Search a Face-lift.
Looking forward to seeing many of my friends there!
I wanted to share with you the excitement of Yahoo’s announcement today of our support for the Semantic Web effort through the SearchMonkey project. More details in this Yahoo! Search blog article and in reports at TechCrunch, ReadWriteWeb and SearchEngineLand.
Exciting times for me personally, for Yahoo! and for the Semantic Web at large!
David Karger commented on my blog, suggesting a combination of microsearch and Simile Exhibit. A great idea, I’ll follow up on that in a minute. I’m sure many of you have ideas of your own for other mashups that require searching for metadata. For this reason I just wanted to draw attention to the little logo right after the text “Search Results”, because it does an important piece of magic: it exposes all the metadata that has been gathered from the result pages in RDF/XML. (The result pages include those displayed plus more.) This is the key to starting to build on microsearch.
Granted, what you build might be slow. But by all means, please get in touch… we will find a way to make things work faster for you.