What’s wrong with vocabularies on the Semantic Web?
We are only two days away from the first VoCamp. Although I cannot make it (I’m attending the Dagstuhl seminar on the Social Web), I’m incredibly excited that we in the Semantic Web community are finally starting to talk about the situation around vocabularies. For this reason, I decided to share some thoughts on the topic as input to the discussion. This is my personal opinion, but it is based on what I see around me at work.
So what’s wrong with vocabularies on the Semantic Web? Well, anno 2008 we can finally state with confidence that the approach we have followed so far (of having no particular approach) has not produced the results we expected. Here is a list of things that need urgent fixing, in no particular order:
- Centralize. Unlike with microformats, there is no single place on the Semantic Web to go to for vocabularies, to discuss them, or to share use cases and examples. Decentralization has left us with no way to tell what is out there and how trustworthy or useful it is. When people cannot find what they need, they improvise, and that leads to abuse. If a user doesn’t find the movie vocabulary, he will still create a movie class in the rdfs namespace. And yes, there is plenty of rdfs:movie out there.
- Release your vocabularies. There are many talented people in our community who have created the first vocabulary for topic X only to abandon it a few months later when moving on to the next project. I have nothing against personomies, but plan for succession in case the time comes when you cannot maintain your vocabulary any more. If you’ve done your job well, there will be plenty of people willing to carry on your work. However, we cannot maintain your vocabulary for you without your cooperation. I like PURLs.
- Embrace microformats. Microformats are just another way of providing metadata. We need good RDF/OWL vocabularies for at least the most widely used microformats. These should be straightforward representations with a one-to-one mapping between elements in the microformat and in the vocabulary, and not a grand redesign. I know it’s tempting to redesign a microformat but it’s important to resist that temptation.
- Design vocabularies for annotations. Now that RDFa is out, most of us are facing some mild inconvenience in using existing ontologies for marking up web pages. Existing ontologies will need some adaptation because they were designed from a consumer perspective, not from a producer perspective. This will likely mean that we will have to compromise, on at least some occasions. For example, on social sites it’s common to expose only the user’s age and number of friends (for which FOAF has no properties), not the birthday and the list of friends (for which it does). Principles of ontology engineering may dictate that we shouldn’t introduce time-bound properties or properties for counts of things, or multiple properties for saying the same things in different ways. But we will just have to do it, or else it will be rdfs:age.
- Clarify the role of W3C in the process. As people find vocabularies by searching, they often run into ontologies that appear in the form of W3C member submissions. These are easy to find (good pagerank), and their formatting and placement give them an official look. Problem is, anyone (who is a member, that is…) can author such a submission, which, as evidence shows, will be neither updated nor removed over time. This is how we got to have two ontologies for VCard with documentation and URIs in the W3C domain (1, 2). In my personal opinion this is doing more harm than good.
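To make the rdfs:movie and rdfs:age complaints a bit more concrete, here is a minimal sketch of how one might spot such invented terms in harvested data. It only assumes the list of terms that the RDF Schema specification actually defines; the function name `invented_rdfs_terms` and the sample URIs are hypothetical and purely for illustration, not taken from any existing tool.

```python
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Terms actually defined by the RDF Schema specification.
RDFS_TERMS = frozenset({
    "Resource", "Class", "Literal", "Datatype", "Container",
    "ContainerMembershipProperty", "subClassOf", "subPropertyOf",
    "domain", "range", "label", "comment", "member", "seeAlso",
    "isDefinedBy",
})


def invented_rdfs_terms(uris):
    """Return the URIs that sit in the rdfs namespace but are not RDFS terms."""
    found = []
    for uri in uris:
        if uri.startswith(RDFS_NS):
            local_name = uri[len(RDFS_NS):]
            if local_name not in RDFS_TERMS:
                found.append(uri)
    return found


# Hypothetical URIs, as they might be harvested from crawled data.
sample = [
    RDFS_NS + "label",
    RDFS_NS + "movie",
    RDFS_NS + "age",
    "http://xmlns.com/foaf/0.1/name",
]

for uri in invented_rdfs_terms(sample):
    print(uri)
```

A check like this only tells you that something went wrong, not which vocabulary the author should have used instead, which is exactly where a central place to look would help.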
This is certainly not the complete list, but I hope that at least some of these points will come up in Oxford. And I’m sorry, guys, that I cannot make it. Have fun!