What’s wrong with vocabularies on the Semantic Web?

We are only two days away from the first VoCamp. Although I cannot make it (I’m attending the Dagstuhl seminar on the Social Web), I’m incredibly excited that we in the Semantic Web community are finally starting to talk about the situation around vocabularies. For this reason, I decided to share some thoughts on the topic as input to the discussion. These are my personal opinions, but they are based on what I see around me at work.

So what’s wrong with vocabularies on the Semantic Web? Well, anno 2008 we can finally state with confidence that the approach we have followed so far (of having no particular approach) has not produced the results we expected. Here is a list of things that need urgent fixing, in no particular order:

  1. Centralize. Unlike with microformats, there is no single place on the Semantic Web to go to for vocabularies, to discuss them, or to share use cases and examples. Decentralization has left us with no way to tell what is out there and how trustworthy or useful it is. The lack of visible options leads to abuse: if a user doesn’t find a movie vocabulary, he will still create a movie class in the rdfs namespace. And yes, there is plenty of rdfs:movie out there.
  2. Release your vocabularies. There are many talented people in our community who have created the first vocabulary for topic X only to abandon it a few months later when moving on to the next project. I have nothing against personomies, but plan for succession in case the time comes when you cannot maintain your vocabulary any more. If you’ve done your job well, there will be plenty of people willing to carry on your work. However, we cannot maintain your vocabulary for you without your cooperation. I like PURLs.
  3. Embrace microformats. Microformats are just another way of providing metadata. We need good RDF/OWL vocabularies for at least the most widely used microformats. These should be straightforward representations with a one-to-one mapping between elements in the microformat and in the vocabulary, and not a grand redesign. I know it’s tempting to redesign a microformat but it’s important to resist that temptation.
  4. Design vocabularies for annotations. Now that RDFa is out, most of us are facing some mild inconvenience in using existing ontologies for marking up web pages. Existing ontologies will need some adaptation because they were designed from a consumer perspective, not from a producer perspective. This will likely mean that we will have to compromise, on at least some occasions. For example, on social sites it’s common to expose only the user’s age and number of friends (for which FOAF has no properties), not the birthday and the list of friends (for which it does). Principles of ontology engineering may dictate that we shouldn’t introduce time-bound properties or properties for counts of things, or multiple properties for saying the same things in different ways. But we will just have to do it, or else it will be rdfs:age.
  5. Clarify the role of W3C in the process. As people find vocabularies by searching, they often run into ontologies that appear in the form of W3C member submissions. These are easily findable (good pagerank) and the formatting and placement give them an official look. Problem is, anyone (who is a member, that is…) can author such a submission, which, as evidence shows, will be neither updated nor removed over time. This is how we ended up with two ontologies for VCard with documentation and URIs in the W3C domain (1, 2). In my personal opinion this is doing more harm than good.
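
To make the rdfs:movie and rdfs:age problem from the list above concrete, here is a minimal Python sketch of how such misuse could be spotted. It assumes you already have the term URIs used in some data set as plain strings; the function name find_rdfs_squatters and the sample list are made up for this illustration, while the set of legitimate terms is the one defined by the RDF Schema specification.

```python
# Namespace URI of RDF Schema.
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Local names of the classes and properties that RDF Schema actually defines.
RDFS_TERMS = frozenset({
    "Resource", "Class", "Literal", "Datatype", "Container",
    "ContainerMembershipProperty",
    "subClassOf", "subPropertyOf", "domain", "range",
    "label", "comment", "member", "seeAlso", "isDefinedBy",
})


def find_rdfs_squatters(uris):
    """Return URIs that sit in the rdfs namespace but are not RDFS terms.

    Hypothetical helper for illustration only: it does no parsing,
    it just compares strings.
    """
    squatters = []
    for uri in uris:
        if uri.startswith(RDFS_NS):
            local_name = uri[len(RDFS_NS):]
            if local_name not in RDFS_TERMS:
                squatters.append(uri)
    return squatters


# Made-up sample of term URIs as they might appear in published data.
sample = [
    RDFS_NS + "label",                 # legitimate RDFS property
    RDFS_NS + "movie",                 # invented class in the rdfs namespace
    RDFS_NS + "age",                   # invented property in the rdfs namespace
    "http://xmlns.com/foaf/0.1/name",  # term from another vocabulary, ignored
]

for uri in find_rdfs_squatters(sample):
    print("not an RDFS term:", uri)
```

A check like this only flags the symptom. It says nothing about which vocabulary the publisher should have used instead, which is exactly the gap a central place for vocabularies would fill.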

This is certainly not the complete list, but I hope that at least some of these points will come up in Oxford. And I’m sorry, guys, that I cannot make it. Have fun!


8 comments so far

  1. Simon Gibbs on

    I deployed my first RDFa (example) shortly before Linked Data Planet, only to be told – in TBL’s keynote no less – that there was no standard vocabulary for the abstract class I had just published.

    That was rather irritating.

    Please, please somebody take charge of this stuff.

    (I cannot locate a specific quote regarding VCal to justify this version of events, but TBL does clearly call for standardisation of this sort. I may have interpolated somewhat, but I was wired on caffeine at the time and pray forgiveness.)

  2. drewp on

    I think #1 (centralize) is confusing a few points. There’s certainly a lack of options and awareness, but I don’t see how centralizing helps. What does help are search engines like sindice, but there can be any number of those search engines (fortunately), so they’re not really “central”.

    Everything else in your paragraph, I agree with. There can be places to discuss vocabs; any number of trust systems (which is important, since I haven’t seen even one good one yet); and we should obviously be discouraging rdfs:movie. But there’s nothing wrong with 10 organizations competing to solve those problems. That’s what’s so great about the rest of the internet.

  3. […] in Oxford, England, with interest and discussion active about potentially many others to follow.  Peter Mika, Matthias Samwald and Tom Heath have each outlined their desires for this […]

  4. […] to build and work on vocabularies and ontologies for the Semantic Web. Peter Mika had a nice blog post recently on why such activity is badly […]

  5. […] around to describe the wealth of data in the world. Left to their own devices people will simply create ad-hoc vocabularies which do little to aid data sharing. It’s for these reasons that we need VoCamps, where […]

  6. Daniel on

    Can you get rid of the hard-to-read gray text and put it back to plain-old black?

  7. […] What’s wrong with vocabularies on the Semantic Web? (tripletalk.wordpress.com) […]

  8. Martin Hepp on

    Hi Peter,
    Thanks for the important remarks (and for including GoodRelations in the list of the nine Semantic Web vocabularies recommended by Yahoo ;-).

    Having spent almost a decade on building conceptual schemas and ontologies, I would like to add the following:

    1. Proper Documentation
    90% of the effort of establishing an ontology goes into preparing good, consistent documentation, webcasts, tutorials, etc.

    2. Software Support for the Whole Food Chain
    Make it as easy as possible to create and to consume data compliant with your ontology.

    3. Simplicity vs. Extensibility
    Yes, ontologies must be markup-friendly, and the simple cases should be simple. But at the same time, it is important that the whole conceptual design is clean enough to allow future extensions and unforeseen contexts of usage for your data.

    “Too simple” ontologies are as useless in the long run as “too complicated” ones.

    4. Plan for Incremental Granularity
    A key feature of GoodRelations is that you can represent the same things at an arbitrary degree of formality, depending on how much structure you have on the producer’s side. If all that you can provide is a text about a product and a price, attach all the product semantics via rdfs:comment and the price using GoodRelations. But if you have more granular data, or if others on the Web can add more granular data, your ontology should allow that.

    Best wishes

    Martin, http://www.heppnetz.de/

    More info on the GoodRelations ontology for e-commerce:

    * Project Main Page: http://purl.org/goodrelations/
    * Vocabulary: http://purl.org/goodrelations/v1
    * Developer’s Wiki: http://www.ebusiness-unibw.org/wiki/GoodRelations
