Welcome to schema.org

Bing, Google, and Yahoo! announced schema.org yesterday, a collaboration between the three search providers in the area of vocabularies for structured data. As the ‘schema guy’ at Yahoo!, I have been part of the very small core team that developed the technical content for schema.org. It’s been an interesting process: if you doubt that reaching agreement in the search domain is hard, consider that the last time such an agreement happened was apparently sitemaps.org in 2006.

However, over the years, the lack of agreement on schemas has become such a major pain point for publishers new to the Semantic Web that cooperation eventually became the only sensible thing to do for the future of the Semantic Web project. Consider that until yesterday any publisher that wanted to provide structured data for Bing, Google, and Yahoo! needed to navigate three sets of documentation and, worse, choose between three different schemas and multiple formats (microdata, RDFa, microformats), or mark up their pages using all of them.

So how did we get here? In my personal view, one of the key problems of the W3C’s Semantic Web design has been that it considered only technical issues, and not the need for a social process that would bootstrap the system with data and schemas. We are now doing better on the data front thanks to large community efforts such as Linked Data. With regard to schemas (we used to call them ontologies, until we found it scared people away), the expectation was that they would be developed in a distributed manner, and that either machines would do the hard job of schema matching or agreements would somehow emerge. However, schema matching is a hard problem to automate, and agreements were slow to come because there was no shared space for schema development and discussion.

We have tried a number of things in this respect. For example, some of you might know that I’ve been one of the instigators of the VoCamp movement, which peaked around 2009. The W3C itself accepted some RDF-based schemas as member submissions, but it didn’t see itself as the organization that should deal with schemas, and there has been no process for dealing with these submissions either. (As an example, we learned when we started with SearchMonkey that there were actually two versions of VCard in RDF, submitted by two different members of the W3C. This problem has since been resolved.) Other schemas just appeared on websites that were later abandoned by their owners. Finding stable and mature schemas with sufficient adoption eventually became a major pain point.

In the search domain, the situation improved somewhat when search providers preselected some schemas for publishers to use and started providing specific documentation, with examples and a way to validate webpages. However, as illustrated above, these efforts remained too fragmented until yesterday.

Given the above history, I’m extremely glad that cooperation prevailed in the end, and I hope schema.org will become a central point for vocabularies for the Semantic Web for a long time to come. Note that it will almost certainly not be the only one. schema.org covers the core interests of search providers, i.e. the stuff that people search for the most (hence the somewhat awkward term ‘search vocabularies’). As the simple needs are the most common in search logs, this includes things like addresses of businesses, reviews and recipes. schema.org will hopefully evolve with extensions over time, but it may never cover complex domains such as biotechnology, e-government or others where people have been using Semantic Web technology with success. Nor do I think that schema.org is ‘perfect’. Personally, I would have liked to see RDFa used as the syntax for the basic examples, because I consider it more mature and in many ways a superior standard to microdata. You will notice that RDF(a) in particular would have offered a standard way to extend schema.org schemas and map them to other schemas on the Web. Currently, there is an example of using the schemas in RDFa, but the support for this version of the markup will depend on its adoption.

Please take a look at schema.org, and if you have comments please consider using the schema.org feedback mechanisms (we have a feedback form as well as a discussion group).


13 comments so far

  1. Tom Heath on

    Hi Peter,

    I share your mild regrets re use of microdata rather than RDFa (and it would be great to see more worked examples here), but overall congratulations on a significant piece of collaboration. Technicalities aside, the key thing here is a clear way for people to publish more structured data on their sites. Let’s think of it as a gateway drug for the Semantic Web, assuming the highs of this schema wear off at some point 😉

    Two other points:

    – Were Facebook involved at all, wrt OpenGraphProtocol? Would seem a logical step to try and integrate the two sets of types (and other vocabulary terms?) between schema.org and OGP. Also, are there plans to harmonise the list of Place types here with e.g. those in the Google Places API? We don’t want to build a single mega-vocabulary (that has never been the plan), but there do seem to be other useful (and nearby) points for collaboration.

    – Re VoCamp, think of 2009 as an early surge; too soon to call that the peak 😉

    Cheers,

    Tom.

  2. tripletalk on

    Hi Tom,

    Just like you, I do hope that Facebook will eventually join this effort, even though they haven’t been there from the start. I would make the suggestion regarding Places and other extensions on the schema.org list…

    And yes, I also share your hope that 2009 was just a local maximum for VoCamp 😉

    Cheers,
    Peter

  3. Microdata and RDFa on

    […] Peter Mika is still a RDFa fan, but also has a pragmatic appreciation for the agreement of the big three search companies on a standard for semantic data. “Given […]

  4. mhausenblas on

    Peter,

    I think that with this step the ‘big three’ have acknowledged the importance of structured data on the Web and have eventually endorsed it.

    We are now working on a canonical mapping to RDF [1] and you’re of course welcome to join in 😉

    Cheers,

    Michael

    [1] http://schema.rdfs.org/

  5. Brian Peterson on

    The choice for exclusive support for Microdata is a terrible one for the Web. Claiming that developers can’t handle 3 different formats is silly. Not including support for GoodRelations is just bizarre. A shut-out of RDFa and Microformats is clearly driven by personal agendas. RDFa and Microformats are much more mature than Microdata. Manu gives a heart-felt plea to reverse this decision (http://ow.ly/59DBK), and also debunks the claims made against RDFa. There is a trivial mapping from Microdata to RDFa. The least schema.org could have done is support a simple subset of RDFa that covers Microdata, which leaves a clear path for more extensive use cases. Supporting a simple subset of RDFa is exactly what Facebook did for OGP. People expect this kind of shut-out maneuver from Microsoft, but the Web expects better from Google and Yahoo!. I’m really quite surprised by this.

  6. Doc on

    Nice post! Like you, I would have greatly preferred to have seen RDFa selected as the backbone of schema.org. In fact, I’d have rejoiced! I really can’t see the logic in choosing microdata… it just doesn’t bring nearly as much to the table, IMO.
    Still, I can appreciate that an agreement between the parties is monumental. I’m just afraid that this may set implementation back several years.

  7. paulbruemmer on

    Great work Peter! That’s fantastic, good team effort. I’m certain we’ll be seeing the positive effects moving forward.

  8. Giovanni Tummarello on

    I understand the pragmatics and it makes sense. What i don’t like is the secrecy surrounding this and coming out at once. What would have been the drawbacks of launching it as an alpha, allowing discussions and contributions and other people to join (e.g. facebook) and then going gold just after a while?

    Given nobody was going to steal the idea (altavista?), I really cant see these disadvantages (and as far as i know many others dont either) so i feel the achievement carries a certain feeling of arrogance.

    Anyway 🙂 this said i do think this is indeed good for the web – and again do understand the move.. afterall the semantic web community had 11 years or so to come up with a reasonable way to discuss and validate vocabularies and failed.

    For what it matters.. we’ll have full support to it in Sindice in a week or so and this will create new markets altogether likely.

    • paulbruemmer on

      Hi Giovanni, I agree with your comments re: secrecy yet I’m very encouraged to hear you will be offering full support at Sindice within a few weeks, that’s awesome! The new markets are very exciting, can’t wait to dig in.

  9. […] at schema.rdfs.org. That is the way the well payed engineers of the three big companies should have done it in the first place. Now we can link it to DBPedia and other resources, extend it for our specific domains and use it […]

  10. […] Mika, one of the authors of this standard at Yahoo, explains how this formalism was arrived at, at the expense of the emerging RDFa standard which, although it clearly formalized how […]

  11. […] June 3 – Peter Mika – Welcome to schema.org […]

  12. […] message is: webmasters out there, adopt! For you who follow the blogpost by Google,Yahoo and Microsoft, you can start adopting! But hang on, haven’t some of us just installed Drupal […]

