Microformats and RDFa deployment across the Web

I have presented some information about microformat and RDFa deployment on the Web on previous occasions (at SemTech 2009, SemTech 2010, and later at FIA Ghent 2010; see the slides for the latter; also at ISWC 2009). Because such information is hard to come by, it has generated some interest from the audience. Unfortunately, the Q&A time after presentations is too short to get into details, so this post gives some additional background on how we obtained the data and what it means for the Web. This level of detail also matters when comparing these figures with information from other sources, where things might be measured differently.

The chart below shows the deployment of certain microformats and RDFa markup on the Web, as a percentage of all web pages, based on an analysis of 12 billion web pages indexed by Yahoo! Search. The same analysis was done at three different points in time, so the chart also shows how deployment has evolved.

[Chart: Microformats and RDFa deployment on the Web (% of all web pages)]

The data is given below in a tabular format.

Date     RDFa   eRDF   tag    hcard  adr    hatom  xfn    geo    hreview
09-2008  0.238  0.093  N/A    1.649  N/A    0.476  0.363  N/A    0.051
03-2009  0.588  0.069  2.657  2.005  0.872  0.790  0.466  0.228  0.069
10-2010  3.591  0.000  2.289  1.058  0.237  1.177  0.339  0.137  0.159

There are a couple of comments to make:

  • There are many microformats (see microformats.org) and I only include data for the ones that are most common on the Web. To my knowledge at least, all other microformats are less common than the ones listed above.
  • eRDF was a predecessor to RDFa and has been obsoleted by it. RDFa is more fully featured than eRDF, and has been adopted as a standard by the W3C.
  • The data for the tag, adr and geo formats is missing from the first measurement.
  • The numbers cannot be added up to get a total percentage of URLs with metadata, because a webpage may contain multiple microformats and/or RDFa markup. In fact, this is almost always the case with the adr and geo microformats, which are typically used as part of hcard. The hcard microformat itself can be part of hatom markup, and so on.
  • Not all data is equally useful, depending on what you are trying to do. The tag microformat, for example, is nothing more than a set of keywords attached to a webpage. RDFa itself covers data using many different ontologies.
  • The data doesn’t include “trivial” RDFa usage, i.e. documents that only contain triples from the xhtml namespace. Such triples are often generated by RDFa parsers even when the page author did not intend to use RDFa.
  • This data includes all valid RDFa, and not just namespaces or vocabularies supported by Yahoo! or any other company.
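
To make two of the points above concrete, here is a minimal Python sketch. It is not the code used for the analysis. The URLs and per-page format sets are hypothetical toy data, and the namespace URI is my assumption for what "the xhtml namespace" refers to. It shows why per-format counts overstate the number of pages with any markup, and how documents containing only "trivial" RDFa could be filtered out.

```python
from collections import Counter

# Assumed URI for "the xhtml namespace"; the post does not spell it out.
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def has_nontrivial_rdfa(triples):
    """Return True if at least one predicate lies outside the XHTML vocabulary.

    `triples` is an iterable of (subject, predicate, object) strings.
    A document with no triples at all is not counted as RDFa.
    """
    return any(not pred.startswith(XHTML_VOCAB) for _, pred, _ in triples)


# Hypothetical toy sample: URL -> set of formats detected on that page.
pages = {
    "http://example.org/a": {"hcard", "adr", "geo"},
    "http://example.org/b": {"hatom", "hcard"},
    "http://example.org/c": {"rdfa"},
    "http://example.org/d": set(),
}

# Per-format counts, as in the table: one page can count towards several formats.
per_format = Counter(fmt for formats in pages.values() for fmt in formats)
sum_of_counts = sum(per_format.values())

# Pages with at least one format: each page counts once.
pages_with_any = sum(1 for formats in pages.values() if formats)

print(dict(per_format))
print("sum of per-format counts:", sum_of_counts)
print("pages with any markup:   ", pages_with_any)
```

In this toy sample the per-format counts add up to 6, although only 3 of the 4 pages carry any markup. That is exactly the kind of double counting that makes the columns of the table non-additive.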

The data shows that the usage of RDFa increased by 510% between March 2009 and October 2010, from 0.6% of webpages to 3.6% of webpages (or 430 million webpages in our sample of 12 billion). This is largely thanks to the efforts of the folks at Yahoo! (SearchMonkey), Google (Rich Snippets) and Facebook (Open Graph), all of whom recommend the usage of RDFa. The deployment of microformats has not advanced significantly in the same period, except for the hatom microformat.
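
For readers who want to check the arithmetic, here is a small Python sketch that recomputes the growth figure and the page count from the RDFa column of the table. The variable and function names are mine, chosen for illustration.

```python
# RDFa share of all web pages (in %), copied from the table above.
rdfa_share = {"09-2008": 0.238, "03-2009": 0.588, "10-2010": 3.591}

# Size of the sample: 12 billion pages indexed by Yahoo! Search.
SAMPLE_SIZE = 12_000_000_000


def relative_growth(old, new):
    """Relative increase from `old` to `new`, in percent."""
    return (new - old) / old * 100


growth = relative_growth(rdfa_share["03-2009"], rdfa_share["10-2010"])
rdfa_pages = rdfa_share["10-2010"] / 100 * SAMPLE_SIZE

print(f"RDFa growth, 03-2009 to 10-2010: {growth:.1f}%")
print(f"RDFa pages in the sample: {rdfa_pages / 1e6:.1f} million")
```

With the unrounded table values this gives a growth of about 510.7% and about 430.9 million pages, which matches the rounded figures quoted above.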

These results make me optimistic that the Semantic Web is already here in a big way. I don’t expect that 100% of webpages will ever adopt microformats or RDFa markup, simply because not all web pages contain structured data. As this seems interesting to watch, I will try to publish updates to the data and include the updated chart here or in future presentations.


28 comments so far

  1. [...] This post was mentioned on Twitter by Mike Linksvayer and Arto Bendiken, Steren Giannini. Steren Giannini said: RT @mlinksva: http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/ #rdfa #microformats #semweb #metrics [...]

  2. [...] You can learn more about this cutting edge research by Peter Mike at Yahoo! on his personal blog. [...]

  3. Doc on

    Very encouraging! I hope the process will accelerate even more.

    It’s also encouraging to see C-Tags coming into play at a good clip. I was beginning to wonder.

    Great post.

  4. Kevin Marks on

    RDFa without a shared vocabulary is just noise. How about histogramming the RDFa vocabularies found as you did with the microformats.

    • Brian Sletten on

      Kevin, while the value proposition certainly rises w/ the use of shared vocabularies, it is quite incorrect to consider it noise. The discoverability of relationships that can be used, unified, etc. is done in a standard way that does not require 100% custom code to leverage. It also facilitates co-reference resolution for humans asking questions of datasets.

  5. danbri on

    Yes, what Kevin said. Can you post more about the vocabulary breakdown? Or even data dumps per vocab and term?

  6. kaidez on

    How about microdata? The HTML5 format of RDF & microformats? Google and Bing are using it…it may not be the holy grail but it’s interesting

  7. [...] Microformats and RDFa deployment across the Web [...]

  8. Quora on

    Can RDF become an universal exchange language between applications ?…

    RDFa adoption is going well: as of October 2010, RDFa was used on 3.6% of all web pages (430 million pages in Yahoo! Research’s sample of 12 billion pages). This makes RDFa the most used data markup format on the Web, as well as the fastest growing on…

  9. [...] are over 430 million Web pages that use RDFa today, based just on Drupal 7′s release numbers (Drupal 7 includes RDFa by [...]

  10. [...] is worth something is hard data – such as there are at least 430+ million web pages containing RDFa and CURIEs today. That there are currently 23,913 RDFa-enabled Drupal 7 sites using CURIEs right now, which [...]

  11. Quora on

    What were the most poorly executed good ideas in the last 30 years of computing?…

    Looking at the state of linked data today; I would disagree with a lot of your comment. I agree it took some time to find the right balance; but I believe it has been found via linked data and RDFa; not to mention SPARQL. For instance, RDFa is undergoi…

  12. André Luís on

    Cool stuff.

    Is there any way of distinguishing between vocabularies?

    I mean, I’d love to see the contribution to OpenGraph in that rdfa bar. ;)

    It still pains me a bit to see the uf’s so low, though. Might mean SEO/Social is more important than UX.

  13. [...] new article by Peter Mika looks at the growing reach of RDFa and microformats on the web. The article includes a chart with information on the deployment of [...]

  14. [...] new article by Peter Mika looks at the growing reach of RDFa and microformats on the web. The article includes a chart with information on the deployment of [...]

  15. [...] year 2010, the worldwide share of web pages using RDFa rose by 510% (source: Peter Mika). Reason enough to [...] a blog post to this microformat [...]

  16. [...] post on “Microformats and RDFa deployment across the Web” (see http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/) describes “the deployment of certain microformats and RDFa markup on the Web, as percentage [...]

  17. [...] post on Microformats and RDFa deployment across the Web recently surveyed take-up of RDFa based on an analysis of 12 billion web pages indexed by Yahoo! [...]

  18. [...] post on Microformats and RDFa deployment across the Web recently surveyed take-up of RDFa based on an analysis of 12 billion web pages indexed by Yahoo! [...]

  19. Selamican on

    A very well done article. Great work!

  20. [...] – RDFa is the only one that has experienced triple digit growth in the last year – 510% growth over the last year, to be exact. There are no such figures for Microdata. If you are going to claim that something has [...]

  21. Chris Meloni on

    To Find Out More…

    Below is a link where you can find further info that we encourage you to read…

  22. [...] was encouraging to see this blog early this year about microformats and RDFa deployments. And if you look at what has happened [...]

  23. [...] findings deviate wildly from the findings by Yahoo around the same time. Additionally, the claim that 88.9% of all pages on the Web contain [...]

  24. [...] if the data correlates with the Yahoo! study or the Web Data Commons [...]

  25. [...] if the data correlates with the Yahoo! study or the Web Data Commons [...]

  26. [...] Here is one quite telling piece of statistical evid…. According to this data, RDFa shows enormous growth (over 500% in the last 1-2 years). Note that neither this data nor any other, to this day, makes clear how widespread the microdata technology from Schema.org is. [...]

