Common Tag semantic tagging format released today

The Common Tag format for semantic tagging has finally been released today after almost a year of intense work by a group of Web companies active in the semantic technologies area, among them Yahoo. It’s been great fun working on this and I’m proud to have been involved: while there have been vocabularies before for representing tags in RDF, this effort is different in at least two respects.

First, a significant amount of time has been spent on making sure the specification meets the needs of all partners involved. The support of these companies for the specification will ensure that developers in the future can rely on a single format for annotating with semantic tags and interchanging tag data. The website already lists a number of applications, but I’m pretty sure that a common tagging format will open entirely new possibilities in searching, navigating and aggregating web content.

Second, the format has been developed with publishers in mind, in particular by making it as easy as possible to embed semantic tags in HTML using RDFa, a syntax embraced by all those involved. The choice of RDF also means that, unlike in the case of the rel-tag microformat, Common Tags can be applied to any object, not just documents.
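To give a flavor of what embedding a tag with RDFa might look like, here is a minimal Python sketch that renders the markup for a tagged resource. The namespace URI and the term names (ctag:tagged, ctag:Tag, ctag:means, ctag:label) are written from memory and should be checked against the published specification. The function names and the example URIs are purely hypothetical.

```python
from html import escape

# Assumed namespace URI for the Common Tag vocabulary; verify against the spec.
CTAG_NS = "http://commontag.org/ns#"


def common_tag_span(label, meaning_uri):
    """Render one tag: a label plus the URI of the concept it means."""
    return (
        '<span typeof="ctag:Tag" rel="ctag:means" '
        f'resource="{escape(meaning_uri)}" '
        f'property="ctag:label" content="{escape(label)}"></span>'
    )


def tagged_block(about_uri, tags):
    """Render a div that attaches tags to any resource, not only the page itself."""
    spans = "\n".join("  " + common_tag_span(label, uri) for label, uri in tags)
    return (
        f'<div xmlns:ctag="{CTAG_NS}" about="{escape(about_uri)}" '
        f'rel="ctag:tagged">\n{spans}\n</div>'
    )


if __name__ == "__main__":
    # Hypothetical resource and concept URIs, for illustration only.
    print(tagged_block(
        "http://example.org/albums/42",
        [("U2", "http://example.org/concepts/u2")],
    ))
```

Because the tagged thing is identified by the about attribute, the same pattern works for a product, a person or a video as well as for a document.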

So, it’s time for a new era in tagging!


VoCamp Sunnyvale: June 18-19, 2009

VoCamps provide the missing social interaction needed for vocabulary creation and management on the Semantic Web: a space where members of the community can discuss the current issues related to vocabularies and semantic interoperability. Unlike Semantic Web meetups which typically take just a few hours and where the discussion focuses on a single presentation, VoCamps are two-day events that allow in-depth discussions and working in small groups.

Following the success of VoCamp Ibiza, we are organizing another similar event at Yahoo, but this time in the US, where VoCamps are now also taking hold. (VoCampDC will be organized at the end of May and has already reached its full capacity!) This VoCamp will take place in Sunnyvale, directly after the SemTech 2009 conference.

If you would like to join this next edition of the VoCamp series, please sign up on the VoCampSunnyvale2009 wiki page! The space is limited, but we will try to expand if necessary. Hope to see many of you in San Jose and Sunnyvale!

Upcoming events: VoCampIbiza, FoWS and SemSearch

It’s rare that I’m involved in organizing three events at the same time, especially when those events take place within a period of two weeks. Nevertheless, I’m equally excited about each of them for different reasons.

The first one will be VoCampIbiza. I have a great feeling about VoCamp, because in the year since we first discussed the idea with Tom Heath, VoCamps have grown into a movement, even jumping across the Atlantic. Feeling at least partly responsible, I’ve felt the need to organize at least one such event somewhere nearby. (VoCamps are organized at different times and places, unlike regular, local Semantic Web meetups, which are also on the rise; check out the Semantic Web meetup alliance.) So it’s going to happen on Ibiza on April 15-16, the week before WWW, and if you want to come, you just have to sign up! There is no registration fee and Ibiza is cheap to reach from many places in Europe.

The second event, the Future of Web Search Workshop, is a regular yearly get-together organized by Ricardo (Baeza-Yates), but this year it will have a special focus on semantic search. The format is again fairly flexible: no papers, only interesting presentations. (The list of presentations is already fixed and posted on the website, but registration is still open if you want to join!) It will take place right after VoCamp, April 17-18, so that’s again the week before WWW.

Lastly, there will be the second Semantic Search Workshop taking place at WWW in Madrid on April 21.  Together with my co-organizers (Thanh Tran Duc from AIFB in Karlsruhe, Haofen Wang from the Apex Lab and Marko Grobelnik from JSI) we already had the feeling that WWW is probably the best place to take this workshop as the conference naturally brings together researchers from IR and the Semantic Web. The number and quality of papers, as well as the number of participants who have registered so far are certainly very promising indicators!

SearchMonkey simplified

I’m proud to say that today we have released ‘SearchMonkey Objects’, which might seem like a small step in the evolution of the Monkey (huh!) but we are hoping that it will radically simplify the ramp-up for site owners when it comes to enabling rich, structured results for their websites.

So what’s happening here? Well, until now if you wanted to have an enhanced result based on metadata, you needed to mark up your site, and then create an application to transform the metadata into a search result presentation. These applications were simple (all they did was map fields in the data to parts of a presentation template), but they still required developers to write PHP code.

We realized that this could be simpler! In particular, from now on if you provide Yahoo structured data using vocabularies (formats) that we understand, we can create a rich result for you without you having to write a single line of code! Obviously, if you want to customize the presentation to your particular site, you can still do that by writing a presentation application, but if you are happy with the standard treatment, you don’t have to.
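To make concrete how little such a presentation application has to do, here is a toy sketch of the field-to-template mapping idea. It is written in Python rather than PHP and does not use the real SearchMonkey API. The slot names, the metadata field names and the function name are all made up for illustration.

```python
# Hypothetical mapping from presentation slots to metadata fields.
VIDEO_MAPPING = {
    "title": "dc:title",
    "summary": "dc:description",
    "thumbnail": "media:thumbnail",
}


def build_result(metadata, slot_to_field):
    """Fill each presentation slot from the metadata field mapped to it.

    Slots whose field is missing from the metadata are simply left out.
    """
    return {
        slot: metadata[field]
        for slot, field in slot_to_field.items()
        if field in metadata
    }


if __name__ == "__main__":
    page_metadata = {
        "dc:title": "Demo video",
        "media:thumbnail": "http://example.org/thumb.jpg",
    }
    print(build_result(page_metadata, VIDEO_MAPPING))
```

When the vocabulary is one the search engine already understands, a mapping like this can be fixed in advance, which is why no site-specific code is needed for the standard treatment.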

SearchMonkey Objects is a simple website that not only shows you how to mark up your page for certain types of objects, but also lets you validate immediately whether your markup is correct. Again, this is something that many have asked for in the past. The first objects that we support are Video, Games and Documents, but more are on the way.

I believe this is an exciting step because it will no doubt lead to greater adoption of RDFa and other forms of semantic markup, bringing us even closer to structuring the Web. And as always, tell us what you think!

Yahoo makes the World’s metadata available through BOSS

I’m thrilled to spread the news further that we have just made available all public metadata (microformats, eRDF and RDFa data) that we crawl through the BOSS API. See the official announcement and commentaries throughout the Web.

This means essentially opening up all public data available to SearchMonkey applications, and thus making it possible for anyone to experiment with various forms of semantic search that go well beyond changing the way abstracts look. Consider for example the microsearch prototype I blogged about a few months ago, which showed rich abstracts, plus temporal and geographic visualizations based on metadata. You can now build something like microsearch in a matter of hours, and in a highly scalable fashion. Last, but not least, the terms of the BOSS API allow you to monetize your search engine in any way you want.

So if you think you can build a better search engine through semantics, this is a great time to start! All we ask is to give us feedback on what you do, minimally by tagging your experiment with the tag ‘bossmashup’.

Microformats in RDF

I was at VoCamp Galway 2008 last week, which was my first VoCamp. I have to say it was a fantastic experience. It seems like the VoCamp formula just works out of the box: you let a bunch of vocabulary geeks into a closed room and they immediately get down to fixing some of the key outstanding questions surrounding vocabularies on the Semantic Web (and in their spare time, they create some new vocabs).

The primary issue I set out to discuss was the state of RDF vocabularies for microformats. Why is this important? A growing number of tools exist that convert microformats into RDF, but if everyone chooses a different representation it will be significantly more difficult to interoperate. So in a small group we set out to document what we believe are the best practice mappings from microformats to RDF.

We started by collecting the existing RDF vocabularies for stable microformats and documented them on the SemanticWeb.org wiki. This alone has been an incredibly useful exercise, since we uncovered some pretty immediate targets for fixing:

  • hCard. There are two VCard ontologies lingering on W3C’s website [1,2]. It’s clear to the trained eye that one is deprecating the other, but unfortunately it doesn’t say so anywhere… I’ve written an email about this to the friendly folks at W3C.
  • hCalendar… Same here. It seems the RDF Calendar group couldn’t agree at some point on the design, forking the ontology into two… We have now taken this discussion to the RDF Calendar mailing list, with the intention of consolidating things.
  • rel-license… We have three terms for saying ‘this is the license for this doc’: xhtml:license (which is the default namespace in RDFa), dc:license and cc:license.
  • hAtom… Again, there seem to be multiple choices… We only got as far as listing the ones we knew about.
  • hResume… In this case we haven’t found a direct mapping: there are various resume ontologies but they all do more or something slightly different than hResume.

This is about as far as we got, but I’ve seen that in the past week already people have added some pretty useful info to the Wiki at semanticweb.org. If you have something to contribute as well, please do so!
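As an illustration of what a best-practice mapping amounts to in code, here is a minimal sketch that turns a few hCard properties into RDF-style triples. It assumes the 2006 W3C vCard namespace and covers only a handful of literal-valued properties. The function name and example data are hypothetical, and a real converter would also handle URLs, addresses and nested structures.

```python
# Assumed target namespace: the 2006 W3C vCard ontology.
VCARD_NS = "http://www.w3.org/2006/vcard/ns#"

# One-to-one mapping from hCard class names to vCard terms (a small subset).
HCARD_TO_VCARD = {
    "fn": "fn",
    "nickname": "nickname",
    "title": "title",
    "note": "note",
}


def hcard_to_triples(subject_uri, hcard):
    """Map parsed hCard properties to (subject, predicate, literal) triples.

    Properties without an agreed mapping are skipped rather than guessed.
    """
    triples = []
    for prop, value in hcard.items():
        term = HCARD_TO_VCARD.get(prop)
        if term is None:
            continue
        triples.append((subject_uri, VCARD_NS + term, value))
    return triples


if __name__ == "__main__":
    card = {"fn": "Jane Doe", "nickname": "jd", "x-custom": "ignored"}
    for triple in hcard_to_triples("http://example.org/people/jane", card):
        print(triple)
```

If every tool shared one such table per microformat, the RDF they produce would be directly comparable, which is the interoperability point made above.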

Social Networks and the Semantic Web

There seems to be a growing interest again in the topic of social networks and the Semantic Web.

Besides some publications here and there, I have recently attended the highly motivating seminar on Social Web Communities, a full-week get-together of computer scientists with a social inclination, in the middle-of-nowhere conference paradise that is Dagstuhl. Most of the week was spent in small groups, and in the group I joined we had some fun formalizing the notion of ‘surprise’, i.e. an unexpected connection between two people (or things) in a network.

Another interesting development is the plan hatched by Dan Brickley and Harry Halpin to launch a Social Web Incubator group within the W3C.  Hopefully this will take on the issue of representation, since other initiatives with social and open in their names tended to focus on APIs and the technical interoperability of applications. To kickstart all this, the W3C is organizing a Workshop on the Future of Social Networking in January, in my (current) hometown of Barcelona!

Lastly, all of this reminded me that I haven’t yet blogged about my book, Social Networks and the Semantic Web. The book (pictured here on the right) unfortunately costs more than it really should due to the low number of copies, but it is fair to say that it is a good summary of the technologies around which this field revolves, including social network mining from the Web, social network representations using semantic technologies, social network analysis and visualization, and proving social theories using social network data. The book is based on my PhD thesis, and although it has been out for a year, it probably still makes an interesting read for anyone who wants to learn about this field.

What’s wrong with vocabularies on the Semantic Web?

We are only two days away from the first VoCamp. Although I cannot make it (I’m attending the Dagstuhl seminar on the Social Web), I’m incredibly excited that we in the Semantic Web community are finally starting to talk about the situation around vocabularies. For this reason, I decided to share some thoughts on the topic as an input to the discussion. This is my personal opinion, but it is based on what I see around me at work.

So what’s wrong with vocabularies on the Semantic Web? Well, anno 2008 we can finally state with confidence that the approach we followed so far (of having no particular approach) did not produce the results we expected. Here is a list of things that would need urgent fixing, in no particular order:

  1. Centralize. Unlike with microformats, there is no single point on the Semantic Web to go to for vocabularies, to discuss them, to share use cases and examples. Decentralization left us with no way to tell what is out there and how trustworthy or useful it is. The lack of options leads to abuse. If a user doesn’t find the movie vocabulary, he will still create a movie class in the rdfs namespace. And yes, there is plenty of rdfs:movie out there.
  2. Release your vocabularies. There are many talented people in our community who have created the first vocabulary for topic X only to abandon it a few months later when moving on to the next project. I have nothing against personomies, but plan for succession in case the time comes that you cannot maintain it any more. If you’ve done your job well, there will be plenty of people willing to carry on your work. However, we cannot maintain your vocabulary for you without your cooperation. I like PURLs.
  3. Embrace microformats. Microformats are just another way of providing metadata. We need good RDF/OWL vocabularies for at least the most widely used microformats. These should be straightforward representations with a one-to-one mapping between elements in the microformat and in the vocabulary, and not a grand redesign. I know it’s tempting to redesign a microformat but it’s important to resist that temptation.
  4. Design vocabularies for annotations. Now that RDFa is out, most of us are facing some mild inconvenience of using existing ontologies for marking up web pages. Existing ontologies will need some adaptation because they were designed from a consumer perspective, not from a producer perspective. This will likely mean that we will have to compromise, on at least some occasions. For example, in social sites it’s common to expose only the user’s age and number of friends (for which FOAF has no properties), not the birthday and the list of friends (for which it does). Principles of ontology engineering may dictate that we shouldn’t introduce time-bound properties or properties for counts of things, or multiple properties for saying the same things in different ways. But we will just have to do it, or else it will be rdfs:age.
  5. Clarify the role of W3C in the process. As people find vocabularies by searching, they often run into ontologies that appear in the form of W3C member submissions. These are easily findable (good pagerank) and the formatting and placement gives them an official disguise. Problem is, anyone (who is a member, that is…) can author such a submission, which, as evidence shows, will be neither updated nor removed over time. This is the way we got to have two ontologies for VCard with documentation and URIs in the W3C domain (1, 2). In my personal opinion this is doing more harm than good.
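The kind of namespace abuse mentioned in the first point (rdfs:movie, rdfs:age) is at least easy to detect mechanically, because the RDF Schema namespace defines a small, fixed set of terms. The following is a minimal sketch of such a check. The function name is made up, and the input is assumed to be a plain list of the class and property URIs found in some dataset.

```python
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# The terms actually defined in the RDF Schema namespace.
RDFS_TERMS = frozenset({
    "Resource", "Class", "Literal", "Datatype",
    "Container", "ContainerMembershipProperty",
    "subClassOf", "subPropertyOf", "domain", "range",
    "label", "comment", "member", "seeAlso", "isDefinedBy",
})


def invented_rdfs_terms(uris):
    """Return the sorted URIs that sit in the rdfs namespace without being defined there."""
    return sorted({
        uri for uri in uris
        if uri.startswith(RDFS_NS) and uri[len(RDFS_NS):] not in RDFS_TERMS
    })


if __name__ == "__main__":
    seen = [
        RDFS_NS + "label",
        RDFS_NS + "movie",
        RDFS_NS + "age",
        "http://xmlns.com/foaf/0.1/name",
    ]
    for uri in invented_rdfs_terms(seen):
        print("not an RDFS term:", uri)
```

A check like this only flags the symptom. The fix is still a place where people can find, or propose, the vocabulary they were looking for.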

This is certainly not the complete list, but I hope that at least some of these points will come up in Oxford. And I’m sorry, guys, that I cannot make it. Have fun!

Submissions open for the Semantic Web Challenge 2008

Submissions are now open for the Semantic Web Challenge 2008! As already announced, this year we are inviting submissions for both the Open Track (the ‘classic’ Challenge) and the Billion Triples Challenge, a newly created track with the focus on scalability. For more details, see the call for the Challenge.

Submit your application through the online submission system by October 1, 2008.

SearchMonkey goes to the iPhone

A little bit of Semantic Web in your pocket… read the unofficial-official announcement and admire the screenshots!
