Archive for the ‘Uncategorized’ Category

Welcome the new Yahoo homepage!

And in case you still doubted whether this is a company run by nerds, watch this video.

Common Tag semantic tagging format released today

The Common Tag format for semantic tagging was finally released today after almost a year of intense work by a group of Web companies active in the semantic technologies area, among them Yahoo. It’s been great fun working on this and I’m proud to have been involved: while there have been vocabularies before for representing tags in RDF, this effort is different in at least two respects.

First, significant time and effort have been spent on making sure the specification meets the needs of all partners involved. The support of these companies for the specification will ensure that developers in the future can rely on a single format for annotating with semantic tags and interchanging tag data. The website already lists a number of applications, but I’m pretty sure that a common tagging format will open entirely new possibilities in searching, navigating and aggregating web content.

Second, the format has been developed with publishers in mind, in particular by making it as easy as possible to embed semantic tags in HTML using RDFa, a syntax embraced by all those involved. The choice of RDF also means that, unlike the rel-tag microformat, Common Tags can be applied to any object, not just documents.
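To give a rough idea of what embedding a Common Tag with RDFa can look like, here is a minimal sketch in Python that generates the markup. The function names, the example.org URIs and the choice of a `div` wrapper are assumptions made for illustration; the `ctag:` terms and namespace are used as I understand them from the published format, so check the specification before relying on them.

```python
from html import escape
from typing import Optional

# Namespace of the Common Tag vocabulary (assumed from the published format).
CTAG_NS = "http://commontag.org/ns#"


def common_tag_rdfa(label: str, meaning_uri: str) -> str:
    """Return one RDFa anchor element describing a single tag.

    `label` is the human-readable tag text and `meaning_uri` identifies
    the concept the tag refers to (hypothetical helper, not an official API).
    """
    return (
        '<a typeof="ctag:Tag" rel="ctag:means" '
        f'href="{escape(meaning_uri, quote=True)}" '
        f'property="ctag:label" content="{escape(label, quote=True)}"></a>'
    )


def tagged_block(tags: list[tuple[str, str]], about: Optional[str] = None) -> str:
    """Wrap tag elements in a container that declares the ctag prefix.

    If `about` is given, the tags describe that resource instead of the
    enclosing page, which is how a tag can be attached to any object.
    """
    subject = f' about="{escape(about, quote=True)}"' if about else ""
    inner = "\n".join("  " + common_tag_rdfa(label, uri) for label, uri in tags)
    return f'<div xmlns:ctag="{CTAG_NS}"{subject} rel="ctag:tagged">\n{inner}\n</div>'


if __name__ == "__main__":
    print(tagged_block(
        [("Semantic Web", "http://example.org/concept/semantic-web")],
        about="http://example.org/photo/1",
    ))
```

Leaving out `about` tags the page itself, while passing a URI tags that resource, for example a photo or a product.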

So, it’s time for a new era in tagging!

Upcoming events: VoCampIbiza, FoWS and SemSearch

It’s rare that I’m involved in organizing three events at the same time, especially when those events take place within a period of two weeks. Nevertheless, I’m equally excited about each of them for different reasons.

The first one will be VoCampIbiza. I have a great feeling about VoCamp: in the year since we first discussed the idea with Tom Heath, VoCamps have grown into a movement, even jumping across the Atlantic. Feeling at least partly responsible, I’ve felt the need to organize at least one such event somewhere nearby. (VoCamps are organized at different times and places, unlike regular, local Semantic Web meetups, which are also on the rise; check out the Semantic Web meetup alliance.) So it’s going to happen on Ibiza on April 15-16, the week before WWW, and if you want to come, you just have to sign up! There is no registration fee and Ibiza is cheap to reach from many places in Europe.

The second event, the Future of Web Search Workshop, is a regular yearly get-together organized by Ricardo (Baeza-Yates), but this year it will have a special focus on semantic search. The format is again fairly flexible: no papers, only interesting presentations. (The list of presentations is already fixed and posted on the website, but registration is still open if you want to join!) It will take place right after VoCamp, April 17-18, so that’s again the week before WWW.

Lastly, there will be the second Semantic Search Workshop, taking place at WWW in Madrid on April 21. Together with my co-organizers (Thanh Tran Duc from AIFB in Karlsruhe, Haofen Wang from the Apex Lab and Marko Grobelnik from JSI) we already had the feeling that WWW is probably the best place to hold this workshop, as the conference naturally brings together researchers from IR and the Semantic Web. The number and quality of papers, as well as the number of participants who have registered so far, are certainly very promising indicators!

SearchMonkey simplified

I’m proud to say that today we have released ‘SearchMonkey Objects’, which might seem like a small step in the evolution of the Monkey (huh!) but we are hoping that it will radically simplify the ramp-up for site owners when it comes to enabling rich, structured results for their websites.

So what’s happening here? Well, until now if you wanted to have an enhanced result based on metadata, you needed to mark up your site and then create an application to transform the metadata into a search result presentation. These applications were simple (all they did was map fields in the data to parts of a presentation template) but they still required developers to write PHP code.
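To illustrate the kind of mapping such an application performs, here is a minimal sketch in Python rather than PHP. It is not the SearchMonkey API: the template, the slot names and the metadata field names are all hypothetical and chosen only to show the idea of mapping data fields to parts of a presentation.

```python
from string import Template

# Hypothetical presentation template; the slot names are illustrative only.
RESULT_TEMPLATE = Template("$title\n$summary\n$link")

# Hypothetical mapping from template slots to metadata field names.
FIELD_MAP = {
    "title": "dc:title",
    "summary": "dc:description",
    "link": "dc:identifier",
}


def render_result(metadata: dict[str, str]) -> str:
    """Fill the template with values looked up through FIELD_MAP.

    Fields missing from the metadata are rendered as empty strings.
    """
    values = {slot: metadata.get(field, "") for slot, field in FIELD_MAP.items()}
    return RESULT_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(render_result({
        "dc:title": "An example page",
        "dc:description": "A short description taken from the page metadata.",
        "dc:identifier": "http://example.org/page",
    }))
```

The whole application is little more than the `FIELD_MAP` table, which is why a standard mapping for well-known vocabularies can replace hand-written code.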

We realized that this could be simpler! In particular, from now on if you provide Yahoo structured data using vocabularies (formats) that we understand, we can create a rich result for you without you having to write a single line of code! Obviously, if you want to customize the presentation to your particular site, you can still do that by writing a presentation application, but if you are happy with the standard treatment, you don’t have to.

SearchMonkey Objects is a simple website that not only shows you how to mark up your page for certain types of objects, but also lets you validate immediately whether your markup is correct. Again, this is something that many have asked for in the past. The first objects that we support are Video, Games and Documents, but more are on the way.

I believe this is an exciting step because it will no doubt lead to greater adoption of RDFa and other forms of semantic markup, bringing us even closer to structuring the Web. And as always, tell us what you think!

Microformats in RDF

I was at VoCamp Galway 2008 last week, which was my first VoCamp. I have to say it was a fantastic experience. It seems like the VoCamp formula just works out of the box: you put a bunch of vocabulary geeks in a closed room and they immediately get down to fixing some of the key outstanding questions surrounding vocabularies on the Semantic Web (and in their spare time, they create some new vocabs).

The primary issue I set out to discuss was the state of RDF vocabularies for microformats. Why is this important? A growing number of tools convert microformats into RDF, but if everyone chooses a different representation, it will be significantly more difficult to interoperate. So in a small group we set out to document what we believe are the best-practice mappings from microformats to RDF.

We started by collecting the existing RDF vocabularies for stable microformats and documented them on the SemanticWeb.org wiki. This alone has been an incredibly useful exercise, since we uncovered some pretty immediate targets for fixing:

  • hCard. There are two VCard ontologies lingering on W3C’s website [1,2]. It’s clear to the trained eye that one supersedes the other, but unfortunately it doesn’t say so anywhere… I’ve written an email about this to the friendly folks at W3C.
  • hCalendar… Same here. It seems the RDF Calendar group couldn’t agree at some point on the design, forking the ontology into two… We have now taken this discussion to the RDF Calendar mailing list, with the intention of consolidating things.
  • rel-license… We have three terms for saying ‘this is the license for this doc’: xhtml:license (xhtml being the default namespace in RDFa), dc:license and cc:license.
  • hAtom… Again, there seem to be multiple choices… We only got as far as listing the ones we knew about.
  • hResume… In this case we haven’t found a direct mapping: there are various resume ontologies but they all do more or something slightly different than hResume.
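The rel-license case shows why agreeing on one mapping matters: a consumer that wants to find the license of a document currently has to treat three predicates as equivalent. Here is a minimal sketch in Python of that normalization step. It works on plain (subject, predicate, object) tuples with the prefixed names used above, and the choice of cc:license as the canonical term is an arbitrary assumption for the example, not a recommendation from the group.

```python
# The three license predicates mentioned above, as prefixed names.
LICENSE_PREDICATES = {"xhtml:license", "dc:license", "cc:license"}

# Arbitrary choice of canonical term, for illustration only.
CANONICAL_LICENSE = "cc:license"

Triple = tuple[str, str, str]


def normalize_license(triples: list[Triple]) -> list[Triple]:
    """Rewrite every known license predicate to the canonical one.

    Duplicates that appear after rewriting are dropped, and the original
    order of the remaining triples is preserved.
    """
    seen: set[Triple] = set()
    result: list[Triple] = []
    for subject, predicate, obj in triples:
        if predicate in LICENSE_PREDICATES:
            predicate = CANONICAL_LICENSE
        triple = (subject, predicate, obj)
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


if __name__ == "__main__":
    doc = "http://example.org/doc"
    lic = "http://example.org/licenses/by"
    data = [
        (doc, "xhtml:license", lic),
        (doc, "dc:license", lic),
        (doc, "dc:title", "Example"),
    ]
    for triple in normalize_license(data):
        print(triple)
```

Every tool that converts microformats to RDF would otherwise need its own version of this table, which is exactly the duplication that a shared best-practice mapping avoids.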

This is about as far as we got, but I’ve seen that in the past week people have already added some pretty useful info to the wiki at semanticweb.org. If you have something to contribute as well, please do so!

Billion Triples Challenge Intro

At the beginning of December, Jim and I sent the following email to SW-related mailing lists around the world, introducing the idea of this Challenge. The reaction we got was immense: close to 50 people subscribed to the mailing list within 24 hours, and more than 90 people are currently tuned in. If you are interested, please join as well.

This is the first public pre-announcement of the Open Web, Billion Triples Challenge, which will be organized as a special track of next year’s Semantic Web Challenge. This track will be in addition to the traditional SWC competition and it will focus on pushing the limits in tool design on the fronts of scalability in size and robustness in the face of data typically found on the Web. The goal of the competition is also to generate new application ideas, i.e. to show what is possible with Web metadata today.

The details of this Challenge are yet to be determined and we are calling on the Semantic Web community to help us in its formation. For this purpose we have set up a mailing list [2] and would like to invite everyone interested in this new Challenge to join this list. The mailing list will serve to discuss the data sets and rules of competition, and later to disseminate all other information regarding the new Challenge.

Best Regards,
The co-chairs of the SWC:
Jim Hendler (Rensselaer Polytechnic Institute)
Peter Mika (Yahoo! Research)

A blog

I’ve started a blog to communicate my thoughts and ideas about the Semantic Web and discuss current events such as the Billion Triples Challenge I’m co-organizing with Jim Hendler for ISWC 2008. Note that this is not an official blog of my employer, Yahoo! Research. However, I will share news about my work for Yahoo! if and when the news becomes public.