Archive for February, 2008

Building on top of microsearch

David Karger commented on my blog, suggesting a combination of microsearch and Simile Exhibit. It’s a great idea, and I’ll follow up on it shortly. I’m sure many of you have ideas of your own for other mashups that require searching for metadata. For this reason I want to draw attention to the little SW cube logo right after the text “Search Results”, because it does an important piece of magic: it exposes all the metadata that has been gathered from the result pages in RDF/XML. (The result pages include those displayed plus more.) This is the key to start building on microsearch.

Granted, what you build might be slow. But by all means, please get in touch… we will find a way to make things work faster for you 😉
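
As a starting point for such a mashup, here is a minimal sketch of reading an RDF/XML document with nothing but the Python standard library. The sample document, the vCard vocabulary and the function name simple_triples are assumptions made for illustration; the actual microsearch output may look quite different, and a full RDF parser would be needed for anything beyond flat rdf:Description blocks.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample; the real microsearch export may use other vocabularies.
SAMPLE_RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:vcard="http://www.w3.org/2006/vcard/ns#">
  <rdf:Description rdf:about="http://example.org/people/jane">
    <vcard:fn>Jane Doe</vcard:fn>
    <vcard:url rdf:resource="http://example.org/jane"/>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for flat rdf:Description blocks.

    Only the simplest RDF/XML shape is handled: no typed nodes, no nesting,
    no blank nodes, no datatypes or language tags.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join them into one URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is either a linked resource or the literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in simple_triples(SAMPLE_RDF_XML):
    print(triple)
```

From a list of triples like this, feeding another tool (Exhibit, for instance) is mostly a matter of reshaping the data into whatever input format that tool expects.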



Inside Yahoo I’m often the first one to tell people about the wonders of the Semantic Web. Typically, one of the first questions I get is: “but just how much of this metadata is out there?” And more often than I like to admit, my answer is no better than “tons”.

Why is this question difficult to answer? First, we don’t have a quantitative result: the crawls of existing Semantic Web search engines are likely to be partial, and certainly don’t include the bulk of metadata that is embedded inside webpages using microformats and RDFa. Second, the questioner most likely would like a qualitative answer: how much metadata is out there that people would actually care about?

And so the idea of microsearch was born. The idea was to create a search page that instead of hiding metadata, brings it to the front, thereby showing the user just how much metadata is out there for any given query. After a few months of hacking and tweaking and some more time spent on getting permissions to share with the world, I’m now happy to say that our demo is finally online, and you can try microsearch for yourself. Here are some examples: 1,2,3,4 and a screenshot for the first example:

microsearch screenshot

Here is how it works in brief. You type in a query (say, your name). We gather the search results from our search engine and extract the metadata they contain, including microformats of three known flavors (hCard, hCalendar, hReview), linked RDF and RDFa. The metadata is shown inside the abstracts as well as on a map and a timeline.
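
To make the microformat step concrete, here is a rough sketch of pulling hCard data out of a page using only the Python standard library. This is not the code behind microsearch (which is written in Java); the class name HCardParser, the helper extract_hcards, the sample markup and the small set of properties are all illustrative assumptions, and the parser expects reasonably well-nested HTML.

```python
from html.parser import HTMLParser

# A small, assumed subset of hCard property class names.
HCARD_PROPERTIES = {"fn", "org", "locality", "tel"}
# Elements without an end tag; they are skipped so the stack stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Hypothetical sample page fragment.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>,
  <span class="adr"><span class="locality">San Francisco</span></span>
</div>
"""


class HCardParser(HTMLParser):
    """Collect text inside elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.cards = []    # finished hCards, one dict per class="vcard" element
        self._stack = []   # one (is_card_root, property_names) entry per open element
        self._card = None  # the hCard currently being filled, if any

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        is_root = self._card is None and "vcard" in classes
        if is_root:
            self._card = {}
        props = classes & HCARD_PROPERTIES if self._card is not None else set()
        self._stack.append((is_root, props))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        is_root, _ = self._stack.pop()
        if is_root:
            self.cards.append(self._card)
            self._card = None

    def handle_data(self, data):
        text = data.strip()
        if self._card is None or not text:
            return
        # Text belongs to every property element that is currently open.
        for _, props in self._stack:
            for name in props:
                self._card[name] = (self._card.get(name, "") + " " + text).strip()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


print(extract_hcards(SAMPLE_HTML))
```

The same class-name matching idea carries over to hCalendar and hReview; RDFa and linked RDF need a different treatment because the data lives in attributes or in separate documents.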

The map and the timeline work as aggregators: in the third example above, for instance, we show all events related to the query “san francisco conferences” on a single timeline. Further, when pages can be related through their metadata, we group them together. You can see this in the first example (shown in the screenshot). For the technologically minded, I’m using Java Servlets/JSP, Sesame, Elmo, the Simile Fresnel API, the Simile Timeline, Yahoo! Ajax Maps API and of course Yahoo! Search technology.
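
One way to picture the grouping of related pages is the sketch below. It assumes, purely for illustration, that two pages count as related when they share at least one (property, value) pair, and it merges pages transitively with a small union-find structure. The function name group_related_pages and the sample data are hypothetical; the rule microsearch actually applies may differ.

```python
from collections import defaultdict


def group_related_pages(page_metadata):
    """Group page URLs that share metadata, directly or through other pages.

    page_metadata maps each URL to a set of (property, value) pairs.
    Assumption: sharing any one pair is enough to relate two pages.
    """
    parent = {url: url for url in page_metadata}

    def find(url):
        # Follow parent links to the group representative (with path halving).
        while parent[url] != url:
            parent[url] = parent[parent[url]]
            url = parent[url]
        return url

    first_seen = {}  # (property, value) -> first URL that carried it
    for url, pairs in page_metadata.items():
        for pair in pairs:
            if pair in first_seen:
                # Merge this page's group with the group of the earlier page.
                parent[find(url)] = find(first_seen[pair])
            else:
                first_seen[pair] = url

    groups = defaultdict(list)
    for url in page_metadata:
        groups[find(url)].append(url)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample data.
SAMPLE_PAGES = {
    "http://example.org/a": {("fn", "Jane Doe")},
    "http://example.org/b": {("fn", "Jane Doe"), ("org", "Example Corp")},
    "http://example.org/c": {("org", "Example Corp")},
    "http://example.org/d": {("fn", "John Roe")},
}

for group in group_related_pages(SAMPLE_PAGES):
    print(group)
```

Matching on exact values is the crudest possible rule: it will happily merge two different people with the same name, so a real system would want stronger identifiers such as URLs or email addresses.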

This is hardly the end of this story, but we are really curious about your feedback. So respond, comment here or drop me an email!

Disclaimer: this is not an indication of whether or how microformats and other forms of metadata would feature in Yahoo! products or services in the future. Also, it is a research prototype: use it long enough and it will break.

Billion Triples Challenge Intro

At the beginning of December, Jim and I sent the following email to SW-related mailing lists around the world, introducing the idea of this Challenge. The reaction we got was immense: close to 50 people subscribed to the mailing list within 24 hours, and more than 90 people are currently tuned in. If you are interested, please join as well.
This is the first public pre-announcement of the Open Web, Billion Triples Challenge, which will be organized as a special track of next year’s Semantic Web Challenge. This track will be in addition to the traditional SWC competition and it will focus on pushing the limits in tool design on the fronts of scalability in size and robustness in the face of data typically found on the Web. The goal of the competition is also to generate new application ideas, i.e. to show what is possible with Web metadata today.
The details of this Challenge are yet to be determined and we are calling on the Semantic Web community to help us in its formation. For this purpose we have set up a mailing list [2] and would like to invite everyone interested in this new Challenge to join this list. The mailing list will serve to discuss the data sets and rules of competition, and later to disseminate all other information regarding the new Challenge.
Best Regards,
The co-chairs of the SWC:
Jim Hendler (Rensselaer Polytechnic Institute)
Peter Mika (Yahoo! Research)

A blog

I’ve started a blog to communicate my thoughts and ideas about the Semantic Web and discuss current events such as the Billion Triples Challenge I’m co-organizing with Jim Hendler for ISWC 2008. Note that this is not an official blog of my employer, Yahoo! Research. However, I will share news about my work for Yahoo! if and when the news becomes public.