Archive for the 'semanticweb' Category

SearchMonkey and SemTech

If you are arriving at SemTech early this year or happen to live in the Bay Area, come by the Yahoo! campus for the SearchMonkey launch party on Thursday, May 15: you will have a chance to learn about the Monkey, talk to developers and product managers, and enjoy some free food. Be there or be a… banana.

I’ll be there in my semi-official position as Data Architect for SearchMonkey. You will also be able to catch me at SemTech, where I’ll be presenting a talk titled “Making the Web Searchable” and participating in a panel on “Giving Web Search a Face-lift”.

Looking forward to seeing many of my friends there!

Yahoo! and the Semantic Web

I wanted to share with you the excitement of Yahoo’s announcement today of our support for the Semantic Web effort through the SearchMonkey project. More details in this Yahoo! Search blog article and in reports at TechCrunch, ReadWriteWeb and SearchEngineLand.
Exciting times for me personally, for Yahoo! and for the Semantic Web at large!

Building on top of microsearch

David Karger commented on my blog, suggesting a combination of microsearch and Simile Exhibit. A great idea, and I’ll follow up on that in a minute. I’m sure many of you have ideas of your own for other mashups that require searching for metadata. For this reason I want to draw attention to the little SW cube logo right after the text “Search Results”, because it does an important piece of magic: it exposes all the metadata that has been gathered from the result pages as RDF/XML. (The result pages include those displayed plus more.) This is the key to start building on microsearch.
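To give a feel for what consuming such an export could look like, here is a minimal sketch in Python. It assumes a flat RDF/XML document made only of rdf:Description elements with simple property children; the sample document, the example.org URIs and the function name flat_triples are made up for illustration, and the actual shape of the microsearch output may differ. A real mashup would be better served by a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample, NOT actual microsearch output.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/people#alice">
    <foaf:name>Alice Example</foaf:name>
    <foaf:homepage rdf:resource="http://example.org/alice/"/>
  </rdf:Description>
</rdf:RDF>"""


def flat_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from a flat RDF/XML string.

    Only handles top-level rdf:Description elements whose children are
    either literals or rdf:resource references. Nested descriptions,
    typed nodes and blank nodes are ignored in this sketch.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # ElementTree writes tags as "{namespace}local"; join them into one URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in flat_triples(SAMPLE):
    print(triple)
```

Once the metadata is in the form of plain triples like these, feeding it into another tool is mostly a matter of reshaping the data.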

Granted, what you build might be slow. But by all means, please get in touch… we will find a way to make things work faster for you ;)

microsearch

Inside Yahoo I’m often the first one to tell people about the wonders of the Semantic Web. Typically, one of the first questions I get is: “but just how much of this metadata is out there?” And more often than I like to admit, my answer is no better than “tons”.

Why is this question difficult to answer? First, we don’t have a quantitative result: the crawls of existing Semantic Web search engines are likely to be partial, and certainly don’t include the bulk of metadata that is embedded inside webpages using microformats and RDFa. Second, the questioner most likely would like a qualitative answer: how much metadata is out there that people would actually care about?

And so the idea of microsearch was born. The idea was to create a search page that, instead of hiding metadata, brings it to the front, thereby showing the user just how much metadata is out there for any given query. After a few months of hacking and tweaking, and some more time spent on getting permission to share it with the world, I’m now happy to say that our demo is finally online, and you can try microsearch for yourself. Here are some examples: 1, 2, 3, 4 and a screenshot for the first example:

microsearch screenshot

Here is how it works in brief. You type in a query (say, your name). We gather the search results from our search engine and extract the metadata from the result pages, including microformats of three known flavors (hCard, hCalendar, hReview), linked RDF and RDFa. The metadata is shown inside the abstracts as well as on a map and a timeline.
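As an illustration of the microformat part, here is a minimal sketch in Python that pulls the formatted name (the "fn" class) out of hCard markup using only the standard library. This is not the microsearch implementation, which is written in Java; the class name, the helper extract_hcard_names and the sample markup are assumptions made up for this example, and a complete hCard parser has to handle many more properties and edge cases.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HCardNameParser(HTMLParser):
    """Collect the text of 'fn' elements that sit inside a 'vcard' element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._stack = []       # class names of every currently open element
        self._buffer = None    # text pieces of the 'fn' element being read
        self._fn_depth = None  # stack depth at which that 'fn' element opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self._stack.append(classes)
        inside_vcard = any("vcard" in c for c in self._stack)
        if "fn" in classes and inside_vcard and self._buffer is None:
            self._buffer = []
            self._fn_depth = len(self._stack)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._buffer is not None and len(self._stack) == self._fn_depth:
            self.names.append("".join(self._buffer).strip())
            self._buffer = None
            self._fn_depth = None
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_hcard_names(html):
    """Return the hCard formatted names found in an HTML string."""
    parser = HCardNameParser()
    parser.feed(html)
    return parser.names


# Hypothetical result page fragment.
page = ('<div class="vcard"><span class="fn">Peter Mika</span>, '
        '<span class="org">Yahoo! Research</span></div>')
print(extract_hcard_names(page))
```

The sketch assumes reasonably well-formed markup in which every non-void element is closed; real result pages are messier, which is one reason a production extractor needs to be far more tolerant.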

The map and the timeline work as aggregators. In the third example above, for instance, we show all events related to the query “san francisco conferences” on a single timeline. Further, where pages can be related through their metadata, we group them together. You can see this in the first example (shown in the screenshot). For the technologically minded, I’m using Java Servlets/JSP, Sesame, Elmo, the Simile Fresnel API, the Simile Timeline, the Yahoo! Ajax Maps API and of course Yahoo! Search technology.
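The aggregation and grouping ideas can be sketched in a few lines of Python. This is only an illustration under simple assumptions: events are already extracted as (date, summary) pairs per page, and two pages count as related when they mention the same subject identifier. The function names, URLs and data below are hypothetical, and the prototype itself relies on Sesame and the Simile tools rather than on code like this.

```python
from collections import defaultdict
from datetime import date


def build_timeline(events_by_page):
    """Merge the events of all result pages into one date-ordered list."""
    timeline = [
        (start, summary, url)
        for url, events in events_by_page.items()
        for start, summary in events
    ]
    return sorted(timeline)


def group_pages(subjects_by_page):
    """Group pages that share a subject; subjects seen on one page only are dropped."""
    pages_by_subject = defaultdict(list)
    for url, subjects in subjects_by_page.items():
        for subject in subjects:
            pages_by_subject[subject].append(url)
    return {s: urls for s, urls in pages_by_subject.items() if len(urls) > 1}


# Hypothetical extracted metadata for three result pages.
events_by_page = {
    "http://example.org/a": [(date(2008, 6, 2), "Conference A")],
    "http://example.org/b": [(date(2008, 5, 20), "Workshop B"),
                             (date(2008, 7, 1), "Meetup C")],
}
subjects_by_page = {
    "http://example.org/a": ["http://example.org/people#alice"],
    "http://example.org/b": ["http://example.org/people#alice",
                             "http://example.org/people#bob"],
    "http://example.org/c": ["http://example.org/people#carol"],
}

for start, summary, url in build_timeline(events_by_page):
    print(start.isoformat(), summary, url)
print(group_pages(subjects_by_page))
```

Matching on exact identifiers is the simplest possible notion of "related"; deciding when two pages really talk about the same thing is the hard part and is left out here.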

This is hardly the end of this story, but we are really curious about your feedback. So respond, comment here or drop me an email!

Disclaimer: this is not an indication of whether or how microformats and other forms of metadata will feature in Yahoo! products or services in the future. Also, it is a research prototype: use it long enough and it will break.

Billion Triples Challenge Intro

At the beginning of December, Jim and I sent the following email to SW-related mailing lists around the world, introducing the idea of this Challenge. The reaction we got was immense: close to 50 people subscribed to the mailing list within 24 hours, and more than 90 people are currently tuned in. If you are interested, please join as well.
This is the first public pre-announcement of the Open Web, Billion Triples Challenge, which will be organized as a special track of next year’s Semantic Web Challenge. This track will be in addition to the traditional SWC competition and it will focus on pushing the limits in tool design on the fronts of scalability in size and robustness in the face of data typically found on the Web. The goal of the competition is also to generate new application ideas, i.e. to show what is possible with Web metadata today.
The details of this Challenge are yet to be determined and we are calling on the Semantic Web community to help us in its formation. For this purpose we have set up a mailing list [2] and would like to invite everyone interested in this new Challenge to join this list. The mailing list will serve to discuss the data sets and rules of competition, and later to disseminate all other information regarding the new Challenge.
Best Regards,
The co-chairs of the SWC:
Jim Hendler (Rensselaer Polytechnic Institute)
Peter Mika (Yahoo! Research)