I’m thrilled to spread the news further: we have just made all the public metadata we crawl (microformats, eRDF and RDFa data) available through the BOSS API. See the official announcement and the commentary across the Web.
This essentially opens up all the public data available to SearchMonkey applications, making it possible for anyone to experiment with forms of semantic search that go well beyond changing the way abstracts look. Consider, for example, the microsearch prototype I blogged about a few months ago, which showed rich abstracts plus temporal and geographic visualizations based on metadata. You can now build something like microsearch in a matter of hours, and in a highly scalable fashion. Last but not least, the terms of the BOSS API allow you to monetize your search engine in any way you want.
So if you think you can build a better search engine through semantics, this is a great time to start! All we ask is that you give us feedback on what you build, or at the very least tag your experiment with ‘bossmashup’.
David Karger commented on my blog, suggesting a combination of microsearch and Simile Exhibit. It’s a great idea, and I’ll follow up on it in a minute. I’m sure many of you have ideas of your own for other mashups that require searching for metadata. For this reason, I want to draw attention to the little logo right after the text “Search Results”, because it does an important piece of magic: it exposes, in RDF/XML, all the metadata gathered from the result pages (both the pages displayed and some more). This is the key to building on top of microsearch.
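If you want to consume that RDF/XML from your own mashup, here is one simple way to read it in Python using only the standard library. Treat it as a deliberately naive sketch, not a full RDF/XML parser: it assumes flat output in which each top-level node carries an rdf:about URI and each property is either literal text or an rdf:resource link. The vCard sample document is hypothetical, not actual microsearch output. For serious work a proper RDF library (such as Sesame, which microsearch itself uses) is the safer choice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def uri(tag: str) -> str:
    """Turn an ElementTree tag like '{ns}local' into the full URI 'nslocal'."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag


def naive_triples(rdf_xml: str):
    """Yield (subject, predicate, object) triples from *flat* RDF/XML.

    Simplifying assumption: only top-level nodes with rdf:about are read,
    and each property is either literal text or an rdf:resource link.
    Blank nodes, nested nodes, rdf:parseType and datatypes are skipped.
    """
    root = ET.fromstring(rdf_xml)
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        if subject is None:
            continue  # blank node: not handled by this sketch
        if node.tag != f"{{{RDF_NS}}}Description":
            # A typed node such as <foaf:Person> implies an rdf:type triple.
            yield (subject, RDF_NS + "type", uri(node.tag))
        for prop in node:
            if len(prop):
                continue  # property containing a nested node: skipped
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            yield (subject, uri(prop.tag), obj)


# Hypothetical sample in the shape this sketch expects.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:vcard="http://www.w3.org/2006/vcard/ns#">
  <rdf:Description rdf:about="http://example.org/page#me">
    <vcard:fn>Jane Doe</vcard:fn>
    <vcard:url rdf:resource="http://example.org/"/>
  </rdf:Description>
</rdf:RDF>"""

for triple in naive_triples(SAMPLE):
    print(triple)
```

Emitting plain triples keeps the reader independent of any particular vocabulary, so the same loop works whichever namespaces the exposed metadata happens to use.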
Granted, what you build might be slow. But by all means, please get in touch, and we will find a way to make things run faster for you.
Inside Yahoo, I’m often the first to tell people about the wonders of the Semantic Web. Typically, one of the first questions I get is: “But just how much of this metadata is out there?” And more often than I like to admit, my answer is no better than “tons”.
Why is this question difficult to answer? First, we don’t have a quantitative result: the crawls of existing Semantic Web search engines are likely to be partial, and certainly don’t include the bulk of metadata that is embedded inside webpages using microformats and RDFa. Second, the questioner most likely would like a qualitative answer: how much metadata is out there that people would actually care about?
And so microsearch was born. The idea was to create a search page that, instead of hiding metadata, brings it to the front, showing the user just how much metadata is out there for any given query. After a few months of hacking and tweaking, and some more time spent getting permission to share it with the world, I’m happy to say that our demo is finally online and you can try microsearch for yourself. Here are some examples: 1, 2, 3, 4, and a screenshot of the first example:
Here is how it works in brief. You type in a query (say, your name). We gather the results from our search engine and extract their metadata: microformats of three well-known kinds (hCard, hCalendar and hReview), linked RDF and RDFa. The metadata is shown inside the abstracts as well as on a map and a timeline.
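To make the extraction step concrete, here is a small Python sketch built on the standard library's html.parser. It is not microsearch's code (the prototype itself is written in Java), and it covers only the microformat part: it finds the root classes of the three formats mentioned (vcard, vevent, hreview), records a few illustrative properties (fn, summary, dtstart, rating), takes a property's value from its title attribute when present (as in <abbr class="dtstart" title="...">), and assumes reasonably well-formed markup. Linked RDF and RDFa are left out.

```python
from html.parser import HTMLParser

# Root class names of the three microformats mentioned in the post.
ROOTS = {"vcard": "hCard", "vevent": "hCalendar", "hreview": "hReview"}
# Small illustrative subset of property class names to capture.
PROPS = {"fn", "summary", "dtstart", "rating"}
# HTML void elements never get an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class MicroformatExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []    # every item found, in document order
        self._open = []    # one (item, prop) pair per open element
        self._items = []   # stack of items whose root element is still open
        self._props = []   # stack of [item, name, text_chunks] still open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        prop = None
        # Property classes attach to the nearest item opened by an ancestor.
        name = next((c for c in classes if c in PROPS), None)
        if name and self._items:
            if attrs.get("title"):
                # e.g. <abbr class="dtstart" title="2008-05-16">May 16</abbr>
                self._items[-1]["props"].setdefault(name, attrs["title"])
            else:
                prop = [self._items[-1], name, []]
                self._props.append(prop)
        item = None
        root = next((ROOTS[c] for c in classes if c in ROOTS), None)
        if root:
            item = {"type": root, "props": {}}
            self.items.append(item)
            self._items.append(item)
        if tag in VOID:
            self._close(item, prop)  # no end tag will follow
        else:
            self._open.append((item, prop))

    def handle_endtag(self, tag):
        if tag in VOID or not self._open:
            return
        self._close(*self._open.pop())

    def handle_data(self, data):
        # Text counts towards every property that is currently open.
        for prop in self._props:
            prop[2].append(data)

    def _close(self, item, prop):
        if prop is not None:
            self._props.pop()
            owner, name, chunks = prop
            # Collapse runs of whitespace in the collected text.
            owner["props"].setdefault(name, " ".join("".join(chunks).split()))
        if item is not None:
            self._items.pop()


def extract(html: str) -> list:
    parser = MicroformatExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


items = extract('<div class="vcard"><span class="fn">Jane Doe</span></div>')
print(items)  # [{'type': 'hCard', 'props': {'fn': 'Jane Doe'}}]
```

Nested items, such as a reviewer's hCard inside an hReview, come out as separate entries, which is convenient when the same people or events are later shown on a shared map or timeline.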
The map and the timeline work as aggregators: for example, in the third example above we show all events related to the query “san francisco conferences” on a single timeline. Further, when pages can be related through their metadata, we group them together; you can see this in the first example (shown in the screenshot). For the technologically minded, I’m using Java Servlets/JSP, Sesame, Elmo, the Simile Fresnel API, the Simile Timeline, the Yahoo! Ajax Maps API and, of course, Yahoo! Search technology.
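The post does not say exactly how pages are related through their metadata. One plausible reading, assumed here purely for illustration, is that two result pages belong together when they share a metadata value, such as the same hCard name or the same URL, either directly or through a chain of other pages. The Python sketch below groups pages that way with a union-find structure; the "fn:" and "url:" keys are hypothetical labels for values an extraction step would produce.

```python
from collections import defaultdict


def group_related(pages: dict[str, set[str]]) -> list[list[str]]:
    """Group page URLs that share at least one metadata value, transitively."""
    parent = {url: url for url in pages}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # metadata value -> first page that carried it
    for url, values in pages.items():
        for value in values:
            if value in first_seen:
                union(first_seen[value], url)
            else:
                first_seen[value] = url

    # Collect groups; members keep the order in which pages were given.
    groups = defaultdict(list)
    for url in pages:
        groups[find(url)].append(url)
    return list(groups.values())


pages = {
    "http://a.example/": {"fn:Jane Doe"},
    "http://b.example/": {"fn:Jane Doe", "url:http://jane.example/"},
    "http://c.example/": {"url:http://jane.example/"},
    "http://d.example/": {"fn:John Roe"},
}
print(group_related(pages))
# [['http://a.example/', 'http://b.example/', 'http://c.example/'], ['http://d.example/']]
```

Union-find handles the transitive case (page a shares a name with b, and b shares a URL with c) in close to linear time, which matters when a single query returns many annotated pages.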
This is hardly the end of the story, and we are really curious about your feedback, so comment here or drop me an email!
Disclaimer: this is not an indication of whether or how microformats and other forms of metadata will feature in Yahoo! products or services in the future. Also, it is a research prototype: use it long enough and it will break.