Archive for the ‘semanticweb’ Category
VoCamps provide the missing social interaction needed for vocabulary creation and management on the Semantic Web: a space where members of the community can discuss the current issues related to vocabularies and semantic interoperability. Unlike Semantic Web meetups, which typically take just a few hours and where the discussion focuses on a single presentation, VoCamps are two-day events that allow in-depth discussions and working in small groups.
Following the success of VoCamp Ibiza, we are organizing another similar event at Yahoo, but this time in the US, where VoCamps are now also taking hold. (VoCampDC will be organized at the end of May and has already reached its full capacity!) This VoCamp will take place in Sunnyvale, directly after the SemTech 2009 conference.
If you would like to join this next edition of the VoCamp series, please sign up on the VoCampSunnyvale2009 wiki page! The space is limited, but we will try to expand if necessary. Hope to see many of you in San Jose and Sunnyvale!
I’m thrilled to spread the news further that we have just made available all public metadata (microformats, eRDF and RDFa data) that we crawl through the BOSS API. See the official announcement and commentaries throughout the Web.
This essentially means opening up all public data available to SearchMonkey applications, and thus making it possible for anyone to experiment with various forms of semantic search that go well beyond changing the way abstracts look. Consider for example the microsearch prototype I blogged about a few months ago, which showed rich abstracts, plus temporal and geographic visualizations based on metadata. You can now build something like microsearch in a matter of hours, and in a highly scalable fashion. Last, but not least, the terms of the BOSS API allow you to monetize your search engine in any way you want.
So if you think you can build a better search engine through semantics, this is a great time to start! All we ask is to give us feedback on what you do, minimally by tagging your experiment with the tag ‘bossmashup’.
We are only two days away from the first VoCamp. Although I cannot make it (I’m attending the Dagstuhl seminar on the Social Web), I’m incredibly excited that we in the Semantic Web community are finally starting to talk about the situation around vocabularies. For this reason, I decided to share some thoughts on the topic as input to the discussion. This is my personal opinion, but it is based on what I see around me at work.
So what’s wrong with vocabularies on the Semantic Web? Well, anno 2008 we can finally state with confidence that the approach we followed so far (of having no particular approach) did not produce the results we expected. Here is a list of things that would need urgent fixing, in no particular order:
- Centralize. Unlike with microformats, there is no single point on the Semantic Web to go for vocabularies, to discuss them, to share use cases and examples. Decentralization left us with no way to tell what is out there and how trustworthy or useful it is. The lack of options leads to abuse. If a user doesn’t find the movie vocabulary, he will still create a movie class in the rdfs namespace. And yes, there is plenty of rdfs:movie out there.
- Release your vocabularies. There are many talented people in our community who have created the first vocabulary for topic X only to abandon it a few months later when moving on to the next project. I have nothing against personomies, but plan for succession in case the time comes that you cannot maintain it any more. If you’ve done your job well, there will be plenty of people willing to carry on your work. However, we cannot maintain your vocabulary for you without your cooperation. I like PURLs.
- Embrace microformats. Microformats are just another way of providing metadata. We need good RDF/OWL vocabularies for at least the most widely used microformats. These should be straightforward representations with a one-to-one mapping between elements in the microformat and in the vocabulary, and not a grand redesign. I know it’s tempting to redesign a microformat but it’s important to resist that temptation.
- Design vocabularies for annotations. Now that RDFa is out, most of us are facing the mild inconvenience of using existing ontologies for marking up web pages. Existing ontologies will need some adaptation because they were designed from a consumer perspective, not from a producer perspective. This will likely mean that we will have to compromise, on at least some occasions. For example, in social sites it’s common to expose only the user’s age and number of friends (for which FOAF has no properties), not the birthday and the list of friends (for which it does). Principles of ontology engineering may dictate that we shouldn’t introduce time-bound properties or properties for counts of things, or multiple properties for saying the same things in different ways. But we will just have to do it, or else it will be rdfs:age.
- Clarify the role of W3C in the process. As people find vocabularies by searching, they often run into ontologies that appear in the form of W3C member submissions. These are easily findable (good pagerank), and the formatting and placement give them an official look. The problem is that anyone (who is a member, that is…) can author such a submission, which, as evidence shows, will be neither updated nor removed over time. This is how we came to have two ontologies for VCard with documentation and URIs in the W3C domain (1, 2). In my personal opinion this is doing more harm than good.
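To make the "Embrace microformats" point above concrete, here is a minimal sketch of what a straightforward one-to-one mapping could look like: every microformat property becomes a same-named property in the vocabulary, with no redesign along the way. The namespace `http://example.org/hcard#`, the function name and the sample card are all made up for illustration; this is not a published vocabulary or an existing tool.

```python
# Hypothetical namespace, for illustration only (not a published vocabulary).
HCARD_NS = "http://example.org/hcard#"


def hcard_to_triples(subject, hcard):
    """Map each microformat property to a same-named vocabulary property.

    `hcard` is a plain dict of property name -> value, as a microformat
    parser might produce. Nothing is renamed, merged or restructured.
    """
    triples = []
    for name, value in sorted(hcard.items()):
        triples.append((subject, HCARD_NS + name, value))
    return triples


# Made-up sample card.
card = {"fn": "Jane Doe", "url": "http://example.org/jane"}
for s, p, o in hcard_to_triples("http://example.org/jane#me", card):
    print(s, p, o)
```

The point of keeping the mapping this dumb is that a publisher who already knows the microformat has nothing new to learn, and a consumer can go back and forth between the two forms without loss.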
This is certainly not the complete list, but I hope that at least some of these points will come up in Oxford. And I’m sorry, guys, that I cannot make it. Have fun!
Submissions are now open for the Semantic Web Challenge 2008! As already announced, this year we are inviting submissions for both the Open Track (the ‘classic’ Challenge) and the Billion Triples Challenge, a newly created track with the focus on scalability. For more details, see the call for the Challenge.
Submit your application through the online submission system by October 1, 2008.
I begin with some argumentation as to why we need semantics in search. I then talk about what I see as an interesting complementarity between recent advances in automated semantic tagging of natural text and other approaches to bringing metadata to the Web (such as exposing databases and APIs as RDF). Lastly, I go into a bit of detail as to how Yahoo’s SearchMonkey has been benefiting from Semantic Web technology in what we’ve been doing up to now, and the ways in which the widening adoption of semantic technology is driving us towards simplification.
p.s. Also on DevX: an introduction to OpenCalais and SearchMonkey by James Leigh.
If you are arriving at SemTech early this year or you happen to live in the Bay Area, come by the Yahoo! campus for the SearchMonkey launch party on May 15 (Thursday): you will have a chance to learn about the Monkey, talk to developers and product managers, and enjoy some free food. Be there or be a… banana.
I’ll be there in my semi-official position as Data Architect for SearchMonkey, and you will also be able to catch me at SemTech, where I’ll be both presenting under the title Making the Web Searchable and participating in a panel on Giving Web Search a Face-lift.
Looking forward to seeing many of my friends there!
I wanted to share with you the excitement of Yahoo’s announcement today of our support for the Semantic Web effort through the SearchMonkey project. More details in this Yahoo! Search blog article and in reports at TechCrunch, ReadWriteWeb and SearchEngineLand.
Exciting times for me personally, for Yahoo! and for the Semantic Web at large!
David Karger commented on my blog, suggesting a combination of microsearch and Simile Exhibit. A great idea, and I’ll follow up on that in a minute. I’m sure many of you have ideas of your own for other mashups that require searching for metadata. For this reason I just wanted to bring attention to the little logo right after the text “Search Results”, because it does an important piece of magic: it exposes all the metadata that has been gathered from the result pages in RDF/XML. (The result pages include those displayed plus more.) This is the key to building on microsearch.
Granted, what you build might be slow. But by all means, please get in touch… we will find a way to make things work faster for you 😉
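If you want a feel for what consuming that RDF/XML export might involve, here is a rough sketch using only the Python standard library. It assumes the simplest form of RDF/XML (flat `rdf:Description` elements with literal or `rdf:resource` values); the sample document and the `ex:` terms are made up, and the actual microsearch output may well look different. For anything serious you would want a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample; the real export will differ and be much richer.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/page1">
    <ex:title>First result</ex:title>
    <ex:related rdf:resource="http://example.org/page2"/>
  </rdf:Description>
</rdf:RDF>"""


def simple_triples(rdf_xml):
    """Extract (subject, predicate, object) tuples from flat RDF/XML.

    Only handles top-level rdf:Description elements whose children are
    either literals or rdf:resource references; nested nodes, blank
    nodes and typed literals are ignored in this sketch.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get("{%s}resource" % RDF_NS)
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in simple_triples(SAMPLE):
    print(triple)
```

From a list of tuples like this it is a short step to feeding the data into whatever visualization or mashup you have in mind.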