Semantic Web: Publishers vs. Consumers
This is the second entry in the Semantic Web series, which compares the <alt> Semantic Web implementation with previous theories about how to implement the vision.
The authoring of meta data is a fundamental difference between the <alt> Semantic Web implementation and the Semantic Web standards.
Most likely due to the origins of HTML, meta data in the traditional Semantic Web is meant to be created by the page author. Since she authored the page, she knows what's important. It is common knowledge that the META element has been abused to the point of irrelevance. So what about the newer attempts with embedded RDF or RDFa?
Same thing. You can't trust the page author to embed anything more than what generates ad dollars or installs malicious code. And, as the search engine vendors have definitively stated: "We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step..."
It's actually even worse than that. As a tool vendor, we found out that even when XML is the only allowed markup, as it is in the wireless world of WAP, developers are not disciplined enough to learn the vocabulary and tags or even validate their content. This was a major reason for WML's demise.
Additionally, the newer Semantic Web standards such as RDFa or GRDDL are predicated on authoring XHTML documents. Two things should immediately jump out at you: first, XHTML is not supported by 80% of web browsers. And second, more than 99.9% of current web pages are not XHTML. They're written in HTML, or broken XHTML, and that's not going to stop for the foreseeable future.
The other problem is that many web tools and servers will mangle your perfectly formed XML/WML/XHTML. And finally, these extensions will cause your XHTML to fail validation.
We believe that to implement the Semantic Web on an internet scale, and not just in isolated domains, we have to enable HTML content to be used.
That precludes relying on XHTML and creator-authored meta data. If such meta data exists within the document and it comes from a credible site such as the BBC, then it is one input of meta data.
So where do we get the meta data? In our implementation of the Semantic Web, we enable users to create, own, and share their meta data about a page, just as users have successfully created and shared their own pictures, videos, and blogs.
And why not? If a reader doesn't find value in a page, then it's unimportant to them. But maybe someone else will find merit in a page. It is impossible for the author to divine everything that's important and then describe it as each user might want to consume it.
The only realistic solution is to provide users with the tools to manage their own meta data.
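To make the idea concrete, here is a minimal sketch of what user-owned meta data could look like. It is an illustration only, not the <alt> implementation: the names Annotation, AnnotationStore, and visible_to, the key/value shape, and the example URL are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One statement a reader makes about a page (hypothetical shape)."""
    user: str             # who owns this piece of meta data
    url: str              # the page it describes; the page itself is untouched
    key: str              # e.g. "topic" or "rating"
    value: str
    shared: bool = False  # the owner decides whether others can see it


class AnnotationStore:
    """In-memory store of reader-authored meta data, indexed by page URL."""

    def __init__(self):
        self._by_url = defaultdict(list)

    def add(self, annotation):
        self._by_url[annotation.url].append(annotation)

    def visible_to(self, user, url):
        """Return the user's own annotations plus those others chose to share."""
        return [
            a for a in self._by_url.get(url, [])
            if a.user == user or a.shared
        ]


store = AnnotationStore()
page = "http://example.com/article.html"  # placeholder URL

store.add(Annotation("alice", page, "topic", "semantic web", shared=True))
store.add(Annotation("bob", page, "rating", "useful"))            # private to bob
store.add(Annotation("bob", page, "topic", "metadata", shared=True))

alice_view = store.visible_to("alice", page)
bob_view = store.visible_to("bob", page)

for a in alice_view:
    print(a.user, a.key, a.value)
```

Note that nothing here requires the page to be XHTML, or even to cooperate: the meta data lives outside the document and is keyed only by its URL, so plain HTML pages work as well as anything else.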
It's a Web 2.0 world: technologies should empower the user rather than providing greater control to the publisher.
Thursday, March 27, 2008