Can the Semantic Web ideal prevail in the face of critics and rival approaches to organising information on tomorrow's Internet?
An early ambition of World Wide Web creator Tim Berners-Lee has taken a long time to come to fruition. He claimed at the TED conference in 2009 that his original plan for the Web was to make it possible to collect data together. 'Almost 20 years ago I wanted to reframe the way we work with information. Why? Basically it was frustration,' he admitted. 'People came from all over the world. They had all sorts of different data systems. For everything I had to connect to some new machine, find out about a new data format and they were all incompatible.'
Once the Web was underway and growing strongly, Berners-Lee and colleagues at the World Wide Web Consortium (W3C) developed a plan to make the Web smarter; but, 20 years after the first demonstration of the Web, we are still waiting. To echo Berners-Lee: why? W3C technical staff member Sandro Hawke says: 'The World Wide Web as we know it right now is this collection of pages. People put up the pages of stuff they want people to find.'
Hawke points to the work around a set of standards that can address the problem: 'The Semantic Web means instead of sharing the text of a page, you share data, you share facts. If you have a party you just need to state the time and what kind of event. People searching for that kind of event do not have to use keywords to find it. Semantic technology allows them to come together.'
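Hawke's party example can be sketched in a few lines of Python. This is an illustration only, not RDF syntax or a W3C API: the event addresses, the predicate names ('type', 'startTime', 'location') and the helper functions are all hypothetical, chosen to show how facts stated as subject-predicate-object triples can be found by matching on what they are rather than on keywords.

```python
# Illustrative only: each fact is a (subject, predicate, object) triple.
# Every URI, predicate and value below is a made-up example.
EVENT = "http://example.org/events/party-1"
TALK = "http://example.org/events/talk-7"

triples = {
    (EVENT, "type", "Party"),
    (EVENT, "startTime", "2010-06-12T20:00"),
    (EVENT, "location", "Cambridge"),
    (TALK, "type", "Lecture"),
    (TALK, "startTime", "2010-06-12T18:00"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def events_of_type(store, kind):
    """Find every subject declared to be of the given type."""
    return [s for s, _, _ in match(store, predicate="type", obj=kind)]


# Someone looking for parties asks for the fact, not for a keyword.
parties = events_of_type(triples, "Party")
start = match(triples, subject=EVENT, predicate="startTime")
```

Here the publisher only has to state the time and the kind of event; the searcher's query is a pattern over those facts.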
The problem, then, for the Semantic Web is that it demands additional work on the part of the Web publisher. As a result, technologies that need less investment from users have outflanked and overtaken the Semantic Web in practical respects: for 'practical', read 'common commercial imperative'.
Web consultant and educator Clay Shirky, who teaches New Media at New York University's graduate Interactive Telecommunications Program (ITP), is a long-standing critic of the Semantic Web, referring to it recently as a 'witness protection programme for AI researchers' in an interview with filmmaker Kate Ray, adding: 'Instead of making machines think like people, we could describe the world in terms that machines were good at thinking about.'
For instance, Semantic Web standards such as the Resource Description Framework (RDF) were originally developed to help browsers understand what was contained on a Web page. 'This was before search engines worked very well. We thought we would have to say on each website what was on it,' says Hawke. Now natural language processing has come to the rescue via metadata.
Web of entities
Using statistics, it is possible to make the connections automatically, given a large enough body of information; and the Internet has a lot of information to go after.
Speaking at an MIT Enterprise Summit earlier this year, Evri CEO Will Hunsinger described how his company is merging search with semantics: 'The whole thing is based on natural language processing: like an army of grammar students equipped with dictionaries. As we index content we find relationships between entities. We find a Web of entities that within a certain context have a relationship. At a point of time there may be a corpus of data that shows [economist] Paul Krugman to be important. If the relationships to Paul change, then this representation will change.
'We are allowing the real-time Web to self-categorise,' is Hunsinger's boast.
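The statistical idea behind a 'Web of entities' can be shown with a toy co-occurrence count. This is not Evri's method, which the article does not detail; it is a minimal sketch that assumes a fixed list of entity names and simple substring matching where a real system would use natural language processing. The sample sentences are invented placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity list; a real system would discover entities itself.
ENTITIES = ["Paul Krugman", "Federal Reserve", "Nobel Prize"]

# Invented placeholder sentences standing in for indexed content.
documents = [
    "Sample sentence one mentions Paul Krugman and the Federal Reserve.",
    "Sample sentence two mentions the Nobel Prize and Paul Krugman.",
    "Sample sentence three mentions the Federal Reserve and Paul Krugman.",
]


def cooccurrences(docs, entities):
    """Count how often each pair of entities appears in the same document."""
    counts = Counter()
    for text in docs:
        present = sorted(e for e in entities if e in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


links = cooccurrences(documents, ENTITIES)
strongest = links.most_common(1)[0]
```

Re-running the count over a newer set of documents would change the weights, which is the sense in which the representation shifts as the relationships do.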
Professor Ed Lazowska of the University of Washington says the shift towards automated inferencing has been made possible by 'massive amounts of computing whaling away on enormous amounts of data. People who own these huge server farms can do this processing in entirely different ways than you used to do it.'
The Combining and Uniting Business Intelligence with Semantic Technologies (CUBIST) project is similarly using large computers to analyse large bodies of text on the Internet to extract meaning.
Computers can process text when it is available, but without it, the job of discerning meaning becomes a much tougher enterprise. Tim Berners-Lee sees a renewed future for the core Semantic Web formats such as RDF, with the addition of more data to the Web. The deployment is also more gradual, using what he calls 'linked data' as a step on the road towards the Semantic Web. During his 2009 TED speech, and again this year, Berners-Lee asked the attendees to put their data on the Web, getting them to join in the cry 'raw data now!'. In fact, what the W3C wants is not raw data but data that shows its triples, as Hawke would put it.
Rule interchange format
Linked data exploits a couple of loopholes in the HTML and HTTP specifications to make it feasible to use well-understood commands to identify data on websites without demanding that people get into the more complex Semantic Web protocols that were introduced - and which largely foundered - during the past decade. 'Those HTTP names we can use not just for documents, but things that those documents are about,' Berners-Lee explains. 'If I take an HTTP name and fetch data using HTTP protocol I will get data back in a useful format.'
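One common way to 'fetch data using HTTP' is to ask for a data format through the standard Accept header. The sketch below only builds such a request and does not send it; the address is a placeholder, the helper name is hypothetical, and the choice of Turtle (media type text/turtle) as the 'useful format' is an assumption, since the article does not name one.

```python
from urllib.request import Request


def linked_data_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET that asks for data, not a page."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Placeholder HTTP name for a thing (an event), not for a document.
req = linked_data_request("http://example.org/id/party-1")

# To send it, one would pass req to urllib.request.urlopen(); that is
# left out here because example.org does not serve this data.
accept = req.get_header("Accept")
```

The same HTTP name could return an ordinary page to a browser and data to a program, depending on what each asks for.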
There remains one big problem with the new-look Semantic Web: do you mean what I mean when we tag data?
The proposed answer from the W3C is the Rule Interchange Format (RIF). Hawke says even things as basic as a party could be described in different ways: 'If I publish information about my event and my neighbour publishes information about their event, we may present it in different ways. RIF lets the system handle that automatically.'
The problem is that someone has to write the RIF rules to map data between different systems. In practice, the conversion rules are likely to come from people who want to see their information used but don't want to change the formats and structures they use for the raw data. 'I would expect those rules to usually be provided by the creator of one of the vocabularies: generally the newer, "upstart" one.'
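The kind of mapping such rules perform can be imitated with a simple lookup table. This is not RIF syntax, and the two vocabularies, their predicate names and the translate helper are hypothetical; the sketch only shows how rules supplied by one publisher let two differently described events be queried together.

```python
# Two hypothetical vocabularies describing the same kind of event.
MY_EVENT = [
    ("ev1", "startsAt", "2010-06-12T20:00"),
    ("ev1", "kind", "party"),
]
NEIGHBOUR_EVENT = [
    ("ev2", "beginTime", "2010-06-13T19:30"),
    ("ev2", "category", "party"),
]

# Mapping rules as the publisher of the newer vocabulary might supply
# them: neighbour's predicate -> equivalent predicate in my vocabulary.
RULES = {"beginTime": "startsAt", "category": "kind"}


def translate(triples, rules):
    """Rewrite predicates that have a rule; leave the rest unchanged."""
    return [(s, rules.get(p, p), o) for s, p, o in triples]


# After translation, one query covers both publishers' data.
merged = MY_EVENT + translate(NEIGHBOUR_EVENT, RULES)
party_starts = sorted(o for _, p, o in merged if p == "startsAt")
```

Neither publisher has had to change the raw data; only the rule table had to be written, which is the work the article says someone still has to do.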
The W3C's Sandro Hawke, meanwhile, remains optimistic about future progress toward a semantically-predicted future Web: 'Looking back in a few years when we have all this semantic technology deployed, the current world will seem pretty primitive.'