The more we come to rely on search engines, the more we encounter their shortcomings. With public search engines like Google lurching toward ever-more commercial business models, should organisations now consider building their own? We find out.
"Everyone thinks search is typing keywords into a box," says Mike Lynch, the head of search specialist Autonomy. "But that is going to change." The science of Web searching has reached a crucial stage in its development: for proof, just Google 'Web search future'. Lynch reckons that, as the mechanics of search head into the deepest recesses of the IT system, it will steadily reduce our dependence on Google as a primary Web inquisitor, and make that box fade from our memories.
Lynch proposes the idea of implicit search, where the computer analyses what users are doing, and then brings up relevant information. It is a future that search-engine pioneer Karen Spärck Jones pointed to in her final lecture before her death last year. Her work, which used statistics to give computers a better idea of how language works, led her to believe that there was another layer that could sit on top of those of the operating system, the utilities, and the applications: the information layer. This would use search technologies to make the underlying IT work for the user, rather than the other way round.
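Stripped to its essentials, implicit search is a loop: watch the text the user is working on, extract salient terms statistically, and run them against an index in the background, with no query box involved. A minimal sketch of that loop in Python, with a toy in-memory index standing in for a real engine (all names and documents here are illustrative):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "for", "it"}

def salient_terms(text, n=5):
    """Pick the most frequent non-stopword terms from the text in focus."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(n)]

def implicit_query(active_text, engine):
    """Query an engine with terms drawn from what the user is writing,
    without the user typing anything into a box."""
    return engine(salient_terms(active_text))

# Toy index: returns documents whose text contains any query term.
documents = {
    "reply.eml": "draft reply about the delayed shipment of parts",
    "notes.txt": "meeting notes on the advertising budget",
}
def toy_index(terms):
    return [name for name, body in documents.items()
            if any(t in body for t in terms)]

print(implicit_query("The shipment of parts is delayed again", toy_index))
# → ['reply.eml']
```

The hard part, as Robertson's caution below suggests, is not the lookup but deciding which of the user's context terms actually define the current task.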
For Autonomy's Lynch, meanwhile, the future of search lies in applications such as an early-warning system for trouble and a way of redressing the balance between people and their computers. "In the 1960s, people thought it would be good to get computers, but we had to make the world simple for the business - we had systems in which the position of a number told the computer what something meant," he explains. "If the number in column three, for example, went down, the system could order more parts. But if it can understand the meaning of the human input, the computer can fit in with how business is done."
In this type of system, the servers might look at emails and messages coming in from customers through a helpdesk, and watch for changes in the type of problem. By determining how that traffic differs from the previous day, it may flag up that a bad batch of products has been delivered. "It might be a tiny percentage of traffic that is different," Lynch says, "but it may contain harbingers of bigger things about to happen."
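The early-warning idea Lynch describes boils down to a statistical comparison between yesterday's and today's distribution of problem categories, flagging any category whose share of traffic has jumped. A hedged sketch, with the categories, volumes, and threshold all invented for illustration:

```python
from collections import Counter

def category_shares(messages):
    """Fraction of total traffic falling into each problem category."""
    counts = Counter(messages)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def flag_anomalies(yesterday, today, threshold=0.05):
    """Flag categories whose share of traffic grew by more than `threshold`
    - a tiny percentage of messages can still be a harbinger."""
    before = category_shares(yesterday)
    after = category_shares(today)
    return [cat for cat in after
            if after[cat] - before.get(cat, 0.0) > threshold]

yesterday = ["billing"] * 90 + ["login"] * 10
today     = ["billing"] * 85 + ["login"] * 8 + ["battery overheats"] * 7
print(flag_anomalies(yesterday, today))  # → ['battery overheats']
```

A production system would classify free-text messages into categories statistically rather than receive them pre-labelled, but the comparison step is the same.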
Others are more sceptical of what implicit search might do. "I suspect that implicit search may madden people more than it delights them," says Stephen Robertson of Microsoft Research in Cambridge, who worked with Spärck Jones on the techniques that still underpin many search engines.
"Understanding what defines the current context of your task is quite hard, although I am sure there are circumstances in which it might be useful. One colleague installed an experimental system a few years ago that used implicit search. Shortly after installing it, he was replying to an email, and it brought up a reply he had already sent and forgotten about."
But, Robertson maintains, the situation could be like autocorrection in word processors: "Half of the time, I think it is great - and half of the time I think: 'Why did it do that?'"
Search is poised to change the way that computers operate, where work on language processing in other areas seems to have failed. It's all in the mechanics of the search engine. The surprising thing about search technology is that it doesn't need to understand language. Very early work on search technology led people to try to get computers to understand grammar, but attempts to make information-retrieval systems deal with grammar and structure have sent many of them up blind alleys.
"In reality, that has been a commercial failure," Lynch admits. Why? "It's because the world is very complicated."
Time and again, statistical models that do not even try to parse grammar have turned out to be far more successful. Robertson says: "One of the success stories in the last ten years has been the statistical language models. They don't try to address syntax or levels of structure; but they do reveal a level of organisation that is in the statistics."
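A concrete instance of such a statistical language model is query-likelihood ranking: each document is scored by the probability that a smoothed unigram model of it would generate the query, with no parsing of syntax at all. A minimal sketch using Dirichlet smoothing against collection-wide counts (the two-document corpus and the value of mu are illustrative, and the sketch assumes every query term occurs somewhere in the collection):

```python
import math
from collections import Counter

docs = {
    "d1": "statistics of language models in retrieval",
    "d2": "formal grammars and parsing of syntax",
}

def rank(query, docs, mu=10.0):
    """Rank documents by query likelihood under a smoothed unigram model."""
    collection = Counter(w for text in docs.values() for w in text.split())
    coll_total = sum(collection.values())

    def score(text):
        counts, length = Counter(text.split()), len(text.split())
        logprob = 0.0
        for q in query.split():
            # Dirichlet smoothing: mix document counts with collection
            # statistics so unseen-in-document words keep nonzero probability
            p = (counts[q] + mu * collection[q] / coll_total) / (length + mu)
            logprob += math.log(p)
        return logprob

    return sorted(docs, key=lambda name: score(docs[name]), reverse=True)

print(rank("language statistics", docs))  # → ['d1', 'd2']
```

Nothing here knows what a noun or a verb is; the ranking emerges purely from word counts, which is the point Robertson is making.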
If actions based on implicit search do become a reality, Robertson reckons statistics will play a key role: "There are certainly opportunities for systems that do implicit things on the basis of what they are observing. I am sure that statistics are vitally important in that, in the same way that statistics have become so important in our understanding of language. The people who worked on formal grammars would find it quite extraordinary."
Robertson cannot see any kind of implicit action working without some statistical idea of what is normal and what is abnormal: "It will come from a combination of logic and statistics," he says.
Much of the other work undertaken by the search industry has a starkly commercial reason: these are the applications that are getting the R&D money right now. The advertising machine behind the public search engines is helping to fund a battery of projects, many of which use some kind of language processing to serve up even more ads to the consumer. Google, for example, tries to show you ads that are relevant to the emails or documents you view in a browser.
However, individual companies are themselves getting in on the search game - and this may provide Google's biggest competition. Enterprise search has got off to a slow start, which means there are thousands of documents lying inside the corporate firewall that people cannot find because they do not have an effective internal search facility. Says Charlie Hull of Lemur Consulting: "People are asking: 'If the internet knows, then why don't we?'"
So, companies are now installing search engines - which are providing the money for the next phase of development. The technology in Microsoft's SharePoint, for example, underpins the personal search engine in the company's Vista operating system. Apple Computer has made its own investment in the Spotlight engine. And then, at the intranet level, the enterprise search vendors are moving in. Within the enterprise, the search box still rules but, with the ability to deploy software agents to servers and desktops, the idea of rolling implicit search into the machine becomes more feasible.
For a provider like Google to do it, you would have to agree to do everything on its servers, and give the company access to what might be sensitive material. So the things that drive enterprise search are privacy and security, coupled with an enormous volume of documents.
Lemur's Charlie Hull says: "People are generating documents at such a rate that, if you don't have millions of documents today, you will have tomorrow." Hull claims that people are now so used to using services such as Google to find things, they expect to be able to do the same with the information held by their own employers. And they often uncover ways to find information from public rather than private sources.
"In the old days, if you wanted to look up a phone number, you looked at your own phone list," says Hull, "but if your own system is slow and painful, you will use Google."
Enterprise search is about reversing that trend. To get information like phone numbers into a search engine, the software needs to understand all manner of different data formats. Companies use many different forms of file formats - some of them custom - and they want to plug search into email and messaging. Enterprise search engines need to be able to handle customisation.
"A typical large company has of the order of 9,000 information repositories," opines Lynch. "In there, you have some 400 different types of repository, and a thousand file types."
The multiplicity of data formats means that you may have to pay the search-engine supplier to develop the code needed to index the information in your custom files and databases, or use an application programming interface (API) to do the job yourself. This is where people such as Hull believe open source could have an advantage.
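The shape of such an API is typically a registry of converters, one per file format, each reducing a file to plain text the indexer can consume. A sketch of the idea in Python - the registration scheme and converter names here are illustrative, not any vendor's actual interface:

```python
import csv
import io

# Registry mapping file extensions to text-extraction functions.
CONVERTERS = {}

def converter(extension):
    """Decorator registering a converter for one custom format."""
    def register(func):
        CONVERTERS[extension] = func
        return func
    return register

@converter(".csv")
def csv_to_text(raw):
    rows = csv.reader(io.StringIO(raw))
    return " ".join(cell for row in rows for cell in row)

@converter(".log")
def log_to_text(raw):
    # Drop the leading timestamp, keep the message part of each line
    return " ".join(line.split(" ", 1)[-1] for line in raw.splitlines())

def index_file(name, raw, index):
    """Convert a file to text and add its words to an inverted index."""
    ext = name[name.rfind("."):]
    text = CONVERTERS[ext](raw)  # fail loudly on unknown formats
    for word in text.lower().split():
        index.setdefault(word, set()).add(name)

index = {}
index_file("phones.csv", "Alice,x2001\nBob,x2002", index)
print(sorted(index["x2001"]))  # → ['phones.csv']
```

The open-source advantage Hull points to is that converters like these, once written for a format, can be passed back into the community rather than bought per-customer.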
Potentially, you can build much tighter links between your data and the search engine and also benefit from the converters that people have developed for various data formats that have been passed back into the community. Lynch sees an opening for open source among smaller companies "provided you like programming".
A common theme among enterprise search users is the need for security. Most enterprise search tools on the market are able to hide documents from users who do not have the right access privileges. This is not something you can get from public search engines.
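The usual mechanism is result "security trimming": each document carries an access-control list, and hits are filtered against the searcher's group memberships before anything is displayed - even the document's existence is hidden. A minimal sketch with invented documents and groups:

```python
# Each document pairs its text with an access-control list of group names.
DOCUMENTS = {
    "salaries.xls": {"acl": {"hr", "finance"}, "text": "salary bands"},
    "handbook.pdf": {"acl": {"all"}, "text": "holiday policy and salary review dates"},
}

def search(term, user_groups):
    """Return matching documents, minus any the user may not see."""
    hits = [name for name, doc in DOCUMENTS.items() if term in doc["text"]]
    # Trim hits whose ACL shares no group with the searcher
    return [name for name in hits
            if DOCUMENTS[name]["acl"] & (user_groups | {"all"})]

print(search("salary", {"engineering"}))  # → ['handbook.pdf']
print(search("salary", {"hr"}))           # → ['salaries.xls', 'handbook.pdf']
```

Real engines enforce this at index or query time against the directory service, but the principle is the same: the same query returns different result lists to different users.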
A further incentive for getting users onto the internal search system comes from the evolution of enterprise search engines to support federated search. Users often call for the ability to search both inside and outside the company. This entails performing federated searches or using metasearch engines - tools that collate results from many different search products.
Federated search may make the public search engines less important, as it allows search engines to cooperate on a task. The problem here is one of standardisation: that is moving slowly, but there is an active specification, called OpenSearch. If lots of little search engines are able to work together, users may slip away from the monolithic services provided by Ask, Google and Microsoft.
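At its simplest, a federated or metasearch layer queries several engines, then interleaves and de-duplicates their ranked results; OpenSearch's contribution is a standard way to describe engines and return those result lists, so any compliant engine can be plugged in. A sketch of the merging step, with stub functions standing in for real engines:

```python
from itertools import zip_longest

# Stub engines returning ranked result lists for a query.
def intranet_engine(query):
    return ["wiki/phones", "hr/policy"]

def web_engine(query):
    return ["example.com/phones", "wiki/phones"]

def federated_search(query, engines):
    """Interleave ranked results from several engines, dropping duplicates."""
    merged, seen = [], set()
    # Round-robin interleaving keeps each engine's top hits near the top
    for tier in zip_longest(*(engine(query) for engine in engines)):
        for hit in tier:
            if hit is not None and hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged

print(federated_search("phones", [intranet_engine, web_engine]))
# → ['wiki/phones', 'example.com/phones', 'hr/policy']
```

Round-robin interleaving is only one merging policy; score-based fusion is another, but it requires engines to report comparable relevance scores, which the simple interleave avoids.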