New natural language processing technologies could improve search engines

Researchers at the University of Texas (UT) Austin have been making use of supercomputer facilities to combine machine and human intelligence in order to better understand free text and perhaps improve the search engines of the future.

The algorithms behind search engines are fed billions of texts and their linguistic connections, learning to interpret the relationship between our search terms and candidate web pages. Human “annotators” then fine-tune the algorithms by choosing the best search results.
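
The workings of commercial ranking systems are proprietary, but the basic pattern described above, a model scoring query and document pairs and being refined by human relevance judgements, can be sketched in a few lines. The following is a toy pointwise learning-to-rank example using TF-IDF features and scikit-learn; the data and pipeline are illustrative assumptions, not any search engine's actual system.

```python
# Toy sketch of pointwise learning-to-rank from human relevance judgements.
# The data below is invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Each example pairs a query with a candidate document; annotators label
# whether the document was a good result for the query (1) or not (0).
pairs = [
    ("anaemia treatment", "Iron supplements are a common treatment for anaemia.", 1),
    ("anaemia treatment", "The film received mixed reviews from critics.", 0),
    ("film reviews", "The film received mixed reviews from critics.", 1),
    ("film reviews", "Iron supplements are a common treatment for anaemia.", 0),
]

# Represent each query/document pair as one text for a simple TF-IDF model.
texts = [f"{query} [SEP] {doc}" for query, doc, _ in pairs]
labels = [label for _, _, label in pairs]

vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(texts)

ranker = LogisticRegression()
ranker.fit(X, labels)

# Score new candidate documents for a query and rank them by relevance.
query = "anaemia treatment"
candidates = [
    "Dietary iron and vitamin B12 can help treat anaemia.",
    "The director's latest film opens next week.",
]
features = vectoriser.transform([f"{query} [SEP] {c}" for c in candidates])
scores = ranker.predict_proba(features)[:, 1]
print(sorted(zip(candidates, scores), key=lambda pair: -pair[1]))
```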

This model has served billions of internet users well, but it has some serious flaws. Search engines lack a fundamental understanding of logic and language and tend to duplicate and reinforce the biases in our searches.

To improve this model, UT Austin researchers began investigating ways to combine human and machine intelligence to improve both general and specialised search engines.

A study led by An Nguyen, a PhD student at UT Austin, described a method combining input from multiple annotators to determine the best overall results. The UT Austin team tested this method using supercomputing resources at the Texas Advanced Computing Center to analyse medical research papers for keywords and to recognise events, people and places in breaking news stories. It proved to be a more accurate approach for extracting useful information from texts.
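
The study's exact model is not detailed here, but the underlying task, reconciling several annotators' labels into one more reliable answer, can be illustrated with a much simpler scheme. The sketch below uses weighted majority voting with assumed per-annotator reliability weights; the data, labels and weights are hypothetical and not the paper's own code.

```python
# Minimal sketch of combining multiple annotators' labels by weighted
# majority vote. The published method is more sophisticated; the annotators,
# labels and reliability weights here are illustrative assumptions.
from collections import defaultdict

def aggregate(labels_by_annotator, reliability):
    """labels_by_annotator: {annotator: {item_id: label}}
    reliability: {annotator: weight in (0, 1]}
    Returns the highest-weighted label for each item."""
    votes = defaultdict(lambda: defaultdict(float))
    for annotator, labels in labels_by_annotator.items():
        for item_id, label in labels.items():
            votes[item_id][label] += reliability[annotator]
    return {item: max(tally, key=tally.get) for item, tally in votes.items()}

# Three annotators mark whether a sentence reports an adverse drug event.
labels_by_annotator = {
    "ann1": {"s1": "event", "s2": "no_event", "s3": "event"},
    "ann2": {"s1": "event", "s2": "event", "s3": "no_event"},
    "ann3": {"s1": "no_event", "s2": "no_event", "s3": "event"},
}
reliability = {"ann1": 0.9, "ann2": 0.6, "ann3": 0.7}

print(aggregate(labels_by_annotator, reliability))
# {'s1': 'event', 's2': 'no_event', 's3': 'event'}
```

Weighting votes by estimated annotator reliability is one simple way to make the combined labels more accurate than any single annotator's; the reliability estimates themselves could be learned from how often each annotator agrees with the consensus.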

“An important challenge in natural-language processing is accurately finding important information contained in free-text, which lets us extract it into databases and combine it with other data in order to make more intelligent decisions and new discoveries,” said Professor Matthew Lease.

“We’ve been using crowdsourcing to annotate medical and news articles at scale so that our intelligent systems will be able to more accurately find the key information contained in each article.”

Another paper, written by Ye Zhang, a UT Austin PhD student, suggests incorporating existing resources that store knowledge about a given field – such as WordNet, a database which groups words into sets of synonyms. Using these resources allows similar words to be accounted for, increasing the efficiency of the neural network.

“We had this idea that if you could somehow reason about some words being related to other words a priori, then instead of having to have a parameter for each one of those words separately, you could tie together the parameters across multiple words and in that way need less data to learn the model,” said Professor Lease.

“It could realise the benefits of deep learning without large data constraints.”
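
One way to picture the parameter tying Lease describes is to let every word in a synonym group share a single embedding row, so the model learns one set of parameters per group rather than one per word. The sketch below uses a tiny hand-written synonym list standing in for WordNet; it is an illustrative assumption, not the authors' implementation.

```python
# Sketch of tying parameters across related words: words grouped together by
# a lexical resource share one embedding vector, so fewer parameters must be
# learned from limited data. The tiny synonym groups below stand in for
# WordNet synsets and are illustrative assumptions only.
import numpy as np

synonym_groups = [
    {"good", "great", "excellent"},
    {"bad", "poor", "terrible"},
    {"film", "movie"},
]

def build_index(vocabulary, groups):
    """Map each word to a parameter row; words in the same group share a row."""
    index, group_ids, next_id = {}, {}, 0
    for word in vocabulary:
        for g, group in enumerate(groups):
            if word in group:
                if g not in group_ids:
                    group_ids[g] = next_id
                    next_id += 1
                index[word] = group_ids[g]
                break
        else:
            # Ungrouped words keep their own parameter row.
            index[word] = next_id
            next_id += 1
    return index, next_id

vocabulary = ["good", "great", "excellent", "bad", "poor", "film", "movie", "plot"]
index, n_rows = build_index(vocabulary, synonym_groups)

# One embedding row per tied group instead of one per word: 4 rows, not 8.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(n_rows, 16))

print(n_rows)                               # 4 parameter rows for 8 words
print(index["good"] == index["excellent"])  # True: synonyms share parameters
```

In a real system the grouping would come from WordNet synsets and the shared rows would sit inside a neural network, but the counting above shows why tying parameters reduces the amount of training data needed.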

The researchers applied this method to a sentiment analysis of film reviews and a search of academic papers relating to anaemia. They found that performance was significantly improved. The researchers hope that these new approaches to natural-language processing could help refine web search engine results in the future.

“Industry is great at looking at near-term things, but they don’t have the same freedom as academic researchers to pursue research ideas that are higher risk but could be more transformative in the long term,” said Professor Lease.
