Search engines that learn from searchers

January 8, 2010 By Bill Steele

(PhysOrg.com) -- New research aims to create search-engine software that can learn from users by noticing which links they click and how they reformulate their queries when the first results don't pay off.

You probably won't notice, but in the near future some search engines may start experimenting on you. It should be worthwhile, since the goal is to better understand what you were looking for and give you the best possible answers.

New research by Thorsten Joachims and Robert Kleinberg, associate and assistant professors of computer science, respectively, aims to create software that can learn from users by noticing which links they click on in a list of search responses, and how they reformulate their queries when the first results don't pay off.

The work is funded by a four-year, $1 million grant from the National Science Foundation under federal stimulus funding, formally known as the American Recovery and Reinvestment Act (ARRA). The research will lead to methods that improve search quality without human guidance, especially on specialized Web sites such as scientific or legal collections or corporate intranets.

Joachims believes the work will have long-term benefits for the economy, invigorating the market for high-quality and focused search software. "I think there is a potential for commercial impact, improving quality and productivity," he said. In the short term, the project will fund at least two Ph.D. students for four years and provide research positions for undergraduate students.

As a demonstration, the researchers plan to create a new search engine for the physics arXiv Web site at Cornell, which contains thousands of papers in physics, mathematics and computer science, and possibly for other specialized collections.

"In several ways, providing search for small collections is more difficult than for the whole Internet. Yahoo! and Microsoft can spend a lot of manpower on engineering a good ranking function for the Internet. For small collections, this has to happen automatically via machine learning to be economical," Joachims explained.

Search is not a one-size-fits-all business: People searching specialized collections might use the same words in very different ways. Is "uncertainty," for example, about the location of subatomic particles, career choices, investment opportunities or romance?

"The key idea is to have a search engine that gets better just by people using it," Joachims said. He and his collaborators have already created a search engine called Osmot -- the name is a play on "learning by osmosis" -- which draws on extensive research by computer scientists in machine learning. The problem the new research will address is that what the machine learns may be biased by the way it displays results.

Eye-tracking studies done in cooperation with Geri Gay, the Kenneth J. Bissett Professor and Chair of Communication, have shown that absence of a click on a result at, say, the 11th position on the list of returns may mean that the result did not fit the user's information need, but it may also mean that the user had given up scanning the list that far down. To get reliable feedback from clicks, the search engine needs to shuffle the order in which results are returned.
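As a rough illustration of why shuffling helps, the toy simulation below flips adjacent pairs of results at random before showing them, so a document stuck in a low slot still gets some exposure. Everything here is an assumption made for the example: the function names, the 50 percent swap rate and the user model (attention falling off as 1/position) are hypothetical and are not taken from Osmot or from the researchers' work.

```python
import random
from collections import Counter


def present_with_swaps(ranking, rng, swap_prob=0.5):
    """Return a copy of `ranking` in which each adjacent pair
    (slots 0-1, 2-3, ...) is flipped with probability `swap_prob`."""
    shown = list(ranking)
    for i in range(0, len(shown) - 1, 2):
        if rng.random() < swap_prob:
            shown[i], shown[i + 1] = shown[i + 1], shown[i]
    return shown


def simulate_clicks(ranking, relevance, sessions, seed=0):
    """Count clicks per document over many simulated sessions.

    Toy user model (an assumption): the user looks at the slot at
    0-based index `position` with probability 1 / (position + 1),
    and clicks a document they looked at with probability
    `relevance[doc]`.
    """
    rng = random.Random(seed)
    clicks = Counter()
    for _ in range(sessions):
        shown = present_with_swaps(ranking, rng)
        for position, doc in enumerate(shown):
            examined = rng.random() < 1.0 / (position + 1)
            if examined and rng.random() < relevance[doc]:
                clicks[doc] += 1
    return clicks


if __name__ == "__main__":
    # "b" is the better document but starts in the lower slot.
    relevance = {"a": 0.2, "b": 0.8}
    print(simulate_clicks(["a", "b"], relevance, sessions=5000))
```

Because each document in a pair spends about half its time in the upper slot, a difference in click counts within the pair reflects the documents rather than their positions.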

"There is a trade-off. On the one hand, you want to present the best ranking you know so far," Joachims explained. "On the other hand, the search engine has to do a bit of experimentation to be able to learn even better rankings in the long run. The key is to balance the trade-off between presentation and experimentation in an optimal way."

This trade-off is similar to what a gambler faces in a casino and is called a "multi-armed bandit" problem. When playing a row of slot machines, each play gives you new information about how much that machine pays, but also costs you a quarter. The trick is to rule out the machines that won't pay off while spending no more than necessary to be sure. Kleinberg's work on algorithms for solving such trade-off problems will be key to making search engines learn effectively.
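The slot-machine picture can be sketched in a few lines of code. The example below uses successive elimination, one standard textbook strategy for bandit problems, chosen here only because it matches the "eliminate some machines" description; it is not presented as Kleinberg's algorithm. The payout probabilities, the Hoeffding-style confidence radius and the parameter `delta` are all assumptions made for the illustration.

```python
import math
import random


def successive_elimination(payout_probs, rounds, seed=0, delta=0.05):
    """Play simulated slot machines, dropping those that look clearly worse.

    `payout_probs[i]` is the (hidden) chance that machine i pays out.
    Returns the machines still in contention and the pulls per machine.
    """
    rng = random.Random(seed)
    n = len(payout_probs)
    active = list(range(n))
    pulls = [0] * n
    wins = [0] * n
    for t in range(1, rounds + 1):
        # Play every machine still in contention once (each play costs money).
        for arm in active:
            pulls[arm] += 1
            if rng.random() < payout_probs[arm]:
                wins[arm] += 1
        # Every active machine has been played exactly t times, so one
        # confidence radius applies to all of them (assumed Hoeffding bound).
        radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        means = {arm: wins[arm] / pulls[arm] for arm in active}
        best_lower = max(means.values()) - radius
        # Keep a machine only while its upper bound reaches the best lower bound.
        active = [arm for arm in active if means[arm] + radius >= best_lower]
        if len(active) == 1:
            break
    return active, pulls


if __name__ == "__main__":
    survivors, pulls = successive_elimination([0.1, 0.5, 0.9], rounds=5000)
    print("still playing:", survivors)
    print("plays per machine:", pulls)
    print("quarters spent:", sum(pulls))
```

Machines that pay poorly are dropped after relatively few plays, so most of the quarters go to the machines that are still plausibly the best. In search, the "machines" would be candidate rankings and the "payout" a click.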

Osmot is open-source software but still very much in beta. More information can be found at http://learnimplic … oachims.org/ .

Provided by Cornell University


FreakTrap
Jan 08, 2010

These concepts have been around for quite some time; I'm sure the major search providers have been researching this for at least the past decade. There is also a major vulnerability in the methodology: instruct a botnet to emulate conventional users and have the bots use relevant keywords to find a specific document and end their sessions on it. To the search engine, a spoofed document's popularity is indistinguishable from organic searchers' actual satisfaction.
designmemetic
Jan 08, 2010

I'd love a nice open-source algorithm that could be shared and used by all. A general search optimization function for all sorts of things could be handy.