New metasearch engine leaves Google, Yahoo crawling

March 25, 2009

Weiyi Meng, a professor of computer science at Binghamton University, State University of New York, is hopeful that one day in the not-too-distant future, you'll be able to type a query into an online search engine and have it deliver not Web pages that may contain an answer, but just the answer itself. Credit: Jonathan Cohen

One day in the not-too-distant future, you'll be able to type a query into an online search engine and have it deliver not Web pages that may contain an answer, but just the answer itself, says Weiyi Meng, a professor of computer science at Binghamton University, State University of New York.

For instance, imagine typing in "Who starred in the film Casablanca?" The search engine would respond with "Humphrey Bogart and Ingrid Bergman."

Not impressed?

Try asking a more nuanced question, such as "What do Americans think of universal health care?" A search engine will create a report indicating trends in opinion based on what has been posted to the Web.

Search engines may eventually be used to conduct polling and even help sort fact from fiction, said Meng, who is helping to make such possibilities a reality, both through his research and as president of a company called Webscalers.

The way Meng sees it, big search engines such as Google and Yahoo are fundamentally flawed. The Web has two parts: the surface Web and the deep Web. The surface Web is made up of perhaps 60 billion pages. The deep Web, at some 900 billion pages, is about 15 times larger.

Google, which relies on a "Web crawler" to examine pages and catalog them for future searches, can search about 20 billion pages. Web crawlers follow links to reach pages and often miss content that isn't linked to any other page or is in some way "hidden."
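To see why a link-following crawler can miss pages, here is a minimal sketch in Python. It is an illustration only, not how Google's crawler works: the toy link graph, the page names and the `crawl` function are all assumptions made up for this example. The crawl is a plain breadth-first traversal from a seed page, so any page that no other page links to is never reached.

```python
from collections import deque

def crawl(link_graph, seeds):
    """Return the set of pages reachable from the seeds by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical toy Web: "db-record" exists, but no page links to it.
toy_web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["story"],
    "story": [],
    "db-record": [],
}

reached = crawl(toy_web, ["home"])
missed = set(toy_web) - reached  # pages the crawler never sees
```

In this toy example the unlinked page plays the role of deep Web content: it is on the Web, but only a direct query to the site that holds it would surface it.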

Meng, along with researchers at the University of Illinois at Chicago and the University of Louisiana at Lafayette, has helped pioneer large-scale metasearch-engine technology that harnesses the power of small search engines to come up with results that are more accurate and more complete.

"Most of the pages on the deep Web aren't directly 'crawlable.' We want to connect to small search engines and reach the deep Web," he said. "That's the idea. Many people have the [impression] that Google can search everything, and if it's not there it doesn't exist. But we should be able to retrieve many times more than what Google can search."

Not only can a metasearch engine probe deeper, it can also offer the latest information.

"In principle," Meng said, "small guys are much better able to maintain the freshness of their data. Google has a program to 'crawl' all over the world. Depending on when the crawler has last visited your server, there's a delay of days or weeks before a new page will show up in that search. We can get fresher results."

The concept is not new. In fact, the first metasearch engine was built in 1994.

"The big difference between our technology and the ones pursued by other people is that most of the other technologies do the metasearching on top of a small number of general-purpose search engines, such as Yahoo, Google or MSN," Meng explained. "We have a completely different perspective. We want to build large-scale metasearch engines on top of many small search engines."

The Web has millions of search engines at businesses, universities, newspapers and other organizations. Since 1997, and with continued funding from the National Science Foundation, Meng and his collaborators have found ways to run queries across multiple search engines and sort through the results.

Webscalers is based in the Start-Up Suite at Binghamton University's Innovative Technologies Complex, which is home to several young companies that have their roots in faculty inventions.

"If the Web keeps on growing, a company like Google may run out of resources to crawl all of those pages," said Vijay V. Raghavan, vice president of Webscalers and a faculty member at the University of Louisiana at Lafayette. "We won't have that problem. We will scale much better."

Webscalers' technology could be useful for large organizations with many divisions. For example, Webscalers has developed a prototype that would allow a search of all 64 campuses in the State University of New York system as well as SUNY's central administration.

"People can use it to find collaborators," Meng said. "It could also help prospective students find programs they're interested in."

The technology could be adapted to large companies or even the government, Meng said.

Challenges for large-scale metasearch engines include determining which search engines are best for a given query, automating the interaction with those search engines, and organizing the search results.
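Those three challenges can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Webscalers' technology: the engine records, their keyword summaries, the `.example` URLs and the scoring rules are all hypothetical. Engine selection here is a simple keyword overlap, interaction is a plain function call standing in for a real query to a remote site, and merging rescales each engine's scores so they can be compared.

```python
def select_engines(query, engines, k=2):
    """Pick up to k engines whose keyword summaries overlap the query."""
    terms = set(query.lower().split())
    ranked = sorted(engines, key=lambda e: len(terms & e["summary"]), reverse=True)
    return [e for e in ranked[:k] if terms & e["summary"]]

def merge_results(result_lists):
    """Rescale each engine's scores to [0, 1]; keep the best score per URL."""
    best = {}
    for results in result_lists:
        if not results:
            continue
        top = max(score for _, score in results)
        for url, score in results:
            norm = score / top if top else 0.0
            best[url] = max(best.get(url, 0.0), norm)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

def metasearch(query, engines, k=2):
    """Select engines, query each one, and merge what comes back."""
    chosen = select_engines(query, engines, k)
    return merge_results([e["search"](query) for e in chosen])

# Hypothetical small search engines, each with its own scoring scale.
engines = [
    {"name": "campus-a", "summary": {"physics", "admissions"},
     "search": lambda q: [("a.example/physics", 8.0), ("a.example/labs", 4.0)]},
    {"name": "campus-b", "summary": {"physics", "nursing"},
     "search": lambda q: [("b.example/physics", 0.9), ("a.example/labs", 0.6)]},
    {"name": "news", "summary": {"election", "weather"},
     "search": lambda q: [("news.example/weather", 1.0)]},
]

results = metasearch("physics programs", engines)
urls = [url for url, _ in results]
```

For the query "physics programs", the sketch skips the news engine entirely and queries only the two campus engines. Rescaling matters because one engine scores on a 0 to 10 scale and the other on 0 to 1; without it, the first engine's results would always crowd out the second's.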

Meng hopes to build a grand metasearch engine one day that would integrate all of the 1 million small search engines into a single system. "There are still a lot of significant challenges in creating a system of such magnitude," he said, "but I am optimistic that such a metasearch engine can be built."

Try out the concept online

Webscalers has already launched several metasearch products:

The first is a news metasearch engine called AllinOneNews. Available at http://www.allinonenews.com, it connects to 1,800 news sources in 200 countries. That's the largest metasearch engine in the world.

Webscalers also offers MySearchView, a system that allows any user to create his or her own metasearch engine just by checking off a few options at http://www.mysearchview.com.

Source: Binghamton University


   

  • earls - Mar 25, 2009
    • Rank: 4 / 5 (1)
    I guess every month from here on out we'll hear about the next big "Google Killer."

    This "metasearch," however, has little to do with a typical "Google Search." It seems to be more related to the "post-processing" of the results than actually finding them.

    Airkin also made me aware in another article that it "seems to be limited to its own database (think Wiki) not like Google's large ones of the internet."

    This is evidenced by "Meng hopes to build a grand metasearch engine one day that would integrate all of the 1 million small search engines into a single system."

    It seems like a step back in my mind... Or at least, too infantile to be of any use (yet).

    Another issue that should be considered is "Should you trust the result?"

    The converse of "one simple answer" is being painted as a negative: Google search returns many (too many?) results that have to be pored over to distill an answer... Though this is not really the case, as (generally) you'll get the answer you're looking for in the top 10 results.

    However, with one absolute answer, "just because the computer said," how do you know it's the correct answer? Many results give you the ability to compare and contrast and decide for yourself what's true.

    I suppose (and understand) this is what Wolfram and Meng are attempting to accomplish... An authoritative response that falls within the human margin of error... But it just seems to me "humans are computers, and computers aren't human." Is there simply a natural disconnect between the two different "mediums" or will a singularity be reached in the future?

    I wonder what the metasearch would have to say about that question. ;) "42."
  • vlam67 - Mar 25, 2009
    • Rank: not rated yet
    yeah, sure, great Meng. Type in "Tibet" and the answer is " China's territory". Enough said.
  • ealex - Mar 26, 2009
    • Rank: not rated yet
    Wasn't there recently another one of these? What's up with that? Is there a grudge against Google on the physorg team? This is basically the exact same stuff, only a different search engine.

    Let it go already, we get it.
  • Choice - Mar 29, 2009
    • Rank: not rated yet
    The program should return several answers and let the asker choose the one he or she likes.
  • pcunix - Mar 29, 2009
    • Rank: not rated yet
    "Depending on when the crawler has last visited your server, there's a delay of days or weeks before a new page will show up in that search. We can get fresher results."

    Really? Gosh, I've seen pages I post show up literally minutes later.

    For this kind of stuff, I'll believe it when I see it, and I think seeing it is a long, long way off.
  • denijane - Apr 07, 2009
    • Rank: not rated yet
    As much as I like it, there is one thing that we must admit: search the way it is provides us with more information. For example, wanting to know something more about a date will give you pages and pages of related content that you have to skip through in order to find what you're looking for. And during this process, you learn a lot more, sometimes even things that are quite useful to you but that you wouldn't have known otherwise. While if you got the answer in a line or 3, you would limit your knowledge.

    Yes, I know this isn't really a flaw. I just wanted to point out that all the search engines have their good and their bad sides and can develop simultaneously.
