Carnegie Mellon algorithm identifies top 100 blogs for news

November 19, 2007

Being among the first to pick up on Internet news and gossip and rapidly detecting contamination anywhere in a water supply system are similar problems, at least from a computer scientist’s point of view. Both can be solved with a versatile algorithm developed by Carnegie Mellon University researchers.

Using a problem-solving method called the Cascades algorithm, Carlos Guestrin, assistant professor of computer science and machine learning, and his students compiled a list of the 100 best blogs to read to find the biggest news on the Web as early as possible (http://www.blogcascades.org/). The list includes well-known blogs, such as Instapundit and Boing Boing, as well as more obscure ones, such as Watcher of Weasels and Don Surber.

“The goal of our system when looking at blogs is to detect the big stories as early on and as close to the source as possible,” Guestrin said. He, Andreas Krause and Jure Leskovec, doctoral students in computer science and machine learning, respectively, analyzed 45,000 blogs (those that actively link to other blogs) to compile the list, checking the time stamps to determine where news items were being posted first.

But reading even 100 blogs, many of them with numerous postings, may be more than many Web surfers can handle. Recasting the problem, the researchers used their algorithm to compile a list of blogs for a person who wanted to read only 5,000 postings. This list is quite different, with “summarizer” blogs, such as The Modulator and Anglican, predominating.
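The recast problem, picking blogs under a cap on total postings, can be illustrated with a small sketch. This is not the researchers' procedure. It is a simple benefit-per-cost greedy rule, and the blog names, story sets, posting counts and budget are all made up for illustration. Each blog "costs" its number of postings, and at each step the sketch adds the affordable blog that surfaces the most new stories per posting.

```python
from typing import Dict, List, Set, Tuple

def budgeted_greedy(
    stories: Dict[str, Set[int]],   # hypothetical: stories each blog reports
    posts: Dict[str, int],          # hypothetical: postings each blog publishes (> 0)
    budget: int,                    # maximum postings the reader will read
) -> Tuple[List[str], int, int]:
    """Pick blogs by new-stories-per-posting until the budget is exhausted."""
    chosen: List[str] = []
    covered: Set[int] = set()
    spent = 0
    while True:
        best, best_ratio = None, 0.0
        for blog in sorted(stories):
            # Skip blogs already chosen or too expensive for the remaining budget.
            if blog in chosen or spent + posts[blog] > budget:
                continue
            gain = len(stories[blog] - covered)   # stories not yet seen
            ratio = gain / posts[blog]
            if ratio > best_ratio:
                best, best_ratio = blog, ratio
        if best is None:                          # nothing affordable adds value
            break
        chosen.append(best)
        covered |= stories[best]
        spent += posts[best]
    return chosen, spent, len(covered)

# Toy data (assumed): one prolific blog and several low-volume ones.
stories = {
    "prolific": {1, 2, 3, 4, 5, 6},
    "summarizer_a": {1, 2, 3, 4},
    "summarizer_b": {4, 5, 6},
    "niche": {7},
}
posts = {"prolific": 3000, "summarizer_a": 400, "summarizer_b": 500, "niche": 200}

picked, spent, n_stories = budgeted_greedy(stories, posts, budget=1200)
print(picked, spent, n_stories)
```

In this toy data the prolific blog never fits the budget, so the selection shifts toward low-volume blogs that cover many stories in few postings, which mirrors the article's observation that "summarizer" blogs dominate the budgeted list.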

Similarly, Guestrin and his students used the same algorithm to determine the optimal number and placement of sensors for detecting the introduction and spread of contaminants in a municipal water supply. Their report on the blog and water system case studies, “Cost-Effective Outbreak Detection in Networks,” was presented at the Association for Computing Machinery’s International Conference on Knowledge Discovery and Data Mining earlier this year.

“Nothing demonstrates the versatility of Carlos’ algorithm better than its ability to solve these two difficult and seemingly different problems,” said Randal E. Bryant, dean of Carnegie Mellon’s School of Computer Science. “It’s a credit to Carlos’ insight and inventiveness, but also a testament to the power of computational thinking. Computer scientists increasingly are developing common methods for solving problems that apply across any number of disciplines.”

Guestrin began work on the Cascades algorithm in 2004 to find a way to balance the cost of collecting information with the need for collecting the information early and close to its source. Initially, this addressed problems in designing wireless sensor networks — a technology that potentially can monitor such important conditions as water quality, building temperature, vital signs of nursing home residents, algal blooms in lakes and the structural integrity of bridges. In all of these cases, deploying the wrong number of sensors or putting them in the wrong places wastes money and produces poor information.

The algorithm allows for near-optimal placement of sensors by exploiting a property called submodularity. Simply put, submodularity means there is a diminishing return associated with adding sensors — adding a sensor to a five-sensor network has much more impact than adding a sensor to a 10,000-sensor network. The algorithm also takes into account the property of locality — the idea that sensors that are far apart provide almost independent information.
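To make the diminishing-returns idea concrete, here is a minimal sketch of plain greedy selection on a coverage objective. It is an illustration under stated assumptions, not the Cascades algorithm itself. The sensor names and the contamination scenarios each sensor detects are invented, and the objective (the number of distinct scenarios detected) is one simple example of a submodular function.

```python
from typing import Dict, List, Set, Tuple

def coverage(chosen: List[str], detects: Dict[str, Set[int]]) -> int:
    """Number of distinct scenarios detected by the chosen sensors."""
    covered: Set[int] = set()
    for sensor in chosen:
        covered |= detects[sensor]
    return len(covered)

def greedy_placement(
    detects: Dict[str, Set[int]], k: int
) -> Tuple[List[str], List[int]]:
    """Add up to k sensors, each time taking the largest marginal gain."""
    chosen: List[str] = []
    gains: List[int] = []
    for _ in range(k):
        base = coverage(chosen, detects)
        best, best_gain = None, 0
        for sensor in sorted(detects):
            if sensor in chosen:
                continue
            gain = coverage(chosen + [sensor], detects) - base
            if gain > best_gain:
                best, best_gain = sensor, gain
        if best is None:          # no remaining sensor detects anything new
            break
        chosen.append(best)
        gains.append(best_gain)
    return chosen, gains

# Toy data (assumed): candidate sensor sites and the scenarios each would detect.
detects = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 6, 7},
}

placement, marginal_gains = greedy_placement(detects, k=3)
print(placement, marginal_gains)
```

Each successive sensor adds less than the one before it (gains of 4, then 2, then 1 in this toy data). For objectives with this property, greedy selection is known to come within a constant factor of the best possible choice, which is the sense in which a placement can be called near-optimal.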

Work by Guestrin and his group is now focusing on detecting pollution in lakes and rivers and ensuring performance quality on citywide Wi-Fi networks. “This project represents a nice blend of theoretical understanding and a lot of engineering effort to make the whole thing work,” he said. “It’s a nice theory applied to larger, real-world data. It’s cross-fertilization and interdisciplinary thinking in the true Carnegie Mellon tradition.”

Work on developing the Cascades algorithm has been supported by the National Science Foundation, Intel, Microsoft, the Sloan Foundation, PITA, IBM and Hewlett-Packard.

Source: Carnegie Mellon University


