Brown mathematicians prove new way to build a better estimate

February 29, 2008

How do you sift through hundreds of billions of bits of information and make accurate inferences from such gargantuan sets of data? Brown University mathematician Charles “Chip” Lawrence and graduate student Luis Carvalho have arrived at a fresh answer with broad applications in science, technology and business.

In new work published in the Proceedings of the National Academy of Sciences, Lawrence and Carvalho describe a new class of statistical estimators and prove four theorems concerning their properties. Their work shows that these “centroid” estimators allow for better statistical predictions – and, as a result, better ways to extract information from the immense data sets used in computational biology, information technology, banking and finance, medicine and engineering.

“What’s exciting about this work – what makes it every scientist’s dream – is that it’s so fundamental,” Lawrence said. “These new estimators have applications in biology and beyond and they advance a statistical method that’s been around for decades.”

For more than 80 years, one of the most common methods of statistical prediction has been maximum likelihood estimation (MLE). This method is used to find the single most probable solution, or estimate, from a set of data.

But new technologies that capture enormous amounts of data – human genome sequencing, Internet transaction tracking, instruments that beam high-resolution images from outer space – have opened opportunities to predict discrete “high dimensional” or “high-D” unknowns. The huge number of combinations of these “high-D” unknowns produces enormous statistical uncertainty. Data has outgrown data analysis.

This discrepancy creates a paradox. Instead of producing more precise predictions about gene activity, shopping habits or the presence of faraway stars, these large data sets are producing less reliable predictions, given current procedures. That’s because maximum likelihood estimators use data to identify the single most probable solution. But because any one data point swims in an increasingly immense sea, it’s not likely to be representative.

Lawrence, a professor of applied mathematics and a faculty member in the Center for Computational Molecular Biology at Brown, first came upon this paradox and a potential way around it while working on predicting the structure of RNA molecules. If you want to predict the structure of these molecules – how the molecule will look when it folds onto itself – you’d have billions and billions of possible shapes to choose from.

“Using maximum likelihood estimation, the most likely outcome would be very, very, very unlikely,” Lawrence said, “so we knew we needed a better estimation method.”
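A toy calculation gives a feel for the scale Lawrence describes. The sketch below is not taken from the paper. It assumes n independent yes/no unknowns, each taking its likelier value with probability 0.9 (both the independence and the 0.9 are illustrative assumptions), and computes the probability of the single most probable overall configuration.

```python
def most_likely_probability(n, p=0.9):
    """Probability of the single most probable configuration of n
    independent binary unknowns, where each unknown takes its likelier
    value with probability p (an illustrative assumption, p >= 0.5)."""
    return p ** n


# The best single answer becomes vanishingly unlikely as n grows.
for n in (10, 100, 1000):
    print(n, most_likely_probability(n))
```

Even though every individual unknown is predicted with 90 percent confidence, the chance that the most probable configuration is right on all 100 unknowns at once is already well below one in ten thousand.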

Lawrence and Carvalho used statistical decision theory to understand the limitations of the old procedure when faced with new “high-D” problems. They also used it to find an estimation procedure that applies to a broad range of statistical problems. These “centroid” estimators identify not the single most probable solution, but the solution that is most representative of all the data in a set.
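The difference between "most probable" and "most representative" can be illustrated with a small sketch. It rests on assumptions that are not in this article: solutions are strings of 0s and 1s, closeness is measured by Hamming distance (the number of positions where two solutions differ), and the probabilities are made up. The function names are hypothetical. Under those assumptions, the centroid is the candidate with the smallest expected distance to a solution drawn from the distribution.

```python
from itertools import product

# Hypothetical probabilities over 3-bit solutions (made-up numbers, sum to 1).
posterior = {
    (0, 0, 0): 0.30,
    (1, 1, 0): 0.28,
    (1, 1, 1): 0.24,
    (1, 0, 1): 0.18,
}


def hamming(a, b):
    """Number of positions where two equal-length solutions differ."""
    return sum(x != y for x, y in zip(a, b))


def most_probable(post):
    """The single solution with the highest probability."""
    return max(post, key=post.get)


def expected_distance(candidate, post):
    """Average Hamming distance from candidate to a random solution."""
    return sum(p * hamming(candidate, s) for s, p in post.items())


def centroid(post):
    """Brute-force search over all bit strings for the candidate with the
    smallest expected Hamming distance (feasible only for tiny examples)."""
    n = len(next(iter(post)))
    return min(product((0, 1), repeat=n),
               key=lambda c: expected_distance(c, post))


mode = most_probable(posterior)
center = centroid(posterior)
print("most probable:", mode, expected_distance(mode, posterior))
print("centroid:     ", center, expected_distance(center, posterior))
```

Here the most probable solution is (0, 0, 0), but it sits far from the other three, which resemble one another. The centroid, (1, 1, 0), is slightly less probable on its own yet closer on average to everything the distribution considers plausible (an expected distance of about 1.20 versus about 1.64).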

Lawrence and Carvalho went on to prove four theorems that illustrate the favorable properties of these estimators and show that they can be easily computed in many important applications.
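One simple case shows why such an estimator can be cheap to compute. The sketch below assumes unconstrained strings of 0s and 1s compared by Hamming distance, with made-up probabilities and hypothetical function names; none of these details come from this article. In that setting, the solution with the smallest expected distance can be found one position at a time: set a position to 1 exactly when its total probability of being 1 exceeds one half. No search over all 2^n combinations is needed.

```python
# Hypothetical probabilities over 3-bit solutions (made-up numbers, sum to 1).
posterior = {
    (0, 0, 0): 0.30,
    (1, 1, 0): 0.28,
    (1, 1, 1): 0.24,
    (1, 0, 1): 0.18,
}


def marginals(post):
    """For each position, the total probability that the position is 1."""
    n = len(next(iter(post)))
    return [sum(p for s, p in post.items() if s[i] == 1) for i in range(n)]


def centroid_by_threshold(post):
    """Set each position to 1 when its marginal probability exceeds 0.5.
    Valid for unconstrained bit strings under Hamming distance."""
    return tuple(1 if m > 0.5 else 0 for m in marginals(post))


print(marginals(posterior))
print(centroid_by_threshold(posterior))
```

In real applications the solutions usually have to satisfy constraints (an RNA structure cannot pair one base with two partners, for example), so this per-position shortcut is only the simplest instance of the idea.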

“This new procedure should benefit any field that needs to reliably make predictions of large-scale, high-D unknowns,” Lawrence said.

Source: Brown University



