Brown mathematicians prove new way to build a better estimate

February 29, 2008

How do you sift through hundreds of billions of bits of information and make accurate inferences from such gargantuan sets of data? Brown University mathematician Charles “Chip” Lawrence and graduate student Luis Carvalho have arrived at a fresh answer with broad applications in science, technology and business.

In new work published in the Proceedings of the National Academy of Sciences, Lawrence and Carvalho describe a new class of statistical estimators and prove four theorems concerning their properties. Their work shows that these “centroid” estimators allow for better statistical predictions – and, as a result, better ways to extract information from the immense data sets used in computational biology, information technology, banking and finance, medicine and engineering.

“What’s exciting about this work – what makes it every scientist’s dream – is that it’s so fundamental,” Lawrence said. “These new estimators have applications in biology and beyond and they advance a statistical method that’s been around for decades.”

For more than 80 years, one of the most common methods of statistical prediction has been maximum likelihood estimation (MLE). This method is used to find the single most probable solution, or estimate, from a set of data.

But new technologies that capture enormous amounts of data – human genome sequencing, Internet transaction tracking, instruments that beam high-resolution images from outer space – have opened opportunities to predict discrete “high dimensional” or “high-D” unknowns. The huge number of combinations of these “high-D” unknowns produces enormous statistical uncertainty. Data has outgrown data analysis.

This discrepancy creates a paradox. Instead of producing more precise predictions about gene activity, shopping habits or the presence of faraway stars, these large data sets are producing more unreliable predictions, given current procedures. That’s because maximum likelihood estimators use data to identify the single most probable solution. But because any one data point swims in an increasingly immense sea, it’s not likely to be representative.

Lawrence, a professor of applied mathematics and a faculty member in the Center for Computational Molecular Biology at Brown, first came upon this paradox and a potential way around it while working on predicting the structure of RNA molecules. If you want to predict the structure of these molecules – how the molecule will look when it folds onto itself – you’d have billions and billions of possible shapes to choose from.

“Using maximum likelihood estimation, the most likely outcome would be very, very, very unlikely,” Lawrence said, “so we knew we needed a better estimation method.”

Lawrence and Carvalho used statistical decision theory to understand the limitations of the old procedure when faced with new “high-D” problems. They then used the same theory to find an estimation procedure that applies to a broad range of statistical problems. These “centroid” estimators identify not the single most probable solution, but the solution that is most representative of all the data in a set.
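The difference between the two kinds of estimate can be illustrated with a small sketch. It assumes a toy problem in which each candidate solution is a short string of 0s and 1s, and it assumes that "most representative" means closest on average to all candidates, counting the positions where two solutions differ (Hamming distance). The article does not specify the distance measure, and the probabilities and function names below are made up for illustration.

```python
from typing import Dict, Tuple

Config = Tuple[int, ...]

# Hypothetical probabilities over three candidate solutions (illustrative only).
posterior: Dict[Config, float] = {
    (0, 0, 0): 0.4,  # the single most probable solution
    (1, 1, 0): 0.3,  # two similar solutions that together
    (1, 1, 1): 0.3,  # carry more probability than the first
}


def most_probable(post: Dict[Config, float]) -> Config:
    """Return the single candidate with the highest probability."""
    return max(post, key=post.get)


def centroid(post: Dict[Config, float]) -> Config:
    """Return the 0/1 string that minimizes expected Hamming distance.

    Under Hamming distance this reduces to a per-position vote:
    set a position to 1 when its total probability of being 1 exceeds 0.5.
    """
    n = len(next(iter(post)))
    marginals = [
        sum(p for cfg, p in post.items() if cfg[i] == 1) for i in range(n)
    ]
    return tuple(1 if m > 0.5 else 0 for m in marginals)


def expected_hamming(estimate: Config, post: Dict[Config, float]) -> float:
    """Average number of positions where the estimate differs from a candidate."""
    return sum(
        p * sum(a != b for a, b in zip(estimate, cfg))
        for cfg, p in post.items()
    )


mode_estimate = most_probable(posterior)    # (0, 0, 0)
centroid_estimate = centroid(posterior)     # (1, 1, 0)

print("most probable:", mode_estimate, expected_hamming(mode_estimate, posterior))
print("centroid:     ", centroid_estimate, expected_hamming(centroid_estimate, posterior))
```

In this toy case the most probable solution is wrong 60 percent of the time and sits far from the other candidates, while the centroid lands inside the cluster of similar solutions and has the smaller expected distance. Real high-D problems have far too many candidates to list, so this enumeration is only a sketch of the idea, not the authors' computational method.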

Lawrence and Carvalho went on to prove four theorems that illustrate the favorable properties of these estimators and show that they can be easily computed in many important applications.

“This new procedure should benefit any field that needs to reliably make predictions of large-scale, high-D unknowns,” Lawrence said.

Source: Brown University



