Researchers discover methods to find 'needles in a haystack' in data

December 5th, 2005

A Case Western Reserve University research team from physics and statistics has recently created innovative statistical techniques that improve the chances of detecting a signal in large data sets. The new techniques can not only search for the "needle in the haystack" in particle physics, but also have applications in discovering a new galaxy, monitoring transactions for fraud and security risk, identifying the carrier of a virulent disease among millions of people or detecting cancerous tissues in a mammogram.

Case faculty members Ramani Pilla and Catherine Loader from statistics and Cyrus Taylor from physics report their findings in the article "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," published December 2 in the journal Physical Review Letters.

"As haystacks of information grow ever larger--and the needles ever smaller--the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.

Researchers working with large amounts of data encounter the fundamental problem of distinguishing a real signal from random variation in the data. In many practical problems, a suspected signal may only be a small blip in a noisy experimental background.

The Case team developed a technique built on the principle of comparing a set of summary characteristics for any subregion of the observations with the background variation. These characteristics are then used to look for small regions that appear significantly different from the background--a difference that cannot simply be attributed to random chance.

"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."

At the core of the breakthrough is the idea of posing the problem in terms of a "hypothesis-based testing" paradigm to detect statistical disorder in the data. The method further exploits the flexibility behind a long-established geometric formula in creating a technique that significantly enhances the ability to distinguish a signal.

The researchers said the challenge is two-fold: defining efficient test statistics, and determining the critical cut-off that helps the scientist separate random variation from the signal. The detection problem involves a large number of comparisons, and the researchers caution that experimentalists should not be fooled into false discoveries by random variation.

"The experimenter wants to control the experiment-wise error rate: if there is nothing in the data, then there must be minimal probability of falsely discovering a signal. On the other hand, we want to maximize our chance of discovering any real signal that may be present in the massive data set," said Loader.
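The effect of many comparisons on the experiment-wise error rate can be illustrated with a small simulation. The sketch below is not the researchers' geometric method: it assumes independent standard-normal bins and uses the simple Bonferroni correction as a stand-in for a critical cut-off. The function names, the 100 bins and the 5 percent level are illustrative assumptions.

```python
import math
import random

def any_bin_rate(n_bins, z_cut, n_experiments=2000, seed=0):
    """Fraction of background-only experiments in which at least one bin exceeds z_cut."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Pure noise: every bin is an independent standard-normal fluctuation.
        if any(rng.gauss(0.0, 1.0) > z_cut for _ in range(n_bins)):
            hits += 1
    return hits / n_experiments

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def bonferroni_cut(n_bins, alpha):
    """Cut-off z with P(Z > z) = alpha / n_bins, found by bisection."""
    target = alpha / n_bins
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if upper_tail(mid) > target:
            lo = mid  # tail still too heavy, so the cut-off must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0

# Per-bin 5% cut-off applied to 100 bins: a "discovery" in almost every experiment.
naive_rate = any_bin_rate(100, 1.645)
# Bonferroni-corrected cut-off: experiment-wise error rate stays near 5%.
cut = bonferroni_cut(100, 0.05)
corrected_rate = any_bin_rate(100, cut)
print(naive_rate, cut, corrected_rate)
```

A correction of this kind controls false discoveries but is conservative when neighbouring comparisons are correlated, which is one reason a sharper cut-off matters for detection probability.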

"The probabilistic problem associated with this scenario is reduced to one of finding the areas of certain regions on the surface of high-dimensional spheres," explains Pilla.

The Case researchers then exploited the geometric methods pioneered in 1939 by Harold Hotelling and Hermann Weyl. They tested the statistical techniques on computer-simulated particle physics experiments that mimic the real experiments conducted in colliders, demonstrating that the new technique significantly increased detection probabilities.

"In high-energy particle physics and astrophysics problems, chi-square goodness-of-fit tests are widely employed, although they have relatively low power to detect the signal," notes Taylor. "Through my collaborative work with Professors Pilla and Loader, we will be able to develop powerful statistical tests for detecting a signal from noisy data with high probability, a fundamental problem encountered in many scientific disciplines."
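Taylor's point about the low power of omnibus goodness-of-fit tests can be seen in a toy comparison. The sketch below is an illustration under stated assumptions, not the method from the paper: it uses 50 unit-variance Gaussian bins, a hypothetical three-bin bump of known shape at an unknown location, a chi-square-style sum of squares as the omnibus statistic, and a scan that matches the bump shape as the targeted statistic. Both cut-offs are calibrated by Monte Carlo on background-only data at the 5 percent level.

```python
import random

N_BINS = 50
BUMP = (2.0, 2.0, 2.0)  # hypothetical signal shape, in units of the noise level

def simulate(rng, signal_at=None):
    """Background-only data, optionally with the bump added starting at bin signal_at."""
    data = [rng.gauss(0.0, 1.0) for _ in range(N_BINS)]
    if signal_at is not None:
        for k, amp in enumerate(BUMP):
            data[signal_at + k] += amp
    return data

def omnibus(data):
    """Chi-square-style statistic: reacts to any departure from the background."""
    return sum(x * x for x in data)

def targeted(data):
    """Scan statistic: best normalised match to the expected bump shape."""
    norm = sum(a * a for a in BUMP) ** 0.5
    return max(
        sum(a * data[i + k] for k, a in enumerate(BUMP)) / norm
        for i in range(N_BINS - len(BUMP) + 1)
    )

def critical_value(stat, rng, alpha=0.05, n_sim=2000):
    """Empirical (1 - alpha) quantile of the statistic under background only."""
    null = sorted(stat(simulate(rng)) for _ in range(n_sim))
    return null[int((1 - alpha) * n_sim)]

def power(stat, cut, rng, n_sim=2000):
    """Fraction of signal-bearing experiments in which the statistic exceeds the cut-off."""
    hits = 0
    for _ in range(n_sim):
        pos = rng.randrange(N_BINS - len(BUMP) + 1)
        if stat(simulate(rng, pos)) > cut:
            hits += 1
    return hits / n_sim

rng = random.Random(1)
cut_omni = critical_value(omnibus, rng)
cut_targ = critical_value(targeted, rng)
power_omni = power(omnibus, cut_omni, rng)
power_targ = power(targeted, cut_targ, rng)
print(power_omni, power_targ)
```

In this setup the targeted scan detects the bump far more often than the omnibus statistic at the same false-discovery level, because it uses information about the type of departure expected rather than reacting to every fluctuation.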

Taylor added that "conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."

"Detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to scientific success," conclude the researchers.

Source: Case Western Reserve University

