Researchers discover methods to find 'needles in haystack' in data
December 5, 2005
A Case Western Reserve University research team from physics and statistics has recently created innovative statistical techniques that improve the chances of detecting a signal in large data sets. The new techniques can not only search for the "needle in the haystack" in particle physics, but also have applications in discovering a new galaxy, monitoring transactions for fraud and security risk, identifying the carrier of a virulent disease among millions of people or detecting cancerous tissues in a mammogram.
Case faculty members Ramani Pilla and Catherine Loader from statistics and Cyrus Taylor from physics report their findings in the article "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," published December 2 in the journal Physical Review Letters.
"As haystacks of information grow ever larger--and the needles ever smaller--the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.
Researchers working with large amounts of data encounter the fundamental problem of determining a real signal from random variation in the data. In many practical problems, a suspected signal may only be a small blip in a noisy experimental background.
The Case team developed a technique built on the principle of comparing a set of summary characteristics for any subregion of the observations with the background variation. Using these characteristics, the method looks for small regions that appear significantly different from the background--a difference that cannot simply be attributed to random chance.
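The idea of comparing a subregion against the background can be illustrated with a minimal sketch. It is not the researchers' method: it assumes binned event counts, a known expected background per bin, and a simple z-score as the summary characteristic. The function names, the window width and the toy data are all hypothetical.

```python
import math


def window_z_scores(counts, background, width):
    """Return one z-score per sliding window of `width` adjacent bins.

    Assumption: counts are roughly Poisson, so the variance of a window
    is approximated by its expected background (which must be positive).
    """
    scores = []
    for start in range(len(counts) - width + 1):
        observed = sum(counts[start:start + width])
        expected = sum(background[start:start + width])
        scores.append((observed - expected) / math.sqrt(expected))
    return scores


def most_unusual_window(counts, background, width):
    """Return (start_index, z_score) of the window with the largest excess."""
    scores = window_z_scores(counts, background, width)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Toy data: a flat background of 10 events per bin with a bump in bins 3-4.
counts = [10, 10, 10, 30, 30, 10, 10]
background = [10.0] * len(counts)

start, z = most_unusual_window(counts, background, width=2)
print(f"largest excess starts at bin {start}, z = {z:.2f}")
```

A large z-score only flags a candidate region. Whether it counts as a discovery depends on the cut-off, which must account for the number of windows examined.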
"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."
At the core of the breakthrough is the idea of posing the problem in terms of a "hypothesis-based testing" paradigm to detect statistical disorder in the data. The method further exploits the flexibility behind a long-established geometric formula in creating a technique that significantly enhances the ability to distinguish a signal.
The researchers said the challenge is two-fold: defining efficient test statistics, and determining the critical cut-off that helps the scientist separate random variation from the signal. The detection problem involves a large number of comparisons, and the researchers caution that experimentalists should not be fooled into false discoveries by random variation.
"The experimenter wants to control the experiment-wise error rate: if there is nothing in the data, then there must be minimal probability of falsely discovering a signal. On the other hand, we want to maximize our chance of discovering any real signal that may be present in the massive data set," said Loader.
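A short calculation shows why a large number of comparisons makes the experiment-wise error rate hard to control. The sketch below assumes the individual tests are independent and uses the textbook Bonferroni correction as a stand-in for the cut-off; the researchers' own cut-off comes from their geometric approach, which is not reproduced here. The function names are hypothetical.

```python
def experiment_wise_error(per_test_alpha, n_tests):
    """Probability of at least one false discovery among n_tests
    independent tests, each run at level per_test_alpha, when the
    data contain no signal at all."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


def bonferroni_alpha(target_alpha, n_tests):
    """Per-test level that keeps the experiment-wise error rate at or
    below target_alpha (Bonferroni correction, a conservative bound)."""
    return target_alpha / n_tests


n_tests = 100
naive = experiment_wise_error(0.05, n_tests)
corrected = experiment_wise_error(bonferroni_alpha(0.05, n_tests), n_tests)

print(f"uncorrected: {naive:.3f}")
print(f"Bonferroni:  {corrected:.3f}")
```

With 100 comparisons each tested at the 5 percent level, a false discovery somewhere is almost certain, while the corrected cut-off keeps the overall rate below 5 percent. The price of a conservative correction is reduced power to detect a real signal, which is the trade-off Loader describes.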
"The probabilistic problem associated with this scenario is reduced to one of finding the areas of certain regions on the surface of high-dimensional spheres," explains Pilla.
The Case researchers then exploit the geometric methods pioneered in 1939 by Harold Hotelling and Hermann Weyl. They tested the statistical techniques on computer-simulated particle physics experiments that mimic the real experiments conducted in colliders, and the simulations showed that the new technique significantly increased detection probabilities.
"In high-energy particle physics and astrophysics problems, chi-square goodness-of-fit tests are widely employed, although they have relatively low power to detect the signal," notes Taylor. "Through my collaborative work with Professors Pilla and Loader, we will be able to develop powerful statistical tests for detecting a signal from noisy data with high probability, a fundamental problem encountered in many scientific disciplines."
Taylor added that "conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."
"Detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to scientific success," conclude the researchers.
Source: Case Western Reserve University