25 years of conventional evaluation of data analysis proves worthless in practice

September 3rd, 2008

So-called 'intelligent' computer-based methods for classifying patient samples, for example, have been evaluated with the help of two methods that have completely dominated research for 25 years. Now Swedish researchers at Uppsala University are revealing that this methodology is worthless when it comes to practical problems. The article is published in the journal Pattern Recognition Letters.

Today there is rapidly growing interest in 'intelligent' computer-based methods that use various classes of measurement signals, from different patient samples, for instance, to create a model for classifying new observations. This type of method is the basis for many technical applications, such as recognition of human speech, images, and fingerprints, and is now also beginning to attract new fields such as health care.

"Especially in applications in which faulty classification decisions can lead to catastrophic consequences, such as choosing the wrong form of therapy for treating cancer, it is extremely important to be able to make a reliable estimate of the performance of the classification model," explains Mats Gustafsson, Professor of signal processing and medical bioinformatics at Uppsala University, who co-directed the new study together with Associate Professor Anders Isaksson.

To evaluate the performance of a classification model, one normally tests it on a number of trial examples that have never been involved in the design of the model. Unfortunately there are seldom tens of thousands of test examples available for this type of evaluation. In biomedicine, for instance, it is often expensive and difficult to collect the patient samples needed, especially if one wishes to analyze a rare disease. To solve this problem, many different methods have been proposed. Since the 1980s two methods have completely dominated research, namely, cross validation and resampling/bootstrapping.

"This has entailed that the performance assessment of virtually all new methods and applications reported in the scientific literature in the last 25 years has been carried out using one of these two methods," says Mats Gustafsson.

In the new study, the Uppsala researchers use both theory and convincing computer simulations to show that this methodology is worthless in practice when the total number of examples is small in relation to the natural variation that exists among different observations. What is considered a small number depends in turn on what problem is being studied-­in other words, it is impossible to determine whether the number of examples is sufficient.

"Our main conclusion is that this methodology cannot be depended on at all, and that it therefore needs to be immediately replaces by Bayesian methods, for example, which can deliver reliable measures of the uncertainty that exists. Only then will multivariate analyses be in any position to be adopted in such critical applications as health care," says Mats Gustafsson.

Source: Uppsala University


print this article email this article download pdf blog this article bookmark this article     Digg this Stumble it share on Facebook share on Reddit add to delicious save to Yahoo! bookmarks
4.3/5 after 6 votes


September 3rd, 2008 all stories
Technology / Computer Sciences

Comments: 0
Rank: 4.3/5 after 6 votes

  • Stumble this up

  • Digg this

  • Share it:
  • share on Facebook
  • share on MySpace
  • share on Slashdot
  • rss-newsfeed
  • share on Google
  • share on Reddit
  • add to delicious
  • save to Yahoo! bookmarks
  • share on Windows Live
  • Add to Mixx!
Rating: 4.3/5 after 6 votes

  • Related Stories

  • Virus filters for medical diagnosis
    created Jun 25, 2009 | popularity not rated yet | comments 0
  • Intelligent DJ Emerges from Fundamental Research
    created Jun 22, 2009 | popularity not rated yet | comments 0
  • Non-wovens as scaffolds for artificial tissue
    created May 04, 2009 | popularity not rated yet | comments 0
  • Crop models help increase yield per unit of water used
    created May 04, 2009 | popularity not rated yet | comments 0
  • High-tech speed bump detects damage to army vehicles
    created Apr 13, 2009 | popularity not rated yet | comments 0


  • Physicists Demonstrate Quantum Memory with Matter Qubits
    Physicists Demonstrate Quantum Memory with Matter Qubits
    Physics / General Physics
    created Jul 03, 2009 | popularity 4.4 / 5 (17) | comments 1
  • 'Holey' Nanosheets for Wastewater Dye Removal
    Nanotechnology / Nanomaterials
    created Jul 01, 2009 | popularity 5 / 5 (5) | comments 1
  • Jellyfish Robot Swims Like its Biological Counterpart
    Jellyfish Robot Swims Like its Biological Counterpart
    Electronics / Robotics
    created Jun 26, 2009 | popularity 4.4 / 5 (8) | comments 1
  • Could Maxwell's Demon Exist in Nanoscale Systems?
    Could Maxwell's Demon Exist in Nanoscale Systems?
    Physics / General Physics
    created Jun 24, 2009 | popularity 4.4 / 5 (18) | comments 29
  • Living Safely with Robots, Beyond Asimov's Laws
    Living Safely with Robots, Beyond Asimov's Laws
    Electronics / Robotics
    created Jun 22, 2009 | popularity 4.6 / 5 (54) | comments 40
  • Other News

    Pages of the Codex Sinaiticus are pictured on a laptop in Westminster Cathedral, central London

    World's oldest surviving Bible published online

    Technology / Internet

    created 55 minutes ago | popularity 5 / 5 (1) | comments 0

    About 800 pages of the world's oldest surviving Bible have been pieced together and published on the Internet for the first time, experts in Britain said Monday.


    Translate this: 'cognition-strength interfaces'

    Translate this: 'cognition-strength interfaces'

    Technology / Engineering

    created 4 hours ago | popularity 5 / 5 (1) | comments 0

    (PhysOrg.com) -- A highly ambitious European project used basic cognitive function, eye-tracking and keystroke logging as the starting point for the study of human-computer interaction for translation. It ...


    EMC raises offer for Data Domain

    Technology / Business

    created 2 hours ago | popularity not rated yet | comments 0

    Computer storage giant EMC raised its offer to purchase data storage firm Data Domain on Monday in a bid to top a rival offer for the company by data management firm NetApp.


    HTC Touch

    Taiwan's HTC earnings edge down in Q2

    Technology / Business

    created 7 hours ago | popularity not rated yet | comments 0

    HTC Corp, Taiwan's leading smartphone maker, said Monday its net profit in the second quarter was down almost two percent from a year earlier.


    Samsung announces earnings estimate (AP)

    Samsung announces earnings estimate

    Technology / Business

    created 7 hours ago | popularity not rated yet | comments 0

    (AP) -- Samsung Electronics Co., the world's biggest manufacturer of memory chips, announced quarterly earnings estimates for the first time Monday, saying it hopes to reduce market confusion and speculation ...