Scientists devise means to test for phony technical papers

April 24th, 2006

Authors of bogus technical articles beware. A team of researchers at the Indiana University School of Informatics has designed a tool that distinguishes between real and fake papers. It's called the Inauthentic Paper Detector (IPD) -- one of the first of its kind anywhere -- and it uses compression to determine whether technical texts are generated by man or machine.

"This is a potential problem since no existing systems, the Web for example, can or do discriminate between content that is meaningful or bogus," says assistant professor Mehmet Dalkilic, a data mining expert. "We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."

Joining Dalkilic on the IPD project are Assistant Professor Predrag Radivojac, informatics doctoral student James Costello, and Wyatt T. Clark, who will graduate in May with a bachelor's degree in informatics.

The IPD system is based on a combination of compression algorithms, which reduce the amount of data in order to save space and speed up transmission.
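As a rough illustration of the general idea, and not of the IPD itself (the article does not describe its algorithms), the minimal Python sketch below uses the standard zlib module to measure how much a text shrinks when compressed. The function name compression_ratio and both sample strings are assumptions invented for this example.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size, both in bytes."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    # Level 9 asks zlib for its strongest (slowest) compression.
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical sample 1: one sentence repeated many times (heavy repetition).
repetitive = "the model compresses the data. " * 50

# Hypothetical sample 2: seeded random letters and spaces of the same length
# (almost no repeated strings for the compressor to exploit).
rng = random.Random(0)
varied = "".join(
    rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in range(len(repetitive))
)

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"varied:     {compression_ratio(varied):.3f}")
```

Text containing many repeated word strings compresses to a small fraction of its original size, while text with little repetition barely shrinks. A detector along these lines could plausibly treat such ratios as one input among several, though that is a guess about the design rather than a description of it.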

To begin their study, the team identified two kinds of texts they would analyze. "Authentic text" (or document) is a collection of several hundred or several thousand syntactically correct sentences that are wholly meaningful. "Inauthentic text" (or document) is a collection of several hundred or several thousand syntactically correct sentences that, taken all together, have no meaning.

The researchers' work is documented in the very authentic paper, "Using Compression to Identify Classes of Inauthentic Texts," which they presented at the Society for Industrial and Applied Mathematics Conference on Data Mining in Bethesda, Md., this weekend.

The informatics study was largely inspired by a prank pulled by three Massachusetts Institute of Technology students, who in 2004 developed a computer program that churned out randomly generated fake computer science language, essentially a four-page compilation of gibberish. They submitted the result as a research paper to an international conference on computer science and informatics – and it was accepted without review.

Radivojac, whose research expertise is machine learning, says the IPD easily detected numerous inauthentic technical papers in testing, including the MIT students' spurious submission.

"We hypothesized we could build a reliable and fast model that recognizes fake papers automatically," says Radivojac. "We combined these with machine-learning methods to build a predictor of these kinds of papers."

In general, identifying meaning in a technical document is difficult, Dalkilic says. "We don't claim we have found a way to distinguish between meaning and nonsense, but we do emphasize that there are many nontrivial classes of inauthentic documents that can be easily distinguished based on compression algorithms."

Source: Indiana University School of Informatics

