Scientists devise means to test for phony technical papers
April 24, 2006
Authors of bogus technical articles beware. A team of researchers at the Indiana University School of Informatics has designed a tool that distinguishes between real and fake papers. Called the Inauthentic Paper Detector (IPD), it is one of the first tools of its kind anywhere, and it uses compression to determine whether technical texts were generated by man or machine.
"This is a potential problem since no existing systems, the Web for example, can or do discriminate between content that is meaningful or bogus," says assistant professor Mehmet Dalkilic, a data mining expert. "We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."
Joining Dalkilic on the IPD project are Assistant Professor Predrag Radivojac, informatics doctoral student James Costello, and Wyatt T. Clark, who will graduate in May with a bachelor's degree in informatics.
The IPD system is built on a combination of compression algorithms, the kind normally used to shrink data so that it takes up less storage space and transmits faster.
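The article does not name the compressors IPD uses. As a minimal sketch of why compression can expose repetition, assuming Python's standard `zlib` module (DEFLATE) as a stand-in and using made-up sample strings rather than data from the study, the ratio of compressed size to original size drops when word strings recur:

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size; lower means more redundancy."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    # Level 9 asks DEFLATE (LZ77 back-references plus Huffman coding) for its
    # strongest compression.
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical sample: one phrase that recurs, standing in for the word-string
# repetition the researchers attribute to human-written text.
repetitive = " ".join(["the detector compresses the text"] * 40)

# The same words in random order: word frequencies are unchanged, but the
# repeated multi-word strings are gone.
words = repetitive.split()
random.Random(0).shuffle(words)
shuffled = " ".join(words)

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"shuffled:   {compression_ratio(shuffled):.3f}")
```

Both strings contain exactly the same words, so any difference in ratio comes from word order alone. DEFLATE can only point back within a 32 KB window, so it captures short- and medium-range repetition; longer-range repetition would need a compressor with a larger window, such as `lzma`.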
To begin their study, the team defined the two kinds of text they would analyze. "Authentic text" (or document) is a collection of several hundreds or thousands of syntactically correct sentences that are wholly meaningful. "Inauthentic text" (or document) is a collection of several hundreds or thousands of syntactically correct sentences that, taken all together, have no meaning.
The researchers' work is documented in the very authentic paper, "Using Compression to Identify Classes of Inauthentic Texts," which they presented at the Society for Industrial and Applied Mathematics Conference on Data Mining in Bethesda, Md., this weekend.
The informatics study was largely inspired by a prank pulled by three Massachusetts Institute of Technology students, who in 2004 developed a computer program that churned out randomly generated, fake computer-science prose: essentially a four-page compilation of gibberish. They submitted it as a research paper to an international conference on computer science and informatics, and it was accepted without review.
Radivojac, whose research expertise is machine learning, says the IPD easily detected numerous inauthentic technical papers in testing, including the MIT students' spurious submission.
"We hypothesized we could build a reliable and fast model that recognizes fake papers automatically," says Radivojac. "We combined these [compression algorithms] with machine-learning methods to build a predictor of these kinds of papers."
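The article does not specify which features or learning methods the team used. The following is a minimal sketch, assuming two compression ratios (from Python's `zlib` and `bz2` modules) as features and a nearest-centroid classifier as a swapped-in, deliberately simple learner. The phrase list, the `human_like`/`machine_like` generators and the class labels are hypothetical toys, not the study's data:

```python
import bz2
import math
import random
import zlib

# Hypothetical toy corpus: short phrases that recur, standing in for the
# repetition the researchers attribute to human writing.
PHRASES = [
    "the detector compresses the text",
    "repeated words shrink the output",
    "authors reuse their own terms",
    "a model learns from the ratios",
]


def features(text: str) -> tuple[float, float]:
    """Compression ratios under two different compressors (DEFLATE and bzip2)."""
    raw = text.encode("utf-8")
    return (len(zlib.compress(raw, 9)) / len(raw),
            len(bz2.compress(raw, 9)) / len(raw))


def human_like(rng: random.Random, n_phrases: int = 60) -> str:
    # Whole phrases recur, so long word strings repeat throughout the text.
    return " ".join(rng.choice(PHRASES) for _ in range(n_phrases))


def machine_like(rng: random.Random, n_phrases: int = 60) -> str:
    # Same words, shuffled: word frequencies survive, phrase repetition does not.
    words = human_like(rng, n_phrases).split()
    rng.shuffle(words)
    return " ".join(words)


def centroid(vectors: list[tuple[float, float]]) -> tuple[float, ...]:
    # Component-wise mean of the feature vectors.
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def train(rng: random.Random, n_per_class: int = 10) -> dict[str, tuple[float, ...]]:
    """Nearest-centroid 'training': store the mean feature vector of each class."""
    return {
        "authentic": centroid([features(human_like(rng)) for _ in range(n_per_class)]),
        "inauthentic": centroid([features(machine_like(rng)) for _ in range(n_per_class)]),
    }


def predict(model: dict[str, tuple[float, ...]], text: str) -> str:
    # Assign the label whose centroid is closest in Euclidean distance.
    x = features(text)
    return min(model, key=lambda label: math.dist(x, model[label]))


model = train(random.Random(1))
held_out = random.Random(2)
print(predict(model, human_like(held_out)))
print(predict(model, machine_like(held_out)))
```

Nearest centroid is used here only to keep the learning step short; any standard classifier could take the same feature vectors.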
In general, identifying meaning in a technical document is difficult, Dalkilic says. "We don't claim we have found a way to distinguish between meaning and nonsense, but we do emphasize that there are many nontrivial classes of inauthentic documents that can be easily distinguished based on compression algorithms."
Source: Indiana University School of Informatics