Researchers Will Fix Grammar Lost in Machine Translation
September 9, 2005
The makers of a University of Southern California computer translation system consistently rated among the world's best are teaching their software something new: English grammar.
A Tree Grows in Translation: grammatical structure, long in second place, is emerging as a key to better English in the finished product.
Most modern "machine translation" systems, including the highly rated one created by USC's Information Sciences Institute, rely on brute-force correlation of vast bodies of pre-translated text from sources such as newspapers that publish in multiple languages.
The software matches up phrases that consistently appear in parallel, such as the English "my brother's pants" and the Spanish "los pantalones de mi hermano," and then uses these matches to piece together translations of new material.
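The piecing-together step can be illustrated with a minimal sketch. It assumes a tiny hand-written Spanish-to-English phrase table (`PHRASE_TABLE` and `translate_phrases` are hypothetical names, not part of ISI's system). Real systems learn millions of phrase pairs, each with a probability, from parallel text and search over many segmentations. This greedy longest-match version only shows how matched phrases are stitched together, left to right, with no reordering.

```python
# Hypothetical Spanish-to-English phrase table; real systems learn such
# pairs, with probabilities, from large bodies of parallel text.
PHRASE_TABLE = {
    ("los", "pantalones", "de", "mi", "hermano"): "my brother's pants",
    ("los", "pantalones"): "the pants",
    ("de",): "of",
    ("mi", "hermano"): "my brother",
    ("son",): "are",
    ("azules",): "blue",
}

MAX_PHRASE_LEN = max(len(src) for src in PHRASE_TABLE)


def translate_phrases(sentence: str) -> str:
    """Greedy longest-match segmentation, translated left to right (no reordering)."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest source phrase starting at position i first.
        for length in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            src = tuple(words[i:i + length])
            if src in PHRASE_TABLE:
                output.append(PHRASE_TABLE[src])
                i += length
                break
        else:
            # No phrase matched: pass the unknown word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate_phrases("Los pantalones de mi hermano son azules"))
```

Preferring the longest match lets a learned multi-word phrase ("los pantalones de mi hermano") override a more literal word-by-word rendering. Words missing from the table pass through untouched, which hints at why such output can stop reading like English.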
It works, but only to a point. ISI machine translation expert Daniel Marcu says that when such a system is "trained on enough relevant bilingual text ... it can break a foreign language up into phrasal units, translate each of them fairly well into English, and do some re-ordering. However, even in this good scenario, the output is still clearly not English. It takes too long to read, and it is unsatisfactory for commercial use."
So Marcu and colleague Kevin Knight, both ISI project leaders who also hold appointments in the Department of Computer Science at the USC Viterbi School of Engineering, have begun an intensive $285,000 effort, the Advanced Language Modeling for Machine Translation project, to improve the system they created at ISI. Their plan is to add a follow-on step that subjects the text coming out of their translation engine to grammatical processing.
The step sounds simple but is in fact formidably difficult. "For example, there is no robust algorithm that returns 'grammatical' or 'ungrammatical' or 'sensible' or 'nonsense' in response to a user-typed sequence of words," Marcu notes.
The problem grows out of a feature of natural language noted decades ago by M.I.T. language theorist Noam Chomsky: language users have a virtually limitless ability to nest phrases and ideas within one another, building intricate referential structures such as "I was looking for the stirrups from the saddle that my ex-wife's oldest daughter took with her when she went to Jack's new place in Colorado three years ago, but all she had were Louise's second-hand saddle shoes, the ones Ethel's dog chewed during the fire."
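The nesting Chomsky described comes from recursion: a grammar rule can contain, directly or indirectly, another instance of itself. The sketch below assumes a hypothetical two-rule toy grammar (the word lists and the function `noun_phrase` are illustrative inventions). A noun phrase may carry a relative clause that itself contains a noun phrase, so there is no grammatical limit on how deep the embedding can go.

```python
# Hypothetical toy grammar, for illustration only:
#   NP  -> "the" NOUN | "the" NOUN REL
#   REL -> "that" NP VERB
NOUNS = ["saddle", "daughter", "dog", "neighbor"]
VERBS = ["took", "chewed", "saw", "liked"]


def noun_phrase(depth: int, i: int = 0) -> str:
    """Build a noun phrase with `depth` levels of center-embedded relative clauses."""
    noun = NOUNS[i % len(NOUNS)]
    if depth == 0:
        return f"the {noun}"
    # Recursive case: the relative clause contains another noun phrase.
    verb = VERBS[i % len(VERBS)]
    return f"the {noun} that {noun_phrase(depth - 1, i + 1)} {verb}"


for d in range(3):
    print(noun_phrase(d))
```

Each extra level of depth is still licensed by the same two rules, yet by depth 2 ("the saddle that the daughter that the dog chewed took") the phrase is already hard for a human to untangle, let alone for software that only matches phrases.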
Unraveling these verbal cobwebs (or, in the more common description, tracing the branching "trees" of connections among their parts) is so daunting that programmers long ago took the brute-force route instead: matching phrases and hoping that readers could work out how the phrases relate to one another.
With the limits of this approach now apparent, researchers have begun applying computing power to assembling grammatical rules. According to Knight, one crucial step has been the creation of the "Penn Treebank," a large database of English text whose syntax has been annotated by hand.
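The Penn Treebank records each sentence's syntax as a labeled bracketing, for example `(S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN saddle))))`. Below is a minimal sketch of how such a bracketing can be read back into a tree. The helper names `parse_tree` and `leaves` are hypothetical. The sketch assumes a single tree without the extra unlabeled outer bracket that appears in the distributed Treebank files. Toolkits such as NLTK provide ready-made readers for this format.

```python
import re
from typing import List, Tuple, Union

# A tree is (label, children); a leaf child is a plain word string.
Tree = Tuple[str, List[Union["Tree", str]]]

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text: str) -> Tree:
    """Read one Penn Treebank-style bracketed tree into nested tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        label = tokens[pos]
        pos += 1
        children: List[Union[Tree, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())  # nested constituent
            else:
                children.append(tokens[pos])  # word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(tree: Tree) -> List[str]:
    """Return the words of the sentence, left to right."""
    _, children = tree
    words: List[str] = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words


t = parse_tree("(S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN saddle))))")
print(t[0], leaves(t))
```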
Using this and other sources, computer scientists have begun developing ways to model the observed rules. A preliminary study by Knight and two colleagues in 2003 showed that this approach might be able to improve translations.
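One standard textbook way to "model the observed rules" is a probabilistic context-free grammar, in which each rule's probability is its relative frequency in hand-annotated trees: P(A → β) = count(A → β) / count(A). The article does not say this is the exact model Knight's group used. The sketch below is a generic illustration over two hypothetical toy trees written directly as nested tuples; `productions` and `estimate_pcfg` are illustrative names.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# A tree is (label, children); a leaf child is a plain word string.
Tree = Tuple[str, List[Union["Tree", str]]]
Rule = Tuple[str, Tuple[str, ...]]


def productions(tree: Tree) -> List[Rule]:
    """List every rule LHS -> RHS used in the tree, including tag -> word rules."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            rules.extend(productions(child))
    return rules


def estimate_pcfg(trees: List[Tree]) -> Dict[Rule, float]:
    """Relative-frequency estimate: count(A -> beta) / count(A)."""
    rule_counts = Counter(r for t in trees for r in productions(t))
    lhs_counts: Counter = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Two hypothetical hand-annotated trees standing in for a treebank.
t1 = ("S", [("NP", [("PRP", ["I"])]),
            ("VP", [("VBD", ["saw"]), ("NP", [("DT", ["the"]), ("NN", ["saddle"])])])])
t2 = ("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]),
            ("VP", [("VBD", ["chewed"]), ("NP", [("DT", ["the"]), ("NN", ["shoes"])])])])

probs = estimate_pcfg([t1, t2])
print(probs[("NP", ("DT", "NN"))])  # 3 of the 4 NPs are DT NN
```

Here three of the four noun phrases expand to a determiner plus a noun, so that rule gets probability 3/4. A model built this way can score how plausible the structure of a candidate English sentence is, not just its individual words.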
Accordingly, the researchers write in their proposal for the study: "We propose to implement a trainable tree-based language model and parser, and to carry out empirical machine-translation experiments with them. USC/ISI's state-of-the-art machine translation system already has the ability to produce, for any input sentence, a list of 25,000 candidate English outputs. This list can be manipulated in a post-processing step. We will re-rank these lists of candidate string translations with our tree-based language model, and we plan for better translations to rise to the top of the list."
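The re-ranking step in the quoted plan can be sketched as follows. The sketch assumes each candidate carries a log-score from the baseline translation system and a second log-score from a tree-based language model, and that the two are combined as a weighted sum. The `Candidate` class, the scores, and `syntax_weight` are hypothetical; in a real system the syntax score would come from a parser, and the weight would typically be tuned on held-out data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    mt_score: float      # hypothetical log-score from the baseline translation system
    syntax_score: float  # hypothetical log-score from a tree-based language model


def rerank(candidates: List[Candidate], syntax_weight: float = 1.0) -> List[Candidate]:
    """Sort candidates by a weighted sum of the two log-scores, best first."""
    return sorted(
        candidates,
        key=lambda c: c.mt_score + syntax_weight * c.syntax_score,
        reverse=True,
    )


# A three-item stand-in for the 25,000-candidate list.
nbest = [
    Candidate("pants of my brother are blue the", mt_score=-4.0, syntax_score=-9.0),
    Candidate("my brother's pants are blue", mt_score=-4.5, syntax_score=-3.0),
    Candidate("my brother pants is blue", mt_score=-4.2, syntax_score=-6.5),
]

print(rerank(nbest, syntax_weight=0.0)[0].text)  # baseline ranking only
print(rerank(nbest, syntax_weight=1.0)[0].text)  # with the syntax model
```

With the syntax weight at zero, the baseline's favorite (a garbled word order) stays on top. Once the tree-based score counts, the grammatical candidate rises to the top of the list, which is the effect the proposal aims for.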
One crucial task for the system is to recover separate trees from seemingly endless strings of words. Knight believes this is achievable, and in the short term rather than the long term.
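Recovering a tree from a flat string of words is the job of a parser. A classic textbook method for probabilistic grammars is the CKY algorithm. It fills a chart with the best-scoring analysis of every span of the sentence, building longer spans from shorter ones. The sketch below uses a hypothetical five-word grammar in Chomsky normal form (every rule has two nonterminals or one word on its right-hand side). It illustrates the technique only and is not the parser ISI built.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy PCFG in Chomsky normal form.
BINARY: Dict[Tuple[str, str], List[Tuple[str, float]]] = {
    ("NP", "VP"): [("S", 1.0)],
    ("DT", "NN"): [("NP", 1.0)],
    ("VBD", "NP"): [("VP", 1.0)],
}
LEXICAL: Dict[str, List[Tuple[str, float]]] = {
    "the": [("DT", 1.0)],
    "dog": [("NN", 0.5)],
    "shoes": [("NN", 0.5)],
    "chewed": [("VBD", 1.0)],
}


def cky_parse(words: List[str]) -> Optional[str]:
    """Return the highest-probability bracketed S tree, or None if none exists."""
    n = len(words)
    # best[i][j] maps a nonterminal to (log-prob, bracketed tree) for words[i:j].
    best: List[List[Dict[str, Tuple[float, str]]]] = [
        [{} for _ in range(n + 1)] for _ in range(n + 1)
    ]
    for i, w in enumerate(words):
        for tag, p in LEXICAL.get(w, []):
            best[i][i + 1][tag] = (math.log(p), f"({tag} {w})")
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left, (lp, ltree) in best[i][k].items():
                    for right, (rp, rtree) in best[k][j].items():
                        for parent, p in BINARY.get((left, right), []):
                            score = lp + rp + math.log(p)
                            # Keep only the best analysis per nonterminal (Viterbi).
                            if parent not in best[i][j] or score > best[i][j][parent][0]:
                                best[i][j][parent] = (score, f"({parent} {ltree} {rtree})")
    top = best[0][n].get("S")
    return top[1] if top else None


print(cky_parse("the dog chewed the shoes".split()))
print(cky_parse("dog the chewed".split()))
```

A sentence the grammar can analyze comes back with its full tree, and a scrambled one yields no S at all. This is a small-scale version of the yes-or-no judgment Marcu says is hard to get robustly for real English.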
Referring to the annual evaluation of translation systems by the National Institute of Standards and Technology, in which ISI consistently earns top scores, Knight said, "We want to have the grammar module installed and working by the next evaluation, in August 2006."
Knight and Marcu are cofounders of a spinoff company, Language Weaver, where Knight is chief scientist and Marcu is chief technology and operating officer.
Source: University of Southern California