Researchers mine millions of metaphors through computer-based techniques

March 3, 2009 By Lisa M. Krieger

Metaphors cannot be taught, asserted the great philosopher Aristotle. "It is the one thing that cannot be learnt from others." But a computer scientist and a literary historian say he was wrong.

In a project started at Stanford University, the researchers are teaching computers how to analyze texts from Plato to Pynchon, mining millions of these abstract phrases. (Metaphorically speaking.)

They're building a vast searchable database, making it possible to browse historic patterns of word usage - for instance, "rose" and "love" - from ancient Homeric epics to postmodern cyberpunk novels, and everything in between.

"As a tool, it provides a really powerful way of thinking about a lot of literature at once," said English literature professor Brad Pasanek, who collaborated on the project with longtime friend and computer scientist D. Sculley.

The work makes tangible what the German linguist Harald Weinrich called the "metaphoric field." "Pasanek's database is the first 'metaphoric field' that we can actually see and use," said Franco Moretti, a Stanford comparative literature professor. "It provides empirical proof for a daring, but never wholly solid concept."

This approach to studying literature was inconceivable back around 330 BCE, when Aristotle wrote that "the greatest thing by far is to be a master of the metaphor." A metaphor is language that compares seemingly unrelated subjects: a "winged thought," for instance.

But two new trends have created a field of computer-based literary analysis, part of the emerging discipline called "digital humanities," an intersection of computing and the study of languages, history, philosophy and religion.

Digitized libraries have put an ocean of books - including obscure ones - at readers' fingertips. Using new data mining techniques and "machine learning," researchers can search the millions of words contained in those books to study subtle shifts in how words were used. Analyzing such patterns offers insights into how language - and culture - evolved.

The idea was conceived when Pasanek was idly flipping through his worn copy of "Pride and Prejudice," its key phrases highlighted in bright colors.

Through the tangled tale of Elizabeth, Darcy and Wickham, "marking words that occurred again and again, I realized that you could flip through a novel and see these motifs appear in an explosion of color, then disappear."

The computer replaces the colored marker, he said: "It's possible to trace when and where something appears, what it means, and how it changes."
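
Here is a rough illustration of the "colored marker" idea. It assumes a novel stored as plain text and traces a single motif word. The hypothetical `motif_profile` function divides the text's words into equal slices and counts the motif in each one, so runs of zeros show where a motif disappears. This is a sketch for illustration, not the researchers' code.

```python
import re

def motif_profile(text, word, segments=10):
    """Count occurrences of `word` in each of `segments` equal slices of the text's tokens."""
    # Lowercase word tokens; apostrophes are kept so "elizabeth's" stays one token.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    target = word.lower()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Map token position i (0 .. len-1) to a slice index 0 .. segments-1.
            counts[i * segments // len(tokens)] += 1
    return counts

# Toy text: the motif appears at the start and the end, then vanishes in between.
toy = "pride " * 5 + "filler " * 10 + "pride " * 5
print(motif_profile(toy, "pride", segments=4))  # prints [5, 0, 0, 5]
```

Equal slices stand in for chapters here. Splitting on chapter headings would be a natural refinement.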

Pasanek's near-obsessive collection of interesting metaphors began while he was at Stanford working on his Ph.D. First he kept a list on the back pages of the works of Shakespeare, Milton and the King James Bible. As his list grew, he moved to index cards.

"Metaphors are a fundamental figure of speech," he said. "They show how we think, and how what we think changes over time."

Recognizing he needed help, Pasanek turned to Stanford computer scientist Matt Jockers, who helped him create a digital database, initially posted in 2005. The list quickly grew to 1,000, then 3,000 entries. But its growth posed a particular search challenge.

"The nature of metaphor is such that it does not lend itself to easy detection by the usual sorts of pattern matching algorithms," Jockers said. Finding a simile is a fairly straightforward task: one writes a program that looks for the words "like" and "as."

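The simile search Jockers describes can be sketched in a few lines of Python. The hypothetical `find_simile_candidates` helper below splits text into sentences in a naive way and keeps those containing the whole word "like" or "as." It is an illustration under those assumptions, not the project's code. It also over-matches ("I like tea" is not a simile), so its output is a list of candidates for a human to review.

```python
import re

# Whole-word "like" or "as", in any letter case.
SIMILE_MARKERS = re.compile(r"\b(?:like|as)\b", re.IGNORECASE)

def find_simile_candidates(text):
    """Return the sentences in `text` that contain a simile marker."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if SIMILE_MARKERS.search(s)]

sample = ("My love is like a red, red rose. The dog slept. "
          "She ran as fast as the wind. I like tea.")
print(find_simile_candidates(sample))
```

The last match, "I like tea.", is a false positive. Even simile detection needs review, and metaphor has no marker word at all.
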
"Structurally speaking, the phrase 'my love is a red rose' is very much the same as 'my dog is a blue heeler,'" Jockers said. "The former is metaphor, the latter is not."

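Jockers's point can be made concrete with a hypothetical surface pattern for sentences of the form "X is a Y." The sketch below is an illustration, not the project's method. It extracts the same subject-and-predicate structure from both sentences, so nothing in the syntax alone separates the metaphor from the literal statement.

```python
import re

# Hypothetical surface pattern: "<subject> is a/an <predicate>", optional final punctuation.
IS_A = re.compile(r"^(?P<subject>.+?)\s+is\s+an?\s+(?P<predicate>.+?)[.!?]?$",
                  re.IGNORECASE)

def parse_is_a(sentence):
    """Return (subject, predicate) if the sentence fits the pattern, else None."""
    m = IS_A.match(sentence.strip())
    return (m.group("subject"), m.group("predicate")) if m else None

for s in ["My love is a red rose.", "My dog is a blue heeler."]:
    print(parse_is_a(s))
# Both parse the same way: ('My love', 'red rose') and ('My dog', 'blue heeler').
```
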
Pasanek provided the computer with examples of metaphors and "trained" the machine to recognize them. They programmed "proximity searches" between words likely to be metaphoric. For example, a search for "mind" within 100 characters of "mint" finds the following couplet in William Cowper's poetry: "The mind and conduct mutually imprint /And stamp their image in each other's mint."
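
A proximity search of this kind might look like the following sketch. The function name `proximity_search` is hypothetical. Measuring the 100-character window between the starting positions of the two words is also an assumption, since the article does not say how the researchers' search measured distance.

```python
import re

def proximity_search(text, word_a, word_b, window=100):
    """Return snippets where word_a and word_b start within `window` characters of each other."""
    def matches(word):
        # Whole-word, case-insensitive matches, so "mint" does not match inside "minted".
        return list(re.finditer(rf"\b{re.escape(word)}\b", text, re.IGNORECASE))

    hits_a, hits_b = matches(word_a), matches(word_b)
    snippets = []
    for a in hits_a:
        for b in hits_b:
            if abs(a.start() - b.start()) <= window:
                # Span from the earlier word's start to the later word's end.
                start = min(a.start(), b.start())
                end = max(a.end(), b.end())
                snippets.append(text[start:end])
    return snippets

couplet = ("The mind and conduct mutually imprint / "
           "And stamp their image in each other's mint.")
print(proximity_search(couplet, "mind", "mint"))
```

In the couplet, "mind" and "mint" start 74 characters apart. The pair is found with the default window but not with `window=50`.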

A similar technique, said Sculley, is used in spam-recognition software.
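
Sculley's comparison to spam filtering suggests how the "training" step can work: a classifier learns from labeled examples and then scores new text. The article does not say which algorithm the project used. The sketch below therefore swaps in multinomial naive Bayes, a classic spam-filtering technique. It uses a hypothetical `NaiveBayes` class and a toy training set invented for illustration; the phrases echo examples from this article, not the actual database.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, used for the priors
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        # Vocabulary = every word seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            counts = self.word_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            # Work in log space to avoid underflow when multiplying many probabilities.
            score = math.log(self.doc_counts[c] / total_docs)
            for tok in tokenize(doc):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

# Toy labeled examples (invented for illustration).
training = [
    ("the mind is a blank slate", "metaphor"),
    ("the mind is a white paper void of characters", "metaphor"),
    ("the soul is a mirror of the mind", "metaphor"),
    ("the slate roof needs repair", "literal"),
    ("buy paper at the store", "literal"),
    ("the roof leaks in winter", "literal"),
]
model = NaiveBayes().fit([doc for doc, _ in training],
                         [label for _, label in training])
print(model.predict("the mind is a mirror"))  # metaphor
print(model.predict("the roof needs paper"))  # literal
```

With six training phrases, the model leans heavily on shared words such as "mind." A useful system would need far more examples.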

In one project, they tracked the evolving references to the young mind. In the fourth century BCE, it was likened to a blank writing tablet, an image later captured in the Latin phrase "tabula rasa," or "blank slate." By the 17th century, John Locke called it a "white Paper, void of all Characters." In 18th-century texts, it was compared to a "roasting jack," conjuring up an image of meat spinning on a rotisserie, cooked by flames. As tools changed (slates, paper, rotisseries), so did the references.

There are other metaphor databases, though Pasanek says his is the largest and is geared toward the history of thought. The database (http://mind.textdriven.com) is still in beta, said Pasanek, who now teaches literature at the University of Virginia in Charlottesville. It is under renovation and suffers from what he calls "bug plagues." But with time, it will improve and broaden to vast horizons.

"A metaphor has a career, and it tells a complete story," he said, "about how we think about ourselves and the world."

___

(c) 2009, San Jose Mercury News (San Jose, Calif.).
Visit MercuryNews.com, the World Wide Web site of the Mercury News, at http://www.mercurynews.com
Distributed by McClatchy-Tribune Information Services.

