March 5, 2010

Context is ev ... well, something, anyway

by Larry Hardesty, Massachusetts Institute of Technology

Today, computers can't reliably identify the objects in digital images. But if they could, they could comb through hours of video for the two or three minutes that a viewer might be interested in, or perform web searches where the search term was an image, not a sequence of words. And of course, object recognition is a prerequisite for the kind of home assistance robot that could execute an order like "Bring me the stapler." Now, MIT researchers have found a way to improve object recognition systems by using information about context. If the MIT system thinks it’s identified a chair, for instance, it becomes more confident that the rectangular thing nearby is a table.

A typical object recognition system will scan a digital image for groups of pixels that differ from those around them; those pixels could define an edge, a corner or some other feature of an object. Usually, the system has been trained on a set of sample images, which teaches it how to correlate feature patterns with particular objects.

Some researchers have tried to use context information to refine those correlations. But according to Myung “Jin” Choi, a grad student in MIT’s Laboratory for Information and Decision Systems and one of the leaders of the new project, those researchers were generally working with a standard training set that included examples of only about 20 different types of object. In that case, it was fairly straightforward to specify how frequently each object co-occurred with every other object in the set.

Upping the ante

A system that could recognize only 20 different objects, however, wouldn’t be very useful. And with a large number of objects, it becomes computationally impractical to consider the frequency of all possible two-object combinations. In work to be presented at the IEEE Conference on Computer Vision and Pattern Recognition this summer, Choi and her colleagues — including graduate student Joseph Lim and Professors Antonio Torralba and Alan Willsky — describe a different approach. Working with a training set that included more than 4,000 images and 107 different types of objects, they created algorithms that pored through the images and automatically constructed a hierarchical map of the object categories — kind of like the organizational chart for a large company, which shows who reports to whom. In the map, each object is connected to at most one object above it in the hierarchy (everyone in the organization reports to only one person), drastically reducing the number of connections that the system has to consider. The connection between any two objects is given a weight that indicates how often the objects appear together in the training images. The map also encodes information about the typical relative locations of two connected objects: buildings generally appear above roads, for instance, not below them.

When the system analyzes a new image, it uses standard object recognition algorithms to generate a list of candidate objects, together with each object’s “confidence score” — a statistical measure of how likely the object is to have been correctly identified. Then it revises those scores on the basis of the information encoded in the contextual map.

In experiments, Choi compared the performance of the bare object recognition algorithms with their performance when augmented by the contextual map. In both cases, she considered the three objects per image with the highest confidence scores. The bare algorithms correctly identified all three objects roughly 14 percent of the time; with the addition of the contextual map, the success rate jumped to about 25 percent.

Long row to hoe

Of course, that means that the system still failed to correctly identify three objects per image about 75 percent of the time, which shows just how difficult the problem of object recognition remains. “Context really is essential,” says Serge Belongie, an associate professor of computer science at the University of California, San Diego who has worked on both object recognition in general and context-based object recognition in particular. “It deserves a proper treatment, and Jin is doing that.” But Belongie cautions that context awareness will never be more than an augmentation of an underlying system that recognizes objects from visual features. “We absolutely cannot afford to take our eye off the ball of the component recognition systems that need to feed these context engines,” he says. And, he adds, to be useful, object recognition systems will need to be much more precise than today’s prototypes are. “Imagine that you take a picture of a wild mushroom while you’re hiking,” Belongie says, “and then you send it to the system to find out what it is. And it says, ‘Mushroom!’ You’re like, Thanks. That’s really useful. I knew that part.”

Nonetheless, Choi is continuing to improve her contextual-map system, against the day when the underlying algorithms are more reliable. The next version of the system, she says, will add entries to the map that, in effect, represent higher-level scene descriptions. Street scenes, for instance, may frequently feature sky, buildings and roads, while building interiors may frequently feature floors, walls and windows. The system won’t need to explicitly label these additional map entries, however; it will simply register them as foci around which certain types of objects regularly cluster. She’s confident that this modification will make the added benefits of context awareness even more acute.

Provided by Massachusetts Institute of Technology

Citation: Context is ev ... well, something, anyway (2010, March 5) retrieved 25 April 2024 from https://phys.org/news/2010-03-context-ev.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

Seeing things: Researchers teach computers to recognize objects

0 shares

Feedback to editors

Demonstration of heralded three-photon entanglement on a photonic chip

2 hours ago

Ancient giant tortoise fossils found in Colombian Andes

4 hours ago

Emperor penguins perish as ice melts to new lows: Study

4 hours ago

Artificial intelligence helps scientists engineer plants to fight climate change

15 hours ago

Ultrasensitive photonic crystal detects single particles down to 50 nanometers

16 hours ago

Scientists map soil RNA to fungal genomes to understand forest ecosystems

17 hours ago

Researchers show it's possible to teach old magnetic cilia new tricks

17 hours ago

Mantle heat may have boosted Earth's crust 3 billion years ago

17 hours ago

Study suggests that cells possess a hidden communication system

17 hours ago

Researcher finds that wood frogs evolved rapidly in response to road salts

17 hours ago

Load comments (0)

Context is ev ... well, something, anyway

Upping the ante

Long row to hoe

Demonstration of heralded three-photon entanglement on a photonic chip

Ancient giant tortoise fossils found in Colombian Andes

Emperor penguins perish as ice melts to new lows: Study

Artificial intelligence helps scientists engineer plants to fight climate change

Ultrasensitive photonic crystal detects single particles down to 50 nanometers

Scientists map soil RNA to fungal genomes to understand forest ecosystems

Researchers show it's possible to teach old magnetic cilia new tricks

Mantle heat may have boosted Earth's crust 3 billion years ago

Study suggests that cells possess a hidden communication system

Researcher finds that wood frogs evolved rapidly in response to road salts

Relevant PhysicsForums posts

Passing variables in FORTRAN

My Website For Creating Interactive Visuals Linked To Equations

Number of Multiplications in the FFT Algorithm

Error logging in: onLoginSuccess is not a function

Latest Notable AI accomplishments

Building a homemade Long Short Term Memory with FSMs

Seeing things: Researchers teach computers to recognize objects

Out of sight, out of mind? Not really

Shape encoding may start in the retina

New search technique for images and videos has broad applications

Toddlers link new words to whole objects

Teaching computers to recognise

Hyphens in paper titles harm citation counts and journal impact factors

A big step toward the practical application of 3-D holography with high-performance computers

Combining multiple CCTV images could help catch suspects

Applying deep learning to motion capture with DeepLabCut

Training artificial intelligence with artificial X-rays

New model for large-scale 3-D facial recognition

Medical Xpress

Tech Xplore

Science X

Context is ev ... well, something, anyway

Upping the ante

Long row to hoe

Demonstration of heralded three-photon entanglement on a photonic chip

Ancient giant tortoise fossils found in Colombian Andes

Emperor penguins perish as ice melts to new lows: Study

Artificial intelligence helps scientists engineer plants to fight climate change

Ultrasensitive photonic crystal detects single particles down to 50 nanometers

Scientists map soil RNA to fungal genomes to understand forest ecosystems

Researchers show it's possible to teach old magnetic cilia new tricks

Mantle heat may have boosted Earth's crust 3 billion years ago

Study suggests that cells possess a hidden communication system

Researcher finds that wood frogs evolved rapidly in response to road salts

Relevant PhysicsForums posts

Related Stories

Seeing things: Researchers teach computers to recognize objects

Out of sight, out of mind? Not really

Shape encoding may start in the retina

New search technique for images and videos has broad applications

Toddlers link new words to whole objects

Teaching computers to recognise

Recommended for you

Hyphens in paper titles harm citation counts and journal impact factors

A big step toward the practical application of 3-D holography with high-performance computers

Combining multiple CCTV images could help catch suspects

Applying deep learning to motion capture with DeepLabCut

Training artificial intelligence with artificial X-rays

New model for large-scale 3-D facial recognition

Newsletter sign up

Donate and enjoy an ad-free experience