Listen, watch, read -- computers search for meaning

October 30, 2009

(PhysOrg.com) -- European researchers have created the first semantic search platform that integrates text, video and audio. The system can 'watch' films, 'listen' to audio and 'read' text to find relevant responses to semantic search terms. At last, computers are able to look for meaning in our multimedia searches.

There is a phenomenal amount of content out there on the internet, but therein lies a problem. Sure, text content can be skimmed or glanced at, but audiovisual content has to be viewed in linear time. Searching inside a film or audio recording for relevant information is very complex.

But European researchers in the MESH project have developed an integrated platform which, they say, for the first time combines semantic search - or search by the meaning of the words - with a host of associated tools to deliver more relevant information from a wide variety of sources, all accessible to an individual user.

The platform can search annotated files from any type of media - photographs, videos, sound recordings, text, document scans - using a host of techniques including optical character recognition, automated speech recognition and automatic annotation of movies and photographs that tracks salient concepts.

Technology shift

This represents an emerging paradigm shift in search technology.

Here is why. Right now, text in computing is defined by a series of numbers, most commonly the Unicode standard. Each number signifies a particular letter, and computers can scan these codes very quickly. So when you enter a search term, the machine has no idea what those letters signify. It simply looks for the pattern - it has no inkling of the concept behind the pattern.
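As a rough illustration of the point above, the short Python sketch below shows a search term as a list of Unicode code points and then runs a plain pattern match over two headlines. The headlines and the pattern_search helper are invented for this example and have nothing to do with the MESH software.

```python
# Each character is stored as a number (its Unicode code point).
term = "flood"
code_points = [ord(ch) for ch in term]


def pattern_search(term, documents):
    """Return the documents that contain the exact character pattern."""
    return [doc for doc in documents if term in doc]


# Invented sample headlines, for illustration only.
documents = [
    "Heavy flood hits the coastal town",
    "River bursts its banks after days of rain",
]

matches = pattern_search(term, documents)
print(code_points)
print(matches)
```

The second headline describes a flood but never spells the word, so a pure pattern match passes over it.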

But in semantic search, every bit of information is defined by potentially dozens of meaningful concepts. When a copywriter invoices for his or her work, for example, the date could be defined in terms of calendar, invoice, billing period, and so on. All these definitions for one piece of information are called ‘metadata’, or information about information.

Collections of agreed metadata terms for a particular field or task, like medicine or accounting, are called ontologies.

So the computer not only searches for the term, it searches for related metadata that defines types of information in specific ways. In reality, the computer still does not ‘understand’ a concept in its semantic search - it continues to look for patterns of letters. But because the concepts behind the search terms are included, it can return results based on concepts as well as text patterns.
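A toy Python sketch can make this concrete. It assumes each document carries a set of concept labels as metadata and that a small ontology maps a broad concept to narrower ones. The ONTOLOGY table, the DOCUMENTS list and the semantic_search function are all hypothetical and are not taken from the MESH platform.

```python
# Hypothetical mini-ontology: a broad concept maps to narrower concepts.
ONTOLOGY = {
    "natural disaster": {"earthquake", "flood", "forest fire"},
    "civil unrest": {"riot", "street violence"},
}

# Invented documents, each annotated with concept metadata.
DOCUMENTS = [
    {"text": "River bursts its banks after days of rain", "concepts": {"flood"}},
    {"text": "Tremor damages homes in hill villages", "concepts": {"earthquake"}},
    {"text": "Crowds clash with police downtown", "concepts": {"street violence"}},
]


def expand(term):
    """Return the term plus any narrower concepts the ontology lists for it."""
    return {term} | ONTOLOGY.get(term, set())


def semantic_search(term, documents):
    """Match on the text pattern or on the concept metadata."""
    term = term.lower()
    wanted = expand(term)
    hits = []
    for doc in documents:
        text_hit = term in doc["text"].lower()
        concept_hit = bool(wanted & doc["concepts"])
        if text_hit or concept_hit:
            hits.append(doc["text"])
    return hits


print(semantic_search("flood", DOCUMENTS))
print(semantic_search("natural disaster", DOCUMENTS))
```

The program is still only comparing strings, but because the strings now include concept labels, a query for "natural disaster" can return the flood and earthquake reports even though neither headline contains those words.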

Imminent domains

These technologies are becoming common in particular knowledge domains, and more are emerging every day, but most relate to the concepts behind text-based documents. The MESH platform sought to use semantic search for every type of media.

On the way, it created some cutting-edge technology. “Our automatic annotation for video, for example, is state of the art,” explains Pedro Concejero, coordinator of the MESH project.

“The annotation system is capable of identifying the general scene setting, such as whether a video is a studio shot or a shot recorded on location. With adequate training, it can also detect (within some error margins) the general topic of the video, such as a scene about an earthquake or a flood. It can also find a number of salient objects within the scene, such as persons or fire, but cannot yet identify consistently objects with great variations in shape or aspect.”

One of the major challenges of the project was a product of its own success: It annotated too much information!

“This is good - it is what we wanted the system to do - but the quantity of data was vast, too much to handle, so we had to find ways to cut down on the amount of metadata,” Concejero tells ICT Results.

Manual override

So the project developed a manual annotation tool that can, with a little training, be used by non-technical people. “It is a very powerful, very advanced professional program. There are other manual annotation tools available commercially, but we have developed a strong and user-friendly program that could probably compete very successfully with what is currently available.”

For the project, the platform was developed to search video news sources relating to civil unrest and street violence, and natural disasters like earthquakes, forest fires and floods.

“We had to focus the demonstrator because there is a lot of work involved in developing ontologies for specific news topics. You would need to develop a very detailed ontology for politics, or crime and so on. We have designed the system so that it can accept ontologies from elsewhere, but for the demonstrator we reserved our work to these two domains,” says Concejero.

The beginning of the end?

The technology will not be challenging the industry-leading search engines any time soon. This project does not necessarily mark the end of the type of keyword-based search that we use every day.

But it could well be the beginning of the end. In the meantime, the work of the MESH project will find a happy home in a number of stand-alone commercial applications, and work on new applications will, in one way or another, continue.

More information: MESH project

This is part one of a two-part special feature on the MESH project.

Provided by ICT Results

RayCherry
Oct 30, 2009

Language/Text-dependent concept identification still presents barriers that will limit use to a specific culture (English-speaking).

When the concepts themselves achieve independence from Language/Text, then the system will be able to catalogue and cross-reference multi-cultural concepts, providing a global 'map' of strongly and weakly linked concepts that underlie the languages used to express/communicate them.

Eliminating the Language barrier within the machines will help the users to do the same.