Research team develops systems that process and understand spoken language, especially Basque

March 10, 2008

A research team drawn from the Department of Systems and Automation Engineering of the Polytechnic University School and from the Faculty of Informatics at the Donostia-San Sebastián campus of the University of the Basque Country (UPV/EHU) and led by lecturer Miren Karmele Lopez de Ipiña, is developing systems that process and understand spoken language and automatically obtain information particularly from Basque radio and television.

Carrying out a search in the net for written documents is an easy task – the word is simply introduced in to the search tool. Nevertheless, these searches do not work with the spoken word or with audio archives, unless these have an accompanying written explanation.

Recognising spoken language and converting it into text is not easy. The words cannot be easily distinguished from each other, intonation has to be taken into consideration and, besides, physical signal noise is also an obstacle. Because of all this, there is a huge market in systems that process and understand spoken language, i.e. systems that convert it into written text. Such systems are integrated, mainly, into telephone services such as prior appointment, requests for products, bookings for performances, etc. In any case, there are also other devices, for example, automatic dictation i.e. systems that convert oral text to written on the spot. It is in this latter aspect that the research team at the Department of Systems and Automation Engineering at the UPV/EHU is focusing.

For the spoken process, the system has to be very well practised, i.e. it has to be taught with a training programme known as machine-study. First, television or radio audio files are needed and it is also necessary to have certain reference texts from the mentioned media. The research team at the UPV/EHU, for example, frequently use files from the Gaur Egun and Teleberri slots (Basque Television news, in Basque and Spanish respectively) in order to programme/train the system. It is not necessary to know what is being said word for word; the system has to be able to carry out a resume of what is heard. At the end, the system seeks to comprehend the relation between the words and the sounds.

Once terminated the training/learning process, the system should be capable of understanding what is heard in any programme of Gaur Egun or Teleberri. Although the learning process is very lengthy, once the system interiorises the rules or the information, i.e. suitable reference material, the result is obtained rapidly - in this case, written text from spoken.

Small and big

In reality, the majority of applications of this type on the market are aimed at the “big” languages, above all English. In any case, the research team at the Polytechnic University School in Donostia-San Sebastián, together with the IXA team, GTTS and the Computational Intelligence team from UPV/EHU, are working with the Basque language - Euskera. The main difference between “small” languages and “big” ones is the number of reference data. These types of systems for English have an impressive amount of data while reference material for Basque, on the other hand, is considerably less. Given all this, the research team is focusing on developing new techniques to take better advantage of these minimum data and to use them with greater precision.

In order to obtain greater precision, mathematic equations are used. What is involved is the location of the most important characteristics that provide suitable information for the audio files. It is not easy to carry out this selection, distinguishing suitable data from unsuitable information. Normally, the UPV/EHU research team takes frequency and intonation into consideration in order to classify all the information gathered (for example, to differentiate a question from a statement, etc).

These systems depend a lot on the language and each language has its own system. The UPV/EHU research is not only working with Euskera, but also with Spanish and French. When studying the Teleberri and Infozazpi programmes, amongst others, they have two goals: on the one hand, comprehend Spanish and French — as well as Basque — and, on the other, detect the similarities within these systems between Euskera and the other two languages, in order to train the systems in Basque even more.

As regards this, the UPV/EHU research team is currently undertaking trials to develop a system that is valid for more than one language. This is the precisely the challenge for the future: to develop a system that is capable of understanding Basque, Spanish and French.

Source: Elhuyar Fundazioa


print this article email this article download pdf blog this article bookmark this article     Stumble it Digg this share on Facebook retweet share on Reddit add to delicious
Rate this story - 4 /5 (4 votes)


March 10, 2008 all stories

Comments: 0

4 /5 (4 votes)
  • Stumble this up

  • Digg this

  • share this

  • hide
  • Related Stories

  • Moving toward a new vision of education
    created Sep 15, 2009 | popularity not rated yet | comments 0
  • A mathematical method helps to select human embryos
    created Feb 24, 2009 | popularity not rated yet | comments 0
  • Tartalo the robot is knocking on your door
    created Jun 18, 2008 | popularity not rated yet | comments 0
  • Precision control of movement in robots
    created May 16, 2008 | popularity not rated yet | comments 0
  • Magnetic atoms of gold, silver and copper have been obtained
    created Feb 28, 2008 | popularity not rated yet | comments 0



  • hide
  • Relevant PhysicsForums posts

  • Sixth sense technology
    created 1hour ago
  • kindle e-reader and scientific papers
    created Nov 24, 2009
  • Help with a camera choice
    created Nov 18, 2009
  • casio calculator that's similar to TI-89
    created Nov 08, 2009
  • Advice on what cell phone to get
    created Nov 08, 2009
  • Changing the language options on your phone.
    created Nov 03, 2009
  • More from Physics Forums - Computing & Technology

Other News

Design chosen for British 1,000 mph car

Design chosen for British 1,000 mph car (w/ Video)

Technology / Engineering

created 17 hours ago | popularity 3.7 / 5 (6) | comments 5

(PhysOrg.com) -- A British team hoping to be the first to get a car to 1,000 mph (1,610 km/h) has made its final design selection. The six-tonne car, known as the Bloodhound, will be powered by a Eurofighter ...


Time Inc., Conde Nast and Hearst are preparing to launch an online newsstand described as an "iTunes for magazines"

Magazine publishers creating 'iTunes for magazines': reports

Technology / Internet

created 7 hours ago | popularity not rated yet | comments 0

US magazine publishers Time Inc., Conde Nast and Hearst are preparing to launch an online newsstand described as an "iTunes for magazines," according to published reports.


Should I buy a PC or Mac?

Technology / Software

created 5 hours ago | popularity 4 / 5 (2) | comments 3

Q. Our 6-year-old PC computer is dying a slow death and we are considering moving to a new iMac but have a few concerns. First, of all, we have several Word documents on our disk drive now that we want to keep and add to ...


ORNL 'deep retrofits' can cut home energy bills in half

ORNL 'deep retrofits' can cut home energy bills in half

Technology / Energy

created 8 hours ago | popularity 3 / 5 (2) | comments 0

(PhysOrg.com) -- Oak Ridge National Laboratory has announced plans to conduct a series of deep energy retrofit research projects with the potential to improve the energy efficiency in selected homes by as ...


Web sites aim to survive with hyperlocal focus

Technology / Internet

created 3 hours ago | popularity not rated yet | comments 0

Finding a financially viable way to provide local news is a challenge large metropolitan newspapers are confronting. But a Coral Gables, Fla., Web site is among a few locally with faith it can succeed.