Voice-operated control technology: From spelled-out words to direct input

Aug 26, 2009
  • Latest-generation LINGUATRONIC in the S-Class
  • Whole-word destination input improved even further
  • Voice recognition through analysis within milliseconds
To ensure that the LINGUATRONIC voice-operated control system obeys the driver's every word, it was subjected to a highly involved learning process during its development. It was then tested in all of its languages, and by Mercedes customers in every language region.
However, LINGUATRONIC must understand not only every word but also every driver, male or female. Every person has an individual pronunciation, tone and speech cadence. To make the dialogue perfect, the Mercedes system offers an "after-training" function: a personal conversation with Ms Libbach or one of her colleagues, during which the driver can adapt the voice recognition to the sound and intonation of his or her voice.
Around ten years ago, drivers were only able to operate the onboard telephone with voice commands. Since 2000 LINGUATRONIC has been capable of more, and now controls the car radio and CD-changer as well. Since 2002 the Mercedes-Benz navigation system has also been optionally controllable by the voice recognition system. The first-generation system only required a processor with a memory capacity of 512 kilobytes, but more than ten megabytes are necessary nowadays.
For a long time drivers had to enter the destination by spelling out the town and street names. This changed in 2002, when whole-word voice input of around 650 place names in Germany became possible in the E-Class and S-Class. Nowadays LINGUATRONIC understands not only all town and street names when destinations are entered, but also whole words when the driver selects a radio station or a name from the personal telephone directory. The driver only needs to say the destination, whereupon the system searches its electronic memory for the relevant town and street. If there are several similar-sounding names, the display shows a selection.
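The selection step for similar-sounding names can be illustrated with a short sketch. It is not the LINGUATRONIC implementation: the town list, the function names and the similarity threshold are assumptions made for illustration, and spelling similarity from Python's standard `difflib` module stands in for the acoustic similarity a real recogniser would use.

```python
import difflib

# Hypothetical mini gazetteer; the production system stores far more names.
TOWNS = ["Munster", "Münster", "Hamburg", "Bremen", "Stuttgart"]

def candidate_towns(heard, towns=TOWNS, limit=3, cutoff=0.6):
    """Return towns whose spelling is close to the recognised text, best first.

    Assumption: spelling similarity is only a stand-in for acoustic similarity.
    """
    by_lowercase = {town.lower(): town for town in towns}
    hits = difflib.get_close_matches(
        heard.lower(), list(by_lowercase), n=limit, cutoff=cutoff
    )
    return [by_lowercase[hit] for hit in hits]

def resolve(heard):
    """Accept a unique match directly, otherwise offer a selection."""
    hits = candidate_towns(heard)
    if len(hits) == 1:
        return {"action": "accept", "town": hits[0]}
    if hits:
        return {"action": "show_selection", "options": hits}
    return {"action": "ask_again"}
```

With this list, "Hamburg" is accepted directly, while "Munster" yields a selection containing both Munster and Münster, which mirrors the behaviour described for the display.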
Destination input: the driver says the town and street names directly in sequence
In the current S-Class, which has been on the market since summer 2009, Mercedes engineers have improved the whole-word voice input function even further. They call this new development a "one-shot" function, and it makes voice-operated control even easier and faster. After speaking the command "Enter destination", the driver says the desired destination as a single command - for example "Stuttgart, Epplestraße". The system immediately begins to work out the route, only pausing to enquire whether a house number is to be entered as well. There is then a verbal acknowledgement: "Stuttgart, Epplestraße confirmed. Route guidance starting now."
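As a rough illustration of the "one-shot" dialogue, the sketch below splits an already recognised utterance into town and street and builds the spoken acknowledgement. The `Destination` type, the function names and the comma-separated text format are assumptions for this example; the real system works on the speech signal, not on punctuated text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Destination:
    town: str
    street: str
    house_number: Optional[str] = None  # requested in a follow-up question

def parse_one_shot(utterance: str) -> Destination:
    """Split '<town>, <street>' into its two parts (assumed text format)."""
    town, separator, street = utterance.partition(",")
    town, street = town.strip(), street.strip()
    if not separator or not town or not street:
        raise ValueError("expected '<town>, <street>'")
    return Destination(town, street)

def confirmation(destination: Destination) -> str:
    """Build the verbal acknowledgement quoted in the article."""
    where = f"{destination.town}, {destination.street}"
    if destination.house_number:
        where += f" {destination.house_number}"
    return f"{where} confirmed. Route guidance starting now."
```

Calling `confirmation(parse_one_shot("Stuttgart, Epplestraße"))` produces the acknowledgement text from the example above.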
The largest active vocabulary is to be found in the LINGUATRONIC system of Mercedes models in the US state of California, where whole-word input of around 220,000 street names is possible. In Germany around 80,000 towns and more than 470,000 street names can be input by voice command.
LINGUATRONIC is a major Mercedes-Benz contribution to road safety, as drivers no longer need to take their hands off the wheel to operate the car phone or audio equipment. They are therefore better able to concentrate on the traffic situation.
Mercedes-Benz also uses speech synthesis technology to read out important traffic information affecting the route, or SMS messages.
Voice recognition: LINGUATRONIC "listens" for phonemes
During the brief dialogue between the driver and LINGUATRONIC, the sound signal is digitised, transformed into the frequency domain and finally analysed. Within milliseconds, the computer extracts various characteristics from the speech signal in order to recognise what are known as ‘phonemes’. To linguists these are the smallest sound components of a language, and they are decisive for understanding the words. The control system recognises words by combining the phonemes and comparing the result with the contents of a phoneme dictionary stored in memory. Each language has its own typical phonemes; LINGUATRONIC uses around 40 for the German language.
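The frequency analysis of a digitised signal can be sketched with a plain discrete Fourier transform. This is a textbook illustration rather than the method used in LINGUATRONIC, whose signal processing is not described here; the function names, frame length and sample rate are assumptions.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame of samples (naive DFT, bins 0..N/2)."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            sample * cmath.exp(-2j * math.pi * k * i / n)
            for i, sample in enumerate(frame)
        )
        magnitudes.append(abs(total))
    return magnitudes

def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest spectral component in the frame."""
    magnitudes = dft_magnitudes(frame)
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / len(frame)

# Example: a 1000 Hz test tone sampled at 8000 Hz (assumed values).
tone = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(80)]
```

Here `dominant_frequency(tone, 8000)` picks out the 1000 Hz component. A speech recogniser would derive many such spectral characteristics per frame, not a single peak.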
LINGUATRONIC processes the phonemes as digital codes. The electronics instantly check each sound, join the different phonemes together and also verify the acoustic probability of the word.
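Joining phonemes and checking the acoustic probability of a word against a phoneme dictionary can be sketched as follows. The dictionary entries, the informal phoneme symbols and the assumption of exactly one probability table per spoken sound are simplifications chosen for illustration; they are not taken from the Mercedes system.

```python
import math

# Hypothetical phoneme dictionary: word -> phoneme sequence (informal symbols).
PHONEME_DICTIONARY = {
    "radio": ["r", "a:", "d", "i", "o"],
    "navi": ["n", "a", "v", "i"],
    "telefon": ["t", "e", "l", "e", "f", "o:", "n"],
}

def word_log_probability(sounds, phonemes):
    """Sum the log-probabilities of the expected phoneme for each sound.

    sounds: one dict per spoken sound, mapping phoneme -> probability.
    Assumption: one sound per phoneme, with no time alignment.
    """
    if len(sounds) != len(phonemes):
        return float("-inf")
    total = 0.0
    for sound, phoneme in zip(sounds, phonemes):
        probability = sound.get(phoneme, 0.0)
        if probability <= 0.0:
            return float("-inf")
        total += math.log(probability)
    return total

def recognise(sounds, dictionary=PHONEME_DICTIONARY):
    """Return the most probable dictionary word, or None if nothing fits."""
    scores = {
        word: word_log_probability(sounds, phonemes)
        for word, phonemes in dictionary.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > float("-inf") else None
```

For example, four sounds whose most likely phonemes are "n", "a", "v" and "i" are recognised as "navi", while a sequence that matches no dictionary entry returns `None`.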
So that even fine nuances in pronunciation are recognised reliably, Mercedes engineers have added a special background noise suppression feature. This enables voice commands to be recognised reliably even at higher speeds. Up to a certain speed, LINGUATRONIC even works when the roof of a cabriolet or roadster model is open.