Proposal for AVIOS/Speechtek Spring 2004

Distributed Speech Recognition for Mobile Speech and Multimodal Interfaces.

An overview of the ETSI Aurora DSR standards.

Holly Kelleher & David Pearce

Motorola Labs

Summary

This presentation provides an overview of the Distributed Speech Recognition (DSR) Standards developed in ETSI.

DSR provides improved recognition performance compared with using a mobile voice channel to connect to a remote recognition server. By performing the front-end processing on the client device, it avoids the performance degradation caused both by the voice codec itself and by transmission errors. These advantages are especially valuable for automotive telematics applications, where degradation over the voice channel can be particularly severe.

Also, by moving to the packet data channel, DSR allows voice and data to be combined on a single packet data channel such as GPRS, enabling new multimodal interfaces for the driver.

The first ETSI Aurora DSR standard, for the mel-cepstrum front-end (ES 201 108), was published in April 2000. Since then, three additional standards have been developed. The DSR Advanced front-end (ES 202 050) provides improved robustness to background noise, giving an average 53% reduction in word error rate compared to the mel-cepstrum front-end. Two new standards, ES 202 211 (the extension for ES 201 108) and ES 202 212 (the extension for ES 202 050), will be published in November 2003. These enable the reconstruction of the speech waveform and aid the recognition of tonal languages such as Mandarin and Cantonese.

In addition, the payload formats for transporting the DSR features using the RTP protocol are in the process of being standardised in the IETF.

A complete set of standards for the Extended Advanced DSR front-end and its transport protocols is now in place and ready for deployment. These standards are under active consideration by 3GPP (3rd Generation Partnership Project) to become the recommended codec for “speech enabled services”. We look forward to the range of new speech and multimodal services that DSR enables, both on handheld devices and for automotive telematics.

This presentation will give an overview of the DSR standards and their implications for new speech and multimodal services implemented over mobile packet data.

Contact information:

David Pearce: