Statistical Interfaces for Robust Speech Recognition and Model‐Based Speech Processing, Part II
Dorothea Kolossa
ICSI and Ruhr-Universität Bochum, Germany
Tuesday, July 21, 2015
12:30 p.m., Conference Room 5A
Human beings are highly effective at integrating multiple sources of uncertain information, and mounting evidence suggests that this integration is close to optimal in a Bayesian sense. Yet in speech processing systems, the two central tasks of speech signal enhancement and of speech or phonetic-state recognition are often performed almost in isolation, with only estimates of mean values being exchanged between them. This talk describes concepts for enhancing the interface between these two systems, considering a range of appropriate probabilistic representations. Examples will illustrate how such interfaces can improve the quality of both components: on the one hand, more reliable automatic speech recognition (ASR) can be attained; on the other hand, signal quality improves when information from a speech recognition stage is fed back to the signal preprocessing. This latter idea will be demonstrated using the example of twin HMMs, audiovisual speech models that help to recover lost acoustic information by exploiting video data. Overall, it will be shown how broader, probabilistic interfaces between signal processing and speech recognition can help to achieve better performance in real-world conditions and to more closely approximate the Bayesian ideal of using all sources of information in accordance with their respective degrees of reliability.
Bio:
Dorothea Kolossa received her PhD from TU Berlin, Germany, in 2007. In the course of her studies and work, she spent time in Shoji Makino's group at the NTT Communication Science Labs in Kyoto, worked with Qiang Huo at the University of Hong Kong, and was visiting faculty at UC Berkeley from 2009 to 2010. She joined the faculty of Ruhr-University Bochum, Germany, in 2010, where she currently heads the Cognitive Signal Processing group. She is interested in robust speech recognition and in probabilistic approaches to machine learning and speech enhancement.