Latent Structured-SVM/Deep Neural Network Models for Continuous Speech Recognition

Suman Ravuri

ICSI

Tuesday, August 11, 2015
12:30 p.m., Conference Room 5A

Automatic speech recognition typically depends on four components to convert frame-level features into the words spoken in an utterance: a deep neural network for frame-level classification, hidden Markov models (HMMs) for temporal triphone modeling, a dictionary specifying the allowable phone sequences for words, and a language model for determining likely word sequences. If these components accurately modeled the underlying distribution of speech (if one exists), speech recognition in non-noisy settings would likely already be solved. That word error rates are still high suggests that our model distributions do not match the true distribution particularly well. A natural question, then, is whether there is a way to better match our poor models to the true distribution of speech. Noting that the parameters from the last hidden layer of the deep neural network specify a log-linear model, we investigate Structured SVM models to better model speech. Compared to existing sequence discriminative training criteria such as state-level Minimum Bayes Risk (sMBR), structured SVMs enjoy better generalization guarantees. On the ICSI Meeting Corpus, the proposed method outperforms other sequence discriminative training criteria by 1% absolute, while needing 33-66% fewer utterances to converge.
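Because the output weights on top of the last hidden layer define a log-linear model, per-frame state scores can be written as linear functions of the hidden activations, and a structured SVM can be trained on whole state sequences rather than individual frames. The sketch below illustrates only the generic (non-latent) margin-rescaled structured hinge loss for a single utterance; it is not the speaker's method. Everything in it is an assumption for illustration: the hypothetical names `H` (last-hidden-layer activations, T x D), `W` (output weights, D x S), `trans` (state-transition scores), and `ref` (reference state alignment), the use of frame-level Hamming loss as the task loss, and exact Viterbi decoding over a small state space in place of real HMM/lattice machinery.

```python
import numpy as np

def loss_augmented_viterbi(frame_scores, trans, ref):
    """Return the state sequence maximizing (model score + Hamming loss to ref).

    Hamming loss decomposes over frames, so it can be folded into the frame scores:
    every state that differs from the reference state at frame t gets +1.
    """
    T, S = frame_scores.shape
    aug = frame_scores + 1.0
    aug[np.arange(T), ref] -= 1.0          # no bonus for the reference state
    delta = aug[0].copy()                   # best augmented score ending in each state
    back = np.zeros((T, S), dtype=int)      # back-pointers for path recovery
    for t in range(1, T):
        cand = delta[:, None] + trans       # cand[i, j]: end in i at t-1, then move i -> j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + aug[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):           # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def sequence_score(frame_scores, trans, seq):
    """Log-linear score of a state sequence: frame scores plus transition scores."""
    s = frame_scores[np.arange(len(seq)), seq].sum()
    s += sum(trans[a, b] for a, b in zip(seq[:-1], seq[1:]))
    return float(s)

def structured_hinge(H, W, trans, ref):
    """Margin-rescaled structured hinge loss for one utterance (hypothetical setup).

    loss = max_y [Hamming(ref, y) + score(y)] - score(ref), which is >= 0
    because y = ref already attains 0 inside the max.
    """
    frame_scores = H @ W                    # log-linear layer on top of the last hidden layer
    y_hat = loss_augmented_viterbi(frame_scores, trans, ref)
    hamming = sum(a != b for a, b in zip(y_hat, ref))
    loss = hamming + sequence_score(frame_scores, trans, y_hat) - sequence_score(frame_scores, trans, ref)
    return loss, y_hat
```

For example, `structured_hinge(H, W, trans, [0, 1, 1, 2])` returns the loss together with the most-violating state sequence; when the reference wins by more than its Hamming distance to every competitor, the loss is zero and the returned sequence is the reference itself. Folding the Hamming loss into the frame scores is what keeps the loss-augmented search as cheap as ordinary Viterbi decoding.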

Bio:

Suman Ravuri is completing his PhD at UC Berkeley, working with advisor Professor Nelson Morgan at ICSI. He is interested in extensions to neural networks that improve the performance of automatic speech recognition. Prior to graduate school, he spent his undergraduate days at Columbia University, where he worked with ICSI alum Professor Dan Ellis on pitch stylization and cover song detection while obtaining a BS in electrical engineering. Apparently, he also obtained a BA in Classics, but was too embarrassed to tell anyone until he was fairly certain that it wouldn't get him kicked out of the department.