Hybrid MLP Systems for Automatic Speech Recognition: A Structured Support Vector Machine Approach
Suman Ravuri
ICSI
Tuesday, March 4
12:30 p.m., Conference Room 5A
Hybrid ANN/HMM systems, in which multi-layer perceptrons are augmented with HMM-style time transitions, are experiencing a resurgence in the automatic speech recognition (ASR) community after 20 years and have produced state-of-the-art results on large-vocabulary speech recognition tasks. Much of the recent interest has centered on deep neural networks, in which multiple hidden layers can now be trained effectively through new techniques such as pre-training. In this talk, I focus on a less-researched part of the model: the hidden-to-output and time-transition layers. Using the structured support vector machine formalism, which can be thought of as combining the structure of a graphical model with the training criterion of a support vector machine, we can create a new type of hybrid ANN/structured SVM system. I show how this model can be used in a Tandem framework and how it improves performance on both small- and large-vocabulary speech recognition tasks.
Bio:
Suman Ravuri is a PhD student at UC Berkeley, working with Professor Nelson Morgan on automatic speech recognition. He currently develops and applies new machine learning techniques to improve speech recognition; previously, he worked on new Gabor features to improve ASR. Before coming to Berkeley, he was advised by Professor Dan Ellis at Columbia University, where he earned a BS in electrical engineering and a BA in classics and performed research on cover song detection and speech synthesis.