Hybrid Neural Network/Structured SVM Models for Automatic Speech Recognition
Suman Ravuri
ICSI
Tuesday, September 23, 2014
12:30 p.m., Conference Room 5A
For my quals talk, I propose hybrid neural network/Structured SVM systems for automatic speech recognition (ASR). Neural networks have enjoyed a resurgence in research interest, as more computation and data have allowed for training “deeper” structures (those that contain more than one hidden layer), and they now represent the state of the art in ASR. While most current research on these systems concerns how the (last) hidden layer is computed from the input layer, far less has focused on the simple logistic regression between the hidden and output layers and, for hybrid NN/HMM acoustic models, on the Markov transitions between the output layers. I propose replacing these latter two elements with a structured support vector machine, which can be thought of as an extension of binary support vector machines that can predict more natural outputs, such as sequences. After a brief overview of automatic speech recognition and my prior work, I will outline hybrid neural network/Structured SVM systems for both Tandem systems and acoustic modeling, for which I will present results and analysis, respectively. Since the training of neural network/Structured SVM acoustic models mirrors research in discriminative training of prior systems, I will compare those prior methods to my own. In particular, I will show how different training criteria fall out of particular design decisions under a statistical decision theory framework.