Increasing Neural Network Acoustic Model Size for ASR

Andrew Maas

Stanford University

Tuesday, April 22
12:30 p.m., Conference Room 5A

Deep neural networks (DNNs) are now a central component of many state-of-the-art speech recognition systems. Part of the promise of DNNs is their ability to represent increasingly complex functions as the number of DNN parameters increases. I will present my ongoing work investigating the performance of DNN-based hybrid speech recognition systems as DNN model size increases. Using a distributed GPU architecture, we train DNN acoustic models nearly an order of magnitude larger than those typically found in speech recognition systems. DNNs of this scale achieve substantial reductions in training set word error rate. On the 300-hour Switchboard benchmark, however, these training set gains do not translate to improved test set performance. Our results suggest that improving acoustic model performance may be suitably framed as a problem of generalization in machine learning. We investigate several regularization techniques, including dropout and training set realignment. Ultimately, we find that training on the larger 2,000-hour Fisher corpus best utilizes the increased capacity of large DNNs.
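For context on one of the regularization techniques named above, the short sketch below illustrates the core idea of dropout: randomly zeroing hidden units during training so the network cannot rely on any single unit. This is a minimal, illustrative example assuming NumPy; the function and parameter names (dropout_forward, keep_prob) are hypothetical and not taken from the speaker's system.

    import numpy as np

    def dropout_forward(h, keep_prob=0.5, train=True, rng=None):
        # Inverted dropout: zero each hidden unit with probability
        # (1 - keep_prob) during training, and rescale survivors by
        # 1/keep_prob so expected activations match at test time.
        # (Names here are illustrative, not from the talk.)
        if not train:
            return h  # test time: use the full network unchanged
        rng = rng or np.random.default_rng()
        mask = (rng.random(h.shape) < keep_prob) / keep_prob
        return h * mask

    # Tiny usage example: a batch of 4 examples with 8 hidden units.
    h = np.ones((4, 8))
    print(dropout_forward(h, keep_prob=0.5))  # about half the entries zeroed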

Bio:

Andrew Maas is a fifth-year computer science PhD student at Stanford, advised by Andrew Ng and Dan Jurafsky. His research focuses on neural networks for speech recognition and natural language processing. He has worked on a variety of machine learning projects in academia and industry, including keystroke biometrics, sentiment analysis, computational neuroscience, and reinforcement learning. Before Stanford, he completed a B.S. in computer science and cognitive science at Carnegie Mellon.