Feature Design for Robust Speech Recognition: Handcrafted or Trained?
Shuo-Yiin Chang
ICSI
Tuesday, April 15
12:30 p.m., Conference Room 5A
As has been extensively shown, acoustic features for speech recognition can be learned by neural networks with multiple hidden layers. These developments raise the question of what role feature representations other than neural networks should play. However, while trained features can effectively reduce WER on a matched test set, they may be over-specialized and may not generalize to test sets that differ significantly from the training data. Robust features derived from signal processing techniques therefore remain important for noise-robust speech recognition. In the proposed work, our goal is to develop robust features that combine knowledge-based feature design with neural network modeling. We developed sparse Gabor features based on the power-normalized spectrum, and the filtering process of these features is incorporated into the convolutional kernels of a convolutional neural network. We conducted experiments under different train-test conditions to compare several neural network features: in one setting, we use the raw spectrum without further signal processing as the network input; in the other, we apply a series of filters to create robust features prior to neural network training.
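To illustrate the idea of casting Gabor filtering as convolution, the sketch below builds a 2-D spectro-temporal Gabor kernel (a Gaussian-windowed cosine) and applies it to a spectrogram-like array with a plain "valid" convolution, standing in for a fixed CNN kernel. All function names and parameter values here are illustrative assumptions, not details from the talk:

```python
import numpy as np

def gabor_kernel(size=11, omega_t=0.25, omega_f=0.25, sigma=2.5):
    """2-D spectro-temporal Gabor filter: a cosine carrier
    windowed by a Gaussian envelope (real part only).
    Parameter values are illustrative, not from the talk."""
    half = size // 2
    t, f = np.meshgrid(np.arange(-half, half + 1),
                       np.arange(-half, half + 1))
    envelope = np.exp(-(t**2 + f**2) / (2 * sigma**2))
    carrier = np.cos(omega_t * t + omega_f * f)
    return envelope * carrier

def conv2d_valid(x, k):
    """Naive 'valid' 2-D correlation, standing in for one CNN
    convolutional layer with a fixed (non-trained) kernel."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Hypothetical input: a (time frames x frequency bands)
# power-normalized spectrogram.
spec = np.random.rand(40, 40)
feat = conv2d_valid(spec, gabor_kernel())
print(feat.shape)  # (30, 30)
```

In the approach the abstract describes, such Gabor-shaped kernels would replace or initialize the learned kernels of the network's first convolutional layer.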
Bio:
Shuo-Yiin Chang is a PhD student at UC Berkeley, working with Professor Nelson Morgan on noise-robust speech recognition. He currently works on developing a convolutional neural network architecture that models Gabor characteristics to improve speech recognition. Prior to coming to Berkeley, he received his master's degree under the supervision of Prof. Lin-Shan Lee at National Taiwan University, where he performed research on developing acoustic features and models to improve Mandarin speech recognition.