On Audio-Visual Information for Speech
Hang Su
ICSI
Tuesday, September 22, 2015
12:30 p.m., Conference Room 5A
This talk summarizes the work done during my internship at MSR this summer, focusing on utilizing visual features for speech recognition and other applications. On a real-life tech-talk dataset, we observed a minor improvement from combining visual features under noisy conditions. We further conducted research on detecting audio-visual synchrony, achieving an accuracy of 88.2 percent at the utterance level. This talk will also comment on related work published at the last Interspeech conference.