Zero Resources Spoken Audio Search
Xavier Anguera
Telefonica Research, Barcelona
Monday, July 15, 2013
11:00 a.m., ICSI Lecture Hall
Abstract:
Watch this talk on YouTube
In this talk I will discuss zero-resource spoken audio search, also known as Query-by-Example Spoken Term Detection. Given a corpus of audio in one or multiple languages for which we do not have any transcripts, the objective is to build a language-independent system that can locate where an audio query has been spoken in the corpus. This is the goal of the Spoken Web Search (SWS) task within the MediaEval benchmark evaluation, which has taken place in each of the last two years and is taking place again this year, with 17 participating institutions so far.
In the talk I will first review the main approaches that have been used to tackle this task in the SWS evaluation. Then I will focus on dynamic programming approaches that inherit from the Dynamic Time Warping (DTW) algorithm the ability to match two audio signals while remaining agnostic to the language being spoken; these have been shown to outperform other approaches for this task. One such algorithm is the Information Retrieval DTW (IRDTW) algorithm, which uses indexing techniques to speed up the search for matching patterns, and information retrieval techniques to reduce the amount of computer memory required. These characteristics make IRDTW a good candidate for large-scale spoken audio search implementations.
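To make the DTW idea concrete, below is a minimal, illustrative sketch (not the speaker's IRDTW implementation) of subsequence DTW for query-by-example search: given frame-level feature vectors for a short query and a longer utterance (e.g., MFCCs or phone posteriorgrams), it finds the lowest-cost alignment of the full query against any contiguous region of the utterance. The function names and the cosine distance are my own choices for the example.

```python
import numpy as np

def subsequence_dtw(query, utterance):
    """Match `query` (Q x D feature frames) against any region of
    `utterance` (U x D feature frames) with subsequence DTW.
    Returns (normalized cost, end frame) of the best-matching region;
    the start frame could be recovered by backtracking through `acc`."""
    Q, U = len(query), len(utterance)

    # Frame-to-frame cosine distance matrix, shape (Q, U).
    qn = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-8)
    un = utterance / (np.linalg.norm(utterance, axis=1, keepdims=True) + 1e-8)
    dist = 1.0 - qn @ un.T

    # Accumulated-cost matrix: the query must be matched in full, but the
    # match may start at any utterance frame, so the first row is not
    # accumulated along the utterance axis.
    acc = np.full((Q, U), np.inf)
    acc[0, :] = dist[0, :]
    for i in range(1, Q):
        for j in range(U):
            best_prev = acc[i - 1, j]
            if j > 0:
                best_prev = min(best_prev, acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev

    end = int(np.argmin(acc[-1, :]))          # best match ends here
    return acc[-1, end] / Q, end               # length-normalized cost

# Toy usage: plant the query inside a longer random "utterance".
rng = np.random.default_rng(0)
query = rng.normal(size=(20, 39))              # 20 frames, 39-dim features
utterance = rng.normal(size=(200, 39))
utterance[80:100] = query                      # exact occurrence at frames 80-99
cost, end = subsequence_dtw(query, utterance)
print(cost, end)                               # low cost, end near frame 99
```

Because the distance and accumulation depend only on the feature vectors, not on any transcript or phone set, the same matching procedure works regardless of the language spoken; approaches like IRDTW then add indexing and information-retrieval machinery on top to avoid computing the full cost matrix at scale.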
Bio:
Xavier Anguera: Ing. (MS) 2001, UPC (Barcelona, Spain); MS 2001, European Masters in Language and Speech; Ph.D. 2006, UPC, with a thesis on speaker diarization for multi-microphone meeting recordings. From 2001 to 2003 he worked at the Panasonic Speech Technology Lab in Santa Barbara, CA, on text-to-speech for several languages. From 2004 to 2006 he was a visiting researcher at the International Computer Science Institute (ICSI) in Berkeley, CA. Since 2007 he has been a research scientist at Telefonica Research in Barcelona. His research interests cover speech processing (both speaker- and content-based) and multimodal multimedia processing. He has published over 60 peer-reviewed papers and has several accepted or pending patents. He is an active member of the IEEE and ACM, for which he has served on the organizing and program committees of several multimedia and speech conferences.