Greedy Pursuit Algorithms for Representing Audio Signals, with Applications to Compression, Source Separation, and Audio Fingerprinting

Gaël Richard

Télécom ParisTech

Thursday, October 25
2:30 PM, ICSI Lecture Hall
 

 

Abstract

After a brief presentation of current research directions in the Audio, Acoustics and Waves research group of Télécom ParisTech, I will discuss in general terms the interest of greedy pursuit algorithms (such as Matching Pursuit) for representing audio signals. Such algorithms rely on an iterative atom selection step in a dictionary of atoms. They usually require the calculation of numerous projections, which can be computationally costly for large dictionaries. Furthermore, the resulting decomposition may be uninformative about the nature or content of the audio signal. To tackle the first issue, I will describe an extension of the classical Matching Pursuit that uses a non-adaptive random sequence of subdictionaries in the decomposition process, thus parsing a large dictionary in a probabilistic fashion with no additional projection cost and no parameter estimation [1]. It will be shown that this additional randomness is particularly attractive for audio compression. I will then describe another extension of the classical Matching Pursuit algorithm that directly exploits the signal redundancy. Preliminary results for audio source separation will then be given [2]. Finally, a third variation of the classical Matching Pursuit algorithm will be described, and its potential for audio fingerprinting will be demonstrated on synthetic and real broadcast audio databases [3].
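For readers unfamiliar with the iterative atom selection step mentioned above, the following is a minimal sketch of a generic Matching Pursuit loop in plain Python. It is an illustration under stated assumptions, not the implementation used in the talk or in [1]: the function name `matching_pursuit`, the helper `dot`, and the parameters `subdict_size` and `seed` are hypothetical. The optional random subset of atoms only loosely mimics the idea of searching a different, signal-independent subdictionary at each iteration.

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, n_iter, subdict_size=None, seed=0):
    """Greedy decomposition of `signal` over unit-norm atoms.

    If `subdict_size` is given (an assumption of this sketch), each
    iteration searches only a random subset of atoms, drawn without
    looking at the signal (i.e. non-adaptively).
    """
    rng = random.Random(seed)
    residual = list(signal)
    decomposition = []  # list of (atom index, coefficient)
    for _ in range(n_iter):
        if subdict_size is None:
            candidates = range(len(dictionary))
        else:
            candidates = rng.sample(range(len(dictionary)), subdict_size)
        # Projection step: correlate the residual with each candidate atom
        # and keep the atom with the largest absolute inner product.
        best = max(candidates, key=lambda k: abs(dot(residual, dictionary[k])))
        coeff = dot(residual, dictionary[best])
        # Subtract the selected atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
        decomposition.append((best, coeff))
    return decomposition, residual

# Toy dictionary: the standard basis of R^4 plus one redundant atom.
s = 1.0 / math.sqrt(2.0)
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [s, s, 0.0, 0.0],
]
signal = [3.0, 0.0, -2.0, 0.0]
decomposition, residual = matching_pursuit(signal, atoms, n_iter=2)
print(decomposition, residual)
```

Each iteration of the full search costs one inner product per atom, which is where the projection cost for large dictionaries comes from. Restricting the search to a subset reduces that cost per iteration, and because every step removes a projection onto a unit-norm atom, the residual energy never increases.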

Key words: Matching Pursuit, Audio signal representations, Audio compression, Audio source separation, Audio Fingerprinting

[1] M. Moussallam, L. Daudet, G. Richard, "Matching pursuits with random sequential subdictionaries", Signal Processing, 2012, http://dx.doi.org/10.1016/j.sigpro.2012.03.019

[2] M. Moussallam, G. Richard and L. Daudet, "Audio Source Separation Informed by Redundancy with Greedy Multiscale Decompositions", in Proc. of EUSIPCO 2012.

[3] S. Fenet, M. Moussallam, Y. Grenier, G. Richard and L. Daudet, "A Framework for Fingerprint-Based Detection of Repeating Objects in Multimedia Streams",

Speaker Bio

I received the State Engineering degree from Télécom ParisTech (formerly ENST), Paris, France, in 1990, a PhD in speech synthesis from LIMSI-CNRS, University of Paris-XI, in 1994, and the Habilitation à Diriger des Recherches degree from the University of Paris-XI in September 2001. After my PhD, I spent two years at the CAIP Center, Rutgers University, Piscataway, NJ, in the speech processing group of Prof. J. Flanagan, where I explored innovative approaches for speech production. Between 1997 and 2001, I successively worked for Matra Nortel Communications, Bois d'Arcy, France, and for Philips Consumer Communications, Montrouge, France. In particular, I was the project manager of several large-scale European projects in the field of audio and multimodal signal processing. In September 2001, I joined the Signal and Image Processing Department at Télécom ParisTech, where I am now a full professor in audio signal processing and head of the Audio, Acoustics and Waves research group. Co-author of over 100 papers and inventor on a number of patents, I am also one of the experts of the European Commission in the field of audio signal processing and man/machine interfaces. I was an associate editor of the IEEE Transactions on Audio, Speech and Language Processing between 1997 and 2011 and one of the guest editors of the special issue on “Music Signal Processing” of the IEEE Journal on Selected Topics in Signal Processing (2011). I am now a member of the IEEE Audio and Acoustic Signal Processing Technical Committee, a member of EURASIP and AES, and a senior member of the IEEE.

View the slides from the talk.