Previous Work: COrtical Separation Models for Overlapping Speech (COSMOS)
In this collaborative project among ICSI, UCSF, and Columbia, researchers are measuring brain activity to reveal in detail how human listeners separate and understand individual speakers when more than one person is talking at once. This information can then be used to design automatic systems capable of the same feat.
Natural environments are full of overlapping sounds, and successful audio processing by both humans and machines relies on a fundamental ability to separate out the sound sources of interest. This is commonly referred to as the "cocktail party effect," after the ability of people to follow what a single person is saying despite the noisy background of other speakers. Despite the long history of hearing research, this exceptional human capability for sound source separation is still poorly understood, and machine efforts to separate overlapping voices are correspondingly crude: although great advances have been made in robust processing of noisy speech, separation of complex natural sounds such as overlapping voices remains a challenge.

Advances in sensor technology now make it possible to model this function in humans, giving an unprecedentedly detailed view of how sound is represented and processed in the brain. This project works specifically with neuroelectric responses measured directly on the surface of the human cortex (currently with a 256-electrode sensor array) in patients awaiting neurosurgery. Using such measurements for controlled mixtures of voices, the project will develop models of voice separation in the human cortex by reconstructing an approximation of the acoustic stimulus from the neural population response, in the process learning a linear mapping from the neural response back to a spectrogram representation of the stimulus (a minimal sketch of this mapping appears below).

To improve the ability of machine algorithms to mimic human source separation, the project will also develop a signal processing framework that supports experiments with different combinations of cues and strategies, optimized for agreement with the recorded neural activity. The engineering model is based on the Computational Auditory Scene Analysis (CASA) framework, a family of approaches that has shown competitive results for handling sound mixtures.
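As a concrete illustration of that linear stimulus-reconstruction step, the sketch below fits a ridge regression from time-lagged electrode responses to spectrogram frames. This is a minimal illustration under stated assumptions, not the project's actual code: the array shapes, the lag window n_lags, and the penalty lam are all hypothetical choices.

```python
import numpy as np

def lagged_features(resp, n_lags):
    """Stack time-lagged copies of the neural response so each output
    frame sees a short history of every electrode.
    resp: (n_frames, n_electrodes), one row per time frame."""
    n_frames, n_elec = resp.shape
    X = np.zeros((n_frames, n_lags * n_elec))
    for lag in range(n_lags):
        # row t of block `lag` holds the response at time t - lag
        X[lag:, lag * n_elec:(lag + 1) * n_elec] = resp[:n_frames - lag]
    return X

def fit_reconstruction(resp, spec, n_lags=10, lam=1.0):
    """Ridge-regression estimate of the linear map from lagged neural
    responses X to spectrogram frames S: W = (X'X + lam*I)^-1 X'S."""
    X = lagged_features(resp, n_lags)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ spec)

def reconstruct(resp, W, n_lags=10):
    """Apply the learned map to neural responses to get an
    approximate spectrogram of the stimulus."""
    return lagged_features(resp, n_lags) @ W

# Toy example with hypothetical shapes: 1000 frames, 256 electrodes,
# and a 64-bin spectrogram of the voice mixture.
resp = np.random.randn(1000, 256)   # stand-in for measured cortical responses
spec = np.random.randn(1000, 64)    # stand-in for the stimulus spectrogram
W = fit_reconstruction(resp, spec)
spec_hat = reconstruct(resp, W)     # approximate spectrogram from neural data
```

In practice, reconstruction quality would be scored on held-out data, for example by correlating each channel of the reconstructed spectrogram with the corresponding channel of the actual stimulus spectrogram.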
Funding provided by NSF grant IIS-1320260, Towards Modeling Source Separation from Measured Cortical Responses.