Multi Modal Video Summarization

Principal Investigator(s):

Gerald Friedland

ICSI researchers have been working with DAC to identify and acquire datasets sufficient for training automated speech recognition (ASR) models. They are researching and developing ASR models that are robust to noise, music, babble, and reverberation. This work may include, among other things, researching and implementing signal processing algorithms that remove segments of an audio stream that contain no speech. ICSI researchers are also working with the DAC team to ensure that the model is compliant with the DAC Video Information Summarization, Captioning, Analysis, and Rank Ordering (VISCARO) model. In addition, they will research and develop a joint model that combines automated speech recognition with speaker recognition to determine whether it improves accuracy.
