100 Million Images and Videos
Back to Gazette, Fall 2014
The Audio and Multimedia team is working with Yahoo Labs and Lawrence Livermore National Laboratory to process and analyze 100 million photos and videos publicly available under Creative Commons licenses. At more than 50 terabytes, the collection, the Yahoo Flickr Creative Commons 100 Million, is believed to be the largest corpus of user-generated multimedia publicly available for research.
On September 1, the team released full sets of three audio features for the nearly 800,000 videos in the corpus, computed on LLNL’s Cray Catalyst supercomputer. Audio features, or statistical representations of sound, are commonly used in speech and audio recognition. The features released on September 1 are Mel-frequency cepstral coefficients, Kaldi pitch, and subband autocorrelation classification. These can be used to build and train systems that automatically analyze the content of media with audio, such as videos.
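For readers unfamiliar with audio features, the following is a minimal sketch of how Mel-frequency cepstral coefficients can be computed for a single frame of audio, using only the Python standard library. It is an illustration, not the pipeline the team ran on Catalyst: the function names, the 8 kHz sample rate, the 256-sample frame, the 20 filters, the 13 coefficients, and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import cmath
import math

def hz_to_mel(hz):
    # Common mel-scale formula; other variants exist.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Fractional positions of the filter edges along the spectrum bins.
    edges = [e / nyquist * (n_bins - 1) for e in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Log mel-filterbank energies followed by an unnormalized DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(spec), sample_rate)
    # Small floor avoids log(0) for silent bands.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

# Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc(frame, sample_rate)
print(len(coeffs))
```

In practice, a video's audio track would be split into many short overlapping frames, each producing one such vector, and the resulting sequence is what a recognition system is trained on.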
The team also released an annotated index of more than 50,000 videos selected from the corpus. The index categorizes videos by whether they depict any of 10 “events,” which are based on those used in the 2011 Multimedia Event Detection task of the National Institute of Standards and Technology’s TRECVID Evaluation. That task challenges participants to build systems that can find videos of the events. One of the events, for example, is “getting a vehicle unstuck.” The new YLI-MED index released by the team lists 2,000 videos that depict events and 48,000 that do not, providing the counter-examples needed to train systems.
Also on September 1, the team released a demo of audioCaffe, the first version of the framework being built for the SMASH project. The goal of the project is to create a media-analysis tool that can be run on parallel machines, reducing the time it takes to analyze media, and that can serve as a single framework for a variety of tasks like speech recognition, audio analysis, and video event detection. SMASH is funded by a National Science Foundation grant.
The set of features and the annotated indices are being held in the Yahoo-Lawrence-ICSI Corpus.