Publication Details
Title: The YLI‐MED Corpus: Characteristics, Procedures, and Plans
Authors: J. Bernd, D. Borth, B. Elizalde, G. Friedland, H. Gallagher, L. Gottlieb, A. Janin, S. Karabashlieva, J. Takahashi, and J. Won
Bibliographic Information: ICSI Technical Report TR-15-001
Date: March 2015
Research Area: Audio and Multimedia
Type: Technical Reports
PDF: https://www.icsi.berkeley.edu/pubs/techreports/TR-15-001.pdf
Overview:
The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what's happening in a video by analyzing the audio and visual content. The videos indexed in the YLI-MED corpus are a subset of the larger YLI feature corpus, which is being developed by the International Computer Science Institute and Lawrence Livermore National Laboratory based on the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset. The videos in YLI-MED are categorized as depicting one of ten target events, or no target event, and are annotated for additional attributes like language spoken and whether the video has a musical score. The annotations also include degree of annotator agreement and average annotator confidence scores for the event categorization of each video. Version 1.0 of YLI-MED includes 1,823 "positive" videos that depict the target events and 48,138 "negative" videos, as well as 177 supplementary videos that are similar to event videos but are not positive examples. Our goal in producing YLI-MED is to be as open about our data and procedures as possible. This report describes the procedures used to collect the corpus; gives detailed descriptive statistics about the corpus makeup (and how video attributes affected annotators' judgments); discusses possible biases in the corpus introduced by our procedural choices and compares it with the most similar existing dataset, TRECVID MED's HAVIC corpus; and gives an overview of our future plans for expanding the annotation effort.
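To illustrate how such an annotation index might be consumed in practice, the sketch below loads a CSV-style index and filters positive examples for one event category by average annotator confidence. The file name, column names ("event_id", "avg_annotator_confidence"), and the event ID "Ev101" are assumptions for illustration only; the actual YLI-MED distribution may use different file layouts and field names.

```python
import csv
from collections import Counter

# Hypothetical index file and column names -- not the official YLI-MED schema.
INDEX_PATH = "yli_med_index.csv"


def load_index(path):
    """Read the annotation index into a list of dicts, one per video."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def positives_for_event(rows, event_id, min_confidence=0.0):
    """Return videos labeled with a given target event, optionally filtered
    by the average annotator confidence score for that categorization."""
    return [
        r for r in rows
        if r["event_id"] == event_id
        and float(r["avg_annotator_confidence"]) >= min_confidence
    ]


if __name__ == "__main__":
    rows = load_index(INDEX_PATH)
    # Count videos per event category (including the "no target event" label).
    print(Counter(r["event_id"] for r in rows))
    # High-confidence positive examples for one (hypothetical) event ID.
    print(len(positives_for_event(rows, "Ev101", min_confidence=0.75)))
```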
Acknowledgements:
The YLI‐MED annotation project described here was funded by a grant from Cisco Systems, Inc., for Event Detection for Improved Speaker Diarization and Meeting Analysis. In addition to Cisco, ICSI's work on the YLI corpus (generally) is funded by Lawrence Livermore National Laboratory as part of a collaborative Laboratory Directed Research and Development project under the auspices of the U.S. Department of Energy (contract DE‐AC52‐07NA27344) and by the National Science Foundation as part of SMASH: Scalable Multimedia content AnalysiS in a High‐level language (grant IIS‐1251276). Any opinions, findings, and conclusions or recommendations are those of the authors and do not necessarily reflect the views of Cisco, LLNL, or the NSF.
Bibliographic Reference:
J. Bernd, D. Borth, B. Elizalde, G. Friedland, H. Gallagher, L. Gottlieb, A. Janin, S. Karabashlieva, J. Takahashi, and J. Won. The YLI‐MED Corpus: Characteristics, Procedures, and Plans. ICSI Technical Report TR-15-001, March 2015