Publication Details
Title: The MASC Word Sense Corpus
Author: R. Passonneau, C. Baker, C. Fellbaum, and N. Ide
Bibliographic Information: Proceedings of the 8th Conference on International Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, pp. 3025-3030
Date: May 2012
Research Area: AI
Type: Article in conference proceedings
PDF: http://www.icsi.berkeley.edu/pubs/ai/mascword12.pdf
Overview:
The MASC project has produced a multi-genre corpus with multiple layers of linguistic annotation, together with a sentence corpus containing WordNet 3.1 sense tags for 1000 occurrences of each of 100 words produced by multiple annotators, accompanied by in-depth inter-annotator agreement data. Here we give an overview of the contents of MASC and then focus on the word sense sentence corpus, describing the characteristics that differentiate it from other word sense corpora and detailing the inter-annotator agreement studies that have been performed on the annotations. Finally, we discuss the potential to grow the word sense sentence corpus through crowdsourcing and the plan to enhance the content and annotations of MASC through a community-based collaborative effort.
Acknowledgements:
This work was partially supported by funding provided to ICSI through National Science Foundation grant CNS: 0708952 (“Computing RES Infrastructure”). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the National Science Foundation.
Bibliographic Reference:
R. Passonneau, C. Baker, C. Fellbaum, and N. Ide. The MASC Word Sense Corpus. Proceedings of the 8th Conference on International Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, pp. 3025-3030, May 2012