FrameNet Holds 5-Day Workshop, Sponsored by NSF and Google

Wednesday, September 25, 2013

FrameNet, one of ICSI's longest-running projects, hosted a five-day workshop September 9-13 here at our downtown Berkeley office. The workshop, which was endorsed by the Association for Computational Linguistics and sponsored by the National Science Foundation and Google, attracted developers of natural language processing applications, researchers in linguistic semantics, and lexicographers from both industry and academia.

FrameNet received its first grant in 1997. Its researchers are building a lexical knowledge base, usable by both machines and humans, that describes the relationships between words in order to extract meaning from texts. The work is based on the theory of frame semantics of Professor Charles Fillmore, who continues to direct the project.

Attendees included both those who already use FrameNet and those who are interested in adopting the FrameNet knowledge base to support work in text analysis, event tracking, and sentiment analysis. They came from across the country and around the globe to learn more about frame semantics and to practice their skills at creating frames. Four employees from Google's Knowledge Group attended in order to learn how FrameNet might improve the Knowledge Graph by enabling it to do deeper inference than n-gram analysis allows. N-gram analysis is a technique from statistical natural language processing that looks at groupings of words. Another participant, from the Korea Advanced Institute of Science and Technology, is interested in beginning a FrameNet for Korean.
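For readers unfamiliar with the term, the following is a minimal sketch of what "groupings of words" means in n-gram analysis. The function name `ngrams` and the sample sentence are our own illustrations, not part of FrameNet or the Knowledge Graph.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text, split on whitespace for simplicity.
tokens = "the workshop attracted researchers".split()

# Bigrams (n = 2) are the adjacent word pairs in the text.
bigrams = ngrams(tokens, 2)
print(bigrams)
```

An n-gram records only which words sit next to each other. It says nothing about who did what to whom, which is the kind of information frames are meant to capture.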

"We had really good discussions among all these people," said Collin Baker, who manages the FrameNet Project. "We learned a lot about how people are using FrameNet and what they need."

He said it became apparent that two areas in particular should be improved: FrameNet should work to increase both the amount of annotated text in existing frames and the amount of full-text annotation in general. Only about a quarter of the annotation in the FrameNet Knowledge Base is full text, in which every frame-evoking word in every sentence of a text is annotated. That process is more labor-intensive because it requires annotators to deal with difficult uses of lexical units (pairings of a word and a meaning) and to create new frames as they arise in the text. In contrast, lexicographic annotation, which accounts for the other 75 percent of the annotated examples in the database, starts with a word and looks for exemplar uses of it. Dipanjan Das, a Google employee and an invited speaker at the workshop, wrote an automatic semantic role labeling (ASRL) system based on learning from the FrameNet examples. His system, however, uses only the 25 percent of the data that is full-text annotation.
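The difference between the two annotation styles can be sketched with a toy data structure. This is an illustration under our own assumptions, not FrameNet's actual data format: the class names, field names, and sample sentence are hypothetical, and the frame labels are used only as examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """A pairing of a word and a meaning (the frame it evokes)."""
    lemma: str  # word plus part of speech, e.g. "buy.v"
    frame: str  # illustrative frame label


@dataclass(frozen=True)
class Annotation:
    sentence: str
    target: str  # the frame-evoking word as it appears in the sentence
    unit: LexicalUnit


sentence = "Kim bought a car"  # hypothetical example sentence

# Lexicographic annotation: start from one word ("buy") and annotate
# only that word in a sentence chosen as a good example of its use.
lexicographic = [
    Annotation(sentence, "bought", LexicalUnit("buy.v", "Commerce_buy")),
]

# Full-text annotation: annotate every frame-evoking word in the sentence.
full_text = [
    Annotation(sentence, "bought", LexicalUnit("buy.v", "Commerce_buy")),
    Annotation(sentence, "car", LexicalUnit("car.n", "Vehicle")),
]


def targets(annotations):
    """List the annotated words, in order."""
    return [a.target for a in annotations]


print(targets(lexicographic))
print(targets(full_text))
```

Even in this four-word sentence, the full-text version has to handle every frame-evoking word rather than one chosen in advance, which is why it takes more effort per sentence.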

The workshop included lectures and demonstrations on frame semantics and FrameNet, as well as the ICSI metaphor project, MetaNet. FrameNet is a vital component of MetaNet, which seeks to build a system capable of automatically extracting and understanding metaphors used in English, Persian, Russian, and Spanish.

The workshop also included a hands-on session, in which participants got to practice making their own frames. Collin said, "They became more experienced with the frame creation process, which is a combination of 'armchair linguistics' and corpus linguistics." Armchair linguistics is the process of thinking about the way language and words are used; corpus linguistics involves looking at actual uses of words in a corpus. "Whenever you do corpus linguistics, you find you're right about some things and wrong about others," he said. Attendees who developed frames also checked them against Web searches and the Corpus of Contemporary American English, created by Mark Davies of Brigham Young University.


Ten ICSI researchers and linguists, including Collin, spoke at the workshop. Guest speakers included two Google employees, one of whom, Nancy Chang, was a member of the Neural Theory of Language project at ICSI while working on her PhD. At Google, Nancy works on conversational search and natural language understanding. She spoke on September 10 about how rich event representations motivated by sensorimotor constraints can be used for language understanding and inference. She also presented work by another alum, Steve Sinha, who now works for the federal government.

Several talks highlighted how FrameNet can be used for ASRL. SEMAFOR, the ASRL system developed at Carnegie Mellon University by Dipanjan Das, now at Google, uses FrameNet semantic roles and is considered the best freely available ASRL system in the world. Das spoke on using data-driven ASRL models for frame elements. Another speaker, Tim Hawes from Decisive Analytics Corporation (DAC), works with DAC's own ASRL techniques as part of support systems for defense analysts.

Another alum, Josef Ruppenhofer, now at the Department of Information Science and Natural Language Processing at the University of Hildesheim, Germany, discussed his recent research on sentiment analysis. A third alum, Nathan Schneider of Carnegie Mellon University, spoke on using FrameNet to build applications.

The week ended with a talk on privacy issues and FrameNet. Gerald Friedland, who directs audio and multimedia research at ICSI, discussed how FrameNet analysis might be used to link accounts on different online services (say, Yelp, Twitter, and Flickr) by their writing styles, thus defeating attempts to keep accounts anonymous by using different usernames.

In all, Collin said, the workshop was “very instructive” to both the organizers and the attendees. “Everyone said they enjoyed the workshop.”

Read more about the FrameNet Workshop and its speakers and sessions.