“Beyond Words”: Brief Overview of Two SLTs That Look Further than the Word Level

Martha Larson

Delft University of Technology, Netherlands, & ICSI

Tuesday, June 9, 2015
12:30 p.m., Conference Room 5A

This talk provides a succinct look at two speech and language technologies that have been investigated at my home base: the Intelligent Systems Department at Delft University of Technology. They were chosen not only to represent recent work, but also for their potential interest to the audience.

The first is an RNN language model (RNNLM) used for re-scoring ASR hypotheses. I present examples of different kinds of meta-information that can be incorporated into the RNNLM, including token size, sentence length, and topic. As with conventional language models, integrating such information yields improvements; the main message is that we should not forget its potential in this age of deep learning.
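To make the re-scoring setting concrete, here is a minimal sketch (not the speaker's actual system) of n-best re-ranking: an ASR decoder produces several hypotheses with scores, and a language model re-ranks them via log-linear interpolation. The `toy_lm` function and the interpolation weight are illustrative stand-ins for a trained RNNLM.

```python
def rescore(hypotheses, lm_score, lm_weight=0.5):
    """Re-rank (text, asr_score) pairs by interpolating an LM score.

    `lm_score` stands in for an RNNLM; any callable returning a
    log-probability-like value works for this sketch.
    """
    scored = [(text, asr + lm_weight * lm_score(text))
              for text, asr in hypotheses]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy "language model": rewards hypotheses containing a plausible word.
def toy_lm(text):
    return 1.0 if "recognize" in text else -1.0

# Two competing ASR hypotheses with decoder scores.
hyps = [("wreck a nice beach", -2.0), ("recognize speech", -2.5)]
best = rescore(hyps, toy_lm)[0][0]  # the LM flips the ranking
```

In a real system the meta-information (token size, sentence length, topic) would be additional inputs to the RNNLM itself rather than a separate term.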

The second is a technology that predicts the real-world size of objects depicted in images using the set of tags that users assign to those images. The approach combines web-scale text statistics with implicit information about semantic frames. The main message is that users need not tag all properties of depicted objects explicitly. Rather, the semantic structure of frequently occurring lexical items allows us to “read between the tags” and infer characteristics of image content.
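A toy illustration of the “read between the tags” idea: the relative real-world size of two tagged objects can be guessed from text statistics alone. The bigram counts below are invented for illustration; the actual approach draws on web-scale corpora and semantic-frame information rather than this simple adjective heuristic.

```python
# Hypothetical counts of "<adjective> <noun>" bigrams in a corpus.
TOY_COUNTS = {
    ("big", "elephant"): 900, ("small", "elephant"): 50,
    ("big", "mouse"): 40,     ("small", "mouse"): 700,
}

def size_score(noun, counts=TOY_COUNTS):
    """Return a value in (-1, 1): positive means 'big' dominates."""
    big = counts.get(("big", noun), 0)
    small = counts.get(("small", noun), 0)
    return (big - small) / (big + small + 1e-9)

def larger(noun_a, noun_b):
    """Guess which of two tagged objects is physically larger."""
    return noun_a if size_score(noun_a) > size_score(noun_b) else noun_b
```

The point of the sketch is only that size information is latent in how people write about objects, so it never needs to appear as an explicit tag.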

Bio:

Martha Larson works on multimedia information retrieval, recommender systems, and crowdsourcing in the Multimedia Computing group at Delft University of Technology, Netherlands. Previously, she researched and lectured in audio-visual retrieval at Fraunhofer IAIS and at the University of Amsterdam. She is a co-organizer of the MediaEval multimedia benchmark campaign, scientific coordinator of the European project CrowdRec, and currently an Associate Editor of IEEE Transactions on Multimedia. Her teaching activities include an undergraduate course that she designed in the area of multimedia analysis (insider info: it uses Friedland & Jain 2014 as a text). For this talk, she returns to her primal passion: speech and language.