ICSI BEARS Open House 2015
Thursday, February 12, 2015
12:45 - 5:00 p.m.
The ICSI annual open house, held in conjunction with the UC Berkeley EECS Department's Berkeley EECS Annual Research Symposium (BEARS), includes research demonstrations, talks, and posters as well as time to talk with our computer scientists about their work. A shuttle will be available to bring people from the symposium to ICSI.
RSVP to [email protected] if you plan to attend the open house.
Schedule:
12:45 - 1:30 p.m. Posters and Demonstrations
1:30 - 2:15 p.m. SDN: Looking Backwards, Moving Forwards by Scott Shenker, Chief Scientist and Director of Research Initiatives
2:15 - 3:00 p.m. Network Analyses of Brain Disorders by Eric Friedman, Senior Researcher, Research Initiatives
3:00 - 5:00 p.m. Posters and Demonstrations
Demonstrations
"Multilingual FrameNets: Semantics of Events and Beyond" by Collin Baker and the FrameNet Project
"Unsupervised Mining of Visually Consistent Sounds" by Achal Dave, Benjamin Elizalde, Stella X. Yu, Alexei A. Efros
"The Teaching Privacy Project" by Serge Egelman, Jaeyoung Choi, and Julia Bernd
"Generating Natural Language Description for Images and Video" by Marcus Rohrbach, Vision
"Spreadsheet Composition for Collaborative Data Analysis" by Michele Stecca, Research Initiatives
"Domain-Specific Applications of Embodied Construction Grammar" by Sean Trott, Artificial Intelligence
"Visual Typo Correction by Collocative Optimization - A Case Study on Merchandise Images" by Xiao-Yong Wei, Audio and Multimedia
Posters
"Towards a Multimedia Genome: A New Dataset and New Challenges" by Julia Bernd, Damian Borth, Jaeyoung Choi, Benjamin Elizalde
"Multimedia Opinion Mining with Visual Sentiment Analysis" by Damian Borth, Audio and Multimedia (Co-authors: Gerald Friedland and Trevor Darrell)
"Robust CNN-Based Speech Recognition with Gabor Filter Kernel" by Shuo-Yiin Chang
"Representing Caused Motion in Embodied Construction Grammar" by Ellen K. Dodge and Miriam R L Petruck, Artificial Intelligence
"Measuring Security Behaviors and Attitudes" by Serge Egelman, Networking and Security
"What and Where in the Image Makes a Scary Dog Scene Scary? — Spatial and Featural Localization of Image Sentiment Analysis" by Jiashi Feng, Damian Borth, Stella X. Yu, and Trevor Darrell
"Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling" by Michael Maire, Stella X. Yu, and Pietro Perona
"Learning Lightness from Human Judgement on Relative Reflectance" by Takuya Narihira, Michael Maire, and Stella X. Yu
"Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR" by Suman Ravuri
"Neural Network Models for Lexical Addressee Detection" by Suman Ravuri and Andreas Stolcke
"Netalyzr: The Android Root Certificate Store" by Narseo Vallina Rodriguez, Networking and Security (Co-authors: J. Amann, C. Kreibich, N. Weaver, V. Paxson)
"Syllable Based Keyword Search: Transducing Syllable Lattices to Word Lattices" by Hang Su, Speech
"Information Flow Experiments: Google’s Use of Data" by Michael Carl Tschantz, Amit Datta, Anupam Datta, and Jeannette M. Wing
Talks
1:30 - 2:15 p.m.
SDN: Looking Backwards, Moving Forwards by Scott Shenker, Chief Scientist and Director of Research Initiatives
Software-Defined Networking (SDN) was developed over six years ago, amidst much hope and naivete. In this talk, I will first discuss SDN's past, highlighting the follies of our youth. I will then discuss how a suitably updated SDN paradigm can meet the challenges of the future.
2:15 - 3:00 p.m.
Network Analyses of Brain Disorders by Eric Friedman, Senior Researcher, Research Initiatives
The study of the human connectome, or brain network, has become an important tool in neuroscience and the study of brain disorders. In this talk, I will describe the construction, analysis, and applications of this network approach, covering both the basic theory of networks as well as the applications to brain disorders, such as Alzheimer’s Disease and Agenesis of the Corpus Callosum (a birth defect).
Demonstrations
Multilingual FrameNets: Semantics of Events and Beyond by Collin Baker and the FrameNet Project
The FrameNet Project at ICSI is the center of a multinational effort to build lexical databases and annotated corpora in more than a dozen languages, based on Fillmore's Frame Semantics. We present an introduction to the project, examples of commercial and research uses, and highlight current research that goes beyond frames for events.
Unsupervised Mining of Visually Consistent Sounds by Achal Dave, Benjamin Elizalde, Stella X. Yu, Alexei A. Efros
Media sharing sites on the Internet and the one-click upload capability of smartphones have led to a deluge of online multimedia content -- and an ever-growing demand for methods that make it easier to retrieve, search, and index. We seek to discover consistencies between audio and video in a completely unsupervised manner. This is a particularly challenging task: visual elements are often silent, and audio sources are often not present in images. Our approach takes inspiration from discriminative clustering methods, using detections from an off-the-shelf scene and object detector as weak labels in clustering audio segments. The approach is analyzed and evaluated in the context of audio concept retrieval on the YLI-MED dataset, an annotated subset of the YFCC100M corpus.
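The weak-label clustering idea can be sketched in a few lines. The following is a hypothetical, simplified illustration: toy 2-D "audio features" stand in for real audio descriptors, and a k-means-style refinement stands in for the discriminative clustering used in the actual work.

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weak_label_cluster(audio_feats, visual_labels, iterations=10):
    """Cluster audio segments, seeding each cluster from the segments whose
    video frames received the same (weak) visual detection label, then
    refining assignments by nearest centroid, k-means style."""
    labels = sorted(set(visual_labels))
    assign = list(visual_labels)  # seed: group audio by weak visual label
    for _ in range(iterations):
        cents = {l: centroid([f for f, a in zip(audio_feats, assign) if a == l])
                 for l in labels if any(a == l for a in assign)}
        new = [min(cents, key=lambda l: dist(f, cents[l])) for f in audio_feats]
        if new == assign:
            break
        assign = new
    return assign

# Toy example: the weak labels are noisy (the last one is wrong), but the
# refinement recovers the two underlying sound groups.
feats = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
weak = ["dog", "dog", "crowd", "dog"]
print(weak_label_cluster(feats, weak))  # → ['dog', 'dog', 'crowd', 'crowd']
```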
The Teaching Privacy Project by Serge Egelman, Jaeyoung Choi, and Julia Bernd
The Teaching Privacy project aims to empower K-12 and college students to make informed choices about privacy by building a set of educational tools and hands-on exercises that help teachers demonstrate what happens to personal information on the Internet -- and what the effects of sharing information can be. This demo will highlight classroom-ready teaching materials and educational tools currently under development as part of TROPE (Teachers’ Resources for Online Privacy Education), available at teachingprivacy.org. In particular, our new educational app Post and Post Alike helps students visualize what type of information they share with whom on Facebook, and how their sharing behavior compares with that of others.
Generating Natural Language Description for Images and Video by Marcus Rohrbach, Vision
Humans use rich natural language to describe and communicate visual perceptions. To automatically generate natural language for images and videos, we present two contributions.
First, we develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”.
Second, we collected two datasets that enable learning the video description task on multiple sentences and/or at a variable level of detail. The first is a cooking dataset that requires fine-grained distinctions and descriptions. The second contains Hollywood movies with aligned movie scripts and DVS (audio descriptions for the blind), with the ultimate goal of generating descriptions for blind people.
Spreadsheet Composition for Collaborative Data Analysis by Michele Stecca, Research Initiatives
We describe a novel interaction paradigm based on Spreadsheet Composition, in which users interact over a Spreadsheet Space, a virtual space for spreadsheets. In the same way the World Wide Web initially extended the hypertext concept over the Internet, the Spreadsheet Space extends the natural linked structure of spreadsheets over the Internet. This demo will show how Spreadsheet Composition can be used to link spreadsheets in a peer-to-peer relationship; to distribute, collect, and combine information; to access information exposed by corporate software platforms and ERPs; and to browse Open Data and Big Data reports.
Domain-Specific Applications of Embodied Construction Grammar by Sean Trott, Artificial Intelligence
Embodied Construction Grammar (ECG) offers a neurologically plausible model of language use. Like other construction grammars, ECG operates on the assumption that language is composed of form-meaning pairings; in ECG, constructions bind grammatical constituents to roles in complex schema hierarchies, and the ECG Analyzer parses input utterances and outputs a Semantic Specification (SemSpec) illustrating these pairings. ECG focuses particularly on simulation-based semantics; thus, in this talk, I will focus on two distinct domains to which we have applied ECG. First, I will describe a robot task, in which natural language is used to give commands to a simulated robot model; this is a modular system that demonstrates how ECG can be used to drive action from language. Second, I will describe an implementation for metaphor analysis, in which the system uses ECG for metaphor identification and analysis, and is able to construct a database of utterances and associated metaphor bindings. Both systems are implemented and fully functional.
Visual Typo Correction by Collocative Optimization - A Case Study on Merchandise Images by Xiao-Yong Wei, Audio and Multimedia
This is a domain-specific study of near-duplicate retrieval (NDR) for merchandise images. NDR on merchandise images is of great importance to many online applications on e-commerce websites. However, in applications where response time is critical, conventional techniques developed for general-purpose NDR are of limited use, because expensive post-processing such as spatial verification or hashing is usually employed to compensate for quantization errors among the visual words used to represent the images. We argue that most of these errors are introduced during quantization, where the visual words are considered individually, ignoring the contextual relations among words. We propose a process for NDR analogous to spelling or phrase correction, which extends the concept of collocations to the visual domain to model these contextual relations, and uses binary quadratic programming (BQP) to enforce that the words selected for an image are contextually consistent with each other, so that the errors ("typos") are eliminated and the quality of the quantization process is improved. The experimental results show that the proposed method can improve the efficiency of NDR by reducing vocabulary size by 1,000%, and that, in the merchandise-image NDR scenario, the expensive local interest point features used in conventional approaches can be replaced by color moments, reducing the time cost by 9,202% while maintaining performance comparable to state-of-the-art methods.
Posters
Towards a Multimedia Genome: A New Dataset and New Challenges by Julia Bernd, Damian Borth, Jaeyoung Choi, Benjamin Elizalde
What the Human Genome Project and the Music Genome Project have accomplished for their realms of big data, the Multimedia Genome Project hopes to do for consumer-produced images and videos: foster a fundamental understanding of the major underlying structures by analyzing the building blocks. The MMGP centers on the Yahoo Flickr Creative Commons 100 Million dataset (YFCC100M), containing 99.2 million images and 800,000 videos. YFCC100M is the basis for the YLI Corpus of audio, visual, and motion feature representations for the media; YLI-Geo, a subcorpus used to develop automatic geo-location estimation systems; YLI-MED, an index of videos labeled for the events they depict; and audioCaffe, a set of tools for analyzing audio data.
Multimedia Opinion Mining with Visual Sentiment Analysis by Damian Borth, Audio and Multimedia (Co-authors: Gerald Friedland and Trevor Darrell)
The use of images and videos in social media is increasing and gives rise to a new type of content called "social multimedia." Such content conveys much about our thinking, in that it reflects our personal values and ourselves as a society. To make this content accessible, we present SentiBank, a framework for visual sentiment analysis consisting of 1,500 adjective-noun pair (ANP) classifiers trained from Caffe's deep learning features. Further, we would like to show applications around its usage for social good or brand monitoring.
Robust CNN-Based Speech Recognition with Gabor Filter Kernel by Shuo-Yiin Chang
As has been extensively shown, acoustic features for speech recognition can be learned from neural networks with multiple hidden layers. However, the learned transformations may not sufficiently generalize to test sets that have a significant mismatch with the training data. Gabor features, on the other hand, are generated from spectro-temporal filters designed to model human auditory processing. In previous work, these features were used as inputs to neural networks, which improved word accuracy for speech recognition in the presence of noise. Here we propose a neural network architecture called a Gabor Convolutional Neural Network (GCNN) that incorporates Gabor functions into convolutional filter kernels. In this architecture, a variety of Gabor features serve as the multiple feature maps of the convolutional layer. The filter coefficients are further tuned by back-propagation training. Experiments used two noisy versions of the WSJ corpus: Aurora 4 and RATS re-noised WSJ. In both cases, the proposed architecture performs better than other noise-robust features we have tried, namely ETSI-AFE, PNCC, Gabor features without the CNN-based approach, and our best neural network features that do not incorporate Gabor functions.
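For readers unfamiliar with Gabor filters: a Gabor kernel is a sinusoid windowed by a Gaussian envelope. The minimal 1-D sketch below (the GCNN itself uses 2-D spectro-temporal kernels inside a CNN; the sizes and frequencies here are illustrative) shows how such a kernel is constructed and how strongly it responds to a matched-frequency signal:

```python
import math

def gabor_kernel(size, freq, sigma):
    """1-D cosine-phase Gabor filter: cos(2*pi*freq*t) windowed by a
    Gaussian of width sigma, centered on the kernel midpoint."""
    half = size // 2
    return [math.exp(-(t * t) / (2 * sigma * sigma)) * math.cos(2 * math.pi * freq * t)
            for t in range(-half, half + 1)]

def convolve(signal, kernel):
    """Valid-mode 1-D sliding dot product (cross-correlation, as in CNNs)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

k = gabor_kernel(size=9, freq=0.25, sigma=2.0)
# The kernel is symmetric about its center, as expected for cosine phase.
print([round(v, 3) for v in k])

# A cosine at the matched frequency produces a strong filter response.
resp = convolve([math.cos(2 * math.pi * 0.25 * t) for t in range(32)], k)
```

In the GCNN, coefficients like these initialize the convolutional feature maps and are then fine-tuned by back-propagation rather than kept fixed.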
Representing Caused Motion in Embodied Construction Grammar by Ellen K. Dodge and Miriam R L Petruck, Artificial Intelligence
This work offers an Embodied Construction Grammar (Feldman et al. 2010) representation of caused motion, thereby also providing (a sample of) the computational infrastructure for implementing the information that FrameNet has characterized as Caused_motion (Ruppenhofer et al. 2010). This work specifies the semantic structure of caused motion in natural language, using an Embodied Construction Grammar analyzer that includes the semantic parsing of linguistically instantiated constructions. Results from this type of analysis can serve as the input to NLP applications that require rich semantic representations.
Measuring Security Behaviors and Attitudes by Serge Egelman, Networking and Security
Using methods from cognitive psychology, we constructed a scale to measure end-users' attitudes towards recommended security practices. We performed a series of experiments to show that the scale exhibits internal reliability, high variance, and discriminant validity (desirable psychometric properties). Finally, we show how many security attitudes can be predicted using well-studied psychometrics from the psychology literature (e.g., decision-making style, risk-taking attitudes, etc.).
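Internal reliability of a scale like this is commonly quantified with Cronbach's alpha; the abstract does not name the specific statistic used, so treat the following as an illustrative assumption. A minimal sketch on toy Likert-style responses:

```python
def variance(xs):
    """Sample variance (n-1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(responses):
    """Cronbach's alpha for a table of scale responses:
    rows = respondents, columns = scale items.
    alpha = (k/(k-1)) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])                          # number of items
    items = [[row[i] for row in responses] for i in range(k)]
    totals = [sum(row) for row in responses]       # each respondent's total
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))

# Toy data: three items that mostly move together yield a high alpha,
# i.e., the items appear to measure the same underlying attitude.
data = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1], [3, 3, 3]]
print(round(cronbach_alpha(data), 3))  # → 0.945
```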
What and Where in the Image Makes a Scary Dog Scene Scary? — Spatial and Featural Localization of Image Sentiment Analysis by Jiashi Feng, Damian Borth, Stella X. Yu, and Trevor Darrell
An image of a pouncing dog conveys visual content and carries emotional charge to a human viewer. Our goal is to develop a computer vision system that identifies the connection between the two automatically: What features and where in the image make a scary dog scene scary? We annotate an image set of objects and scenes with specific adjective-noun pair (ANP) phrases and develop a bi-selection framework for locating the adjective in both the image and the visual feature space. We look into both the hand-crafted low-level features (SentiBank) and deep learning features (CAFFE) to find features and locations most responsible for a specific sentiment.
Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling by Michael Maire, Stella X. Yu, and Pietro Perona
We frame the task of predicting a semantic labeling as a sparse reconstruction procedure that applies a target-specific learned transfer classifier to a generic deep sparse code representation of an image. Our classifier utilizes this deep representation in a novel manner: rather than acting on nodes in the deepest layer, it attaches to nodes along a slice through multiple layers of the network in order to make predictions about local patches. We demonstrate performance competitive with state-of-the-art contour detection systems and promising initial results on semantic part labeling of human faces, without any form of hand-designed features or filters.
Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR by Suman Ravuri
Neural Network Models for Lexical Addressee Detection by Suman Ravuri and Andreas Stolcke
Learning Lightness from Human Judgement on Relative Reflectance by Takuya Narihira, Michael Maire, and Stella X. Yu
We develop a new approach to recovering lightness, the perceived reflectance of surfaces, from a single image. Recovering a lightness map can be viewed as a precursor to the task of intrinsic image decomposition, which separates an image into reflectance and shading components. While existing methods typically reason about reflectance and shading together, we learn to directly predict reflectance differences between pairs of image patches. With large-scale training on an expressive patch representation, we learn a pairwise comparison model and then convert the output to a global reflectance ranking. Our direct prediction strategy yields performance superior to that of the state-of-the-art decomposition techniques when evaluating reflectance judgments against ground-truth on the Intrinsic Images in the Wild dataset.
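Converting pairwise comparisons into a global ranking can be done in several ways; as a hypothetical simplification of the paper's conversion step, a Borda-style win count is enough to illustrate the idea:

```python
def global_ranking(num_patches, comparisons):
    """Turn pairwise lightness judgments into a global ordering.
    `comparisons` holds (i, j) pairs meaning "patch i is lighter than patch j".
    A simple wins-minus-losses score stands in here for the learned pairwise
    comparison model's conversion; ties fall back to index order."""
    score = [0] * num_patches
    for lighter, darker in comparisons:
        score[lighter] += 1
        score[darker] -= 1
    # Highest score first = lightest patch first.
    return sorted(range(num_patches), key=lambda p: (-score[p], p))

# Patch 2 is judged lighter than everything; patch 0 loses every comparison.
pairs = [(2, 0), (2, 1), (1, 0)]
print(global_ranking(3, pairs))  # → [2, 1, 0]
```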
Netalyzr: The Android Root Certificate Store by Narseo Vallina Rodriguez, Networking and Security (Co-authors: J. Amann, C. Kreibich, N. Weaver, V. Paxson)
The security of today’s Web rests in part on the set of X.509 certificate authorities trusted by each user’s browser. Users generally do not themselves configure their browser’s root store but instead rely upon decisions made by the suppliers of either the browsers or the devices upon which they run. In this work we explore the nature and implications of these trust decisions for Android users. Drawing upon datasets collected by Netalyzr for Android and ICSI’s Certificate Notary, we characterize the certificate root store population present in mobile devices in the wild. Motivated by concerns that bloated root stores increase the attack surface of mobile users, we report on the interplay of certificate sets deployed by the device manufacturers, mobile operators, and the Android OS. We identify certificates installed exclusively by apps on rooted devices, thus breaking the audited and supervised root store model, and also discover use of TLS interception via HTTPS proxies employed by a market research company.
Syllable Based Keyword Search: Transducing Syllable Lattices to Word Lattices by Hang Su, Speech
This work presents a weighted finite state transducer (WFST) based syllable decoding and transduction framework for keyword search (KWS). It uses syllable for speech recognition and keyword search and tries to handle in-vocabulary (IV) and out-of-vocabulary (OOV) words together. We show that our method can effectively perform KWS on both IV and OOV keywords, and yields up to 0.03 Actual Term-Weighted Value (ATWV) improvement over searching keywords directly in subword lattices. Word Error Rates (WER) and KWS results are reported for three different languages.
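The transduction itself is done by WFST composition over whole lattices; as a much-simplified illustration, the same idea applied to a single 1-best syllable string reduces to segmenting the syllable sequence against a pronunciation lexicon. The lexicon below is a hypothetical toy example:

```python
def syllables_to_words(syllables, lexicon):
    """Transduce a syllable sequence into word sequences using a
    pronunciation lexicon (word -> syllable list), returning every exact
    segmentation. A WFST composition does this over full lattices with
    scores; this sketch handles only an unweighted 1-best syllable string."""
    by_pron = {}
    for word, pron in lexicon.items():
        by_pron.setdefault(tuple(pron), []).append(word)

    results = []
    def expand(pos, words):
        if pos == len(syllables):
            results.append(list(words))
            return
        for end in range(pos + 1, len(syllables) + 1):
            for w in by_pron.get(tuple(syllables[pos:end]), []):
                expand(end, words + [w])
    expand(0, [])
    return results

# Toy lexicon: the syllable string is ambiguous between two word sequences,
# which is exactly why keeping lattices (not 1-best strings) matters for KWS.
lex = {"open": ["OW", "PAHN"], "house": ["HAWS"], "oh": ["OW"], "pun": ["PAHN"]}
print(syllables_to_words(["OW", "PAHN", "HAWS"], lex))
```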
Information Flow Experiments: Google’s Use of Data by Michael Carl Tschantz, Amit Datta, Anupam Datta, and Jeannette M. Wing
To partly address people's concerns over web tracking, Google has created the Ad Settings webpage to provide information about and some choice over the profiles Google creates on users. We present AdFisher, an automated tool that explores how user behaviors, Google's ads, and Ad Settings interact. AdFisher can run browser-based experiments and analyze data using machine learning and significance tests. Our tool uses a rigorous experimental design and statistical analysis to conduct information flow experiments. We prove that these experiments can identify flows of information formalized in terms of both security properties and causation. We use AdFisher to find that the Ad Settings page is opaque about some features of a user's profile, that it does provide some choice over ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse changed the ads shown but not the settings page. We also found that setting the gender to female resulted in fewer instances of an ad related to high-paying jobs than setting it to male. We cannot determine who caused these findings due to our limited visibility into the ad ecosystem, which includes Google, advertisers, websites, and users. Nevertheless, these results can form the starting point for deeper investigations by either the companies themselves or by regulatory bodies.
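AdFisher's statistical analysis is built on permutation-style significance testing. The sketch below shows the core idea on hypothetical ad-count data; the real tool permutes whole experimental units and uses classifier-based test statistics rather than a simple difference of means:

```python
import random

def permutation_test(group_a, group_b, trials=10000, seed=0):
    """Two-sample permutation test: how often does a random relabeling of
    the pooled observations produce a mean difference at least as extreme
    as the observed one? A small p-value suggests the group assignment
    (e.g., a simulated profile attribute) carries information about the ads."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical counts of how often one ad was shown to each simulated profile.
male_counts = [9, 8, 10, 9, 11, 10]
female_counts = [1, 2, 1, 0, 2, 1]
p = permutation_test(male_counts, female_counts)
print(p)  # very small: the difference is unlikely to be chance
```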