Featured Researcher: Nelson Morgan
ICSI Gazette, Summer 2013
Morgan has led speech research at ICSI since the Institute’s inauguration in 1988. He also served as director for thirteen years starting in 1999, the year the agreement that had established ICSI expired. He volunteered for the challenge of broadening and stabilizing the Institute’s funding base and, by the end of his tenure, had left the Institute in its best financial condition in years. Morgan has always enjoyed a challenge.
Early Career
Morgan was born and raised in Buffalo, New York. Fascinated by electronics, he was “always hooking things up to other things,” he said. When he was 10, he built a simple noise recognizer for Halloween by connecting a microphone to a circuit board he had bought from Popular Electronics. When the microphone detected sound, a tape player began playing Godzilla music to scare trick-or-treaters.
Morgan’s older brother had a strong influence on his early life and interests: he introduced Morgan to rock-and-roll, which sparked a lifelong interest in sound and audio, and gave him his first tape recorder, a rare technology in the 1950s. Morgan used it to record televised news reports, which he edited – using a razor blade – to make them sound more leftist. Morgan’s brother also introduced him to the early-entrance programs that allowed him to enter college after two years of high school. At the age of 16, Morgan entered the University of Chicago as a physics major.
But he decided to leave the university in order to do “a lot of wandering,” at one point living in a teepee in the woods. He eventually began managing and recording rock bands and working as a technical advisor on motion pictures, including The Godfather Part II.
Most of his work consisted of mixing audio for television commercials. One day, while adjusting the volume of a dog’s bark for a commercial, he said, “I realized this was not quite the creative experience I was expecting.”
The Science of Sound
He thought he might enjoy the audio-related career he had wandered into if he better understood the science behind it, so he began taking classes part time while continuing to work as a sound technician. One summer, he took an introductory course in electronics at UC Berkeley. “It was wonderful,” he said. “It was much more exciting than anything I was doing in the studio.”
His undergraduate advisor at Berkeley suggested he apply for a National Science Foundation fellowship, which would allow him to attend school full time as a graduate student. “I decided I would write up exactly what I wanted to do – just what I wanted to do – for my research,” he said. “And if they said they’d pay for it, great, I’d be a student. And if not, I would continue doing what I was doing.”
His fellowship application described a project to create sound effects electronically. At the time, technicians commonly simulated room reverberation with a metal plate in order to produce sound effects for movies. “These were pretty hokey-sounding, and you couldn’t adjust them for a particular room size,” he said.
He was awarded a three-year graduate fellowship from NSF, and he decided he would try to finish his doctorate before the funding ran out. Doctorates in physical sciences or engineering often take five years or more, but Morgan said he had an advantage: “From the first day I knew exactly what my research would be.”
Although his research was on room acoustics, he spent some of his spare time chatting about technical topics with Ben Gold, a pioneer in digital signal processing and then a visiting professor at UC Berkeley. Later, in the early 1990s, Gold and Morgan established a class at Berkeley that combined their varied experiences with speech processing. The class has been taught every other year since then, and Morgan and Gold developed the class outlines into a textbook, Speech and Audio Signal Processing, which was recently revised and released in a second edition with the help of Columbia Professor and ICSI alum Dan Ellis.
Neural Networks
As he approached graduation, Morgan was offered a position by Dolby to start a digital audio laboratory. The recession of the late 1970s, however, forced Dolby to lay off much of its work force, and Morgan’s offer was canceled at the last moment. He quickly found a position at National Semiconductor, where he worked on speech analysis and synthesis techniques.
In one project at the lab, short recordings of actors were used to synthesize longer pieces of audio for commercials and products such as talking soda machines. This required that the recordings be divided into voiced speech, which is produced by vibrations of the vocal cords, and unvoiced speech, which is produced from air moving past some obstruction in the vocal tract such as the teeth. To explore methods of separating these automatically, Morgan bought a book on pattern recognition and coded all the techniques in it. Neural networks, which Morgan would later use extensively in speech recognition, happened to work the best in his experiments. “We cut the time enormously by just having experts do fine-tuning on a smaller set and training the classifier from the hand-labeled data,” he said.
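The workflow Morgan describes – having experts hand-label a small set of frames, then training a classifier on that labeled data – can be sketched in a few lines. This is a hypothetical illustration, not his actual code: the single sigmoid neuron, the two frame features (energy and zero-crossing rate, both classic cues for the voiced/unvoiced decision), and the synthetic “hand-labeled” data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "hand-labeled" frames: [energy, zero-crossing rate].
# Voiced frames (vocal-cord vibration) tend to have high energy and a
# low zero-crossing rate; unvoiced frames (turbulent air) the reverse.
voiced   = rng.normal([0.8, 0.2], 0.05, size=(50, 2))
unvoiced = rng.normal([0.2, 0.8], 0.05, size=(50, 2))
X = np.vstack([voiced, unvoiced])
y = np.concatenate([np.ones(50), np.zeros(50)])  # 1 = voiced

# Train a single sigmoid neuron by gradient descent on the log loss.
w, b, lr = np.zeros(2), 0.0, 1.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(voiced | frame)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

def is_voiced(frame):
    """Classify one [energy, zero-crossing-rate] feature vector."""
    return (frame @ w + b) > 0.0
```

Once trained on the small hand-labeled set, the classifier labels the rest of the recordings automatically – the time saving Morgan refers to in the quote above.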
His next experience with neural networks was at EEG Labs, which he joined in 1984. Researchers at the lab were using scans of the brain in order to understand its performance of cognitive functions. It was a new experience for Morgan, whose work until then had been mostly in signal processing. “I learned a lot from them, not just about the brain, but also about pattern recognition and neural networks,” Morgan said. “That’s really where I learned about them.”
In 1986, a new computer science laboratory was incorporated in Berkeley, and word got around that it needed researchers. After a conversation with then-Director Jerry Feldman, Morgan was chosen to lead a group at ICSI that would focus on building massively parallel computers.
“But I didn’t want to be just building up systems to do what someone else was interested in,” he said. He decided the group’s work would be applied to problems in speech research. In September 1988, when the Institute was officially inaugurated, he became the leader of the Realization Group, renamed the Speech Group in 1999.
The Realization Group
The group’s early successes were in designing and building machines powerful enough to do speech recognition. In 1989, the group designed an array of digital signal processing chips in a ring topology that used programmable gate arrays to interconnect processors. The Ring Array Processor (RAP) had a simple architecture and could be built from off-the-shelf components. “It was way faster for what we were doing than anything you could buy for any reasonable amount of money,” said Morgan.
The RAP, as well as other hardware designed by the group, was used by ICSI’s research partners around the world. Sharing of hardware is common now, with inexpensive, standardized components readily available, but it was unusual in the late 1980s and 1990s.
The group designed computer architectures (including the first single chip vector microprocessor, designed by then-student Krste Asanovic), and built hardware and software. Still, Morgan said, “We had this undercurrent of speech work as being the end goal.”
Front-End Processing
By the early 1990s, Morgan’s work focused on speech recognition algorithms, rather than on the devices to implement the algorithms. While his efforts at ICSI began with neural network approaches to speech recognition, he began to also work seriously on front-end speech processing: processing of the audio features that are fed to the statistical engine. “I became a real advocate of the idea that you should pay a lot more attention to the front end than automatic speech recognition researchers usually do,” he said.
In 1990, the Institute hosted the Speech Recognition Front End Workshop. There, Jordan Cohen, who would later become a frequent ICSI collaborator, presented the “Problem of the Inverse E”: if you build a system to filter out the spectrum of the sound “E” from a speech data set, a human listener can still hear the “E’s.” Morgan realized that human hearing must be sensitive to transitions between sounds, so a fixed spectral change might not eliminate the perception of speech categories. He and his colleague Hynek Hermansky concluded that speech recognition systems would do well to emphasize relative spectral changes over absolute spectral values. They eventually developed this idea into the relative spectral processing technique (RASTA), which helps machines cope with fixed changes in the audio spectrum. At the time, most speech systems had difficulty, for example, with audio recorded on microphones different from those used to record their training data. This became particularly important later, when cell phones were ubiquitous – RASTA was designed into millions of phones.
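The essence of RASTA-style processing – band-pass filtering each spectral band’s log-energy trajectory over time – can be sketched as below. The filter coefficients are the commonly published textbook values for H(z) = 0.1·(2 + z⁻¹ − z⁻³ − 2z⁻⁴)/(1 − 0.98z⁻¹), not necessarily those of any deployed system, and the “features” are random stand-ins. Because the filter has zero gain at DC, a constant log-spectral offset – such as a different microphone’s fixed spectral tilt – is filtered out.

```python
import numpy as np

def rasta_filter(log_spec):
    """Band-pass filter each band's trajectory along the time axis.

    log_spec: array of shape (n_frames, n_bands), log-domain features.
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]   # FIR part: a smoothed differentiator
    pole = 0.98                        # IIR part: a leaky integrator
    out = np.zeros_like(log_spec, dtype=float)
    for t in range(len(log_spec)):
        fir = sum(b[k] * log_spec[t - k] for k in range(5) if t - k >= 0)
        out[t] = fir + (pole * out[t - 1] if t > 0 else 0.0)
    return out

# Adding a constant offset in the log domain (a fixed channel difference)
# changes the output only during the initial transient; the steady-state
# filtered features are identical, which is the robustness RASTA provides.
frames = np.random.default_rng(1).normal(size=(1000, 8))
out_clean   = rasta_filter(frames)
out_shifted = rasta_filter(frames + 3.0)   # constant channel offset
```

The FIR numerator sums to zero, which is what removes the constant component; the 0.98 pole smooths the result so that only modulation rates typical of speech pass through.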
This technique and other algorithmic developments at ICSI were used in ICSI’s Berkeley Restaurant Project, a spoken dialog system that gave restaurant recommendations. The system was unusual in that both the system and its users could direct the next step in the dialog, and the system could continue a conversation even when users did not respond directly to its questions.
Importantly, said Morgan, the work on RASTA features, as well as more recent successes, stressed the importance of front-end processing. “We woke people up to the fact that training-test spectral mismatch was a problem,” he said. “We weren’t the first people to suggest that, but we may have been the first ones to talk about it so loudly.”
RASTA is also an example of technology emulating human systems, a theme throughout much of Morgan’s work. “It’s really important to pay attention to what mechanisms we can discover from biological systems,” he said.
Starting in 1988, Morgan also collaborated with Hervé Bourlard, the Institute’s first visiting scholar, on developing the hybrid approach to speech processing. In this approach, the acoustic probabilities of hidden Markov models (HMMs), which have long been used in speech recognition, are estimated by artificial neural networks – layers of simple computational nodes joined by weighted connections. Bourlard and Morgan’s paper on the approach won a best paper award from IEEE Signal Processing Magazine in 1996, and their work together inspired other research directions throughout the 1990s. The hybrid approach is experiencing a comeback with the growing popularity of work on deep learning.
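The central computation of the hybrid approach fits in a few lines: a network trained on labeled frames outputs posterior probabilities P(state | features), and dividing each posterior by that state’s prior frequency in the training data yields the “scaled likelihoods” P(features | state)/P(features) that stand in for the HMM’s acoustic probabilities. The numbers below are invented for illustration.

```python
import numpy as np

# Network output for one acoustic frame: P(state | features) for 3 states.
posteriors = np.array([0.7, 0.2, 0.1])

# Relative frequency of each state in the training data: P(state).
priors = np.array([0.5, 0.3, 0.2])

# Bayes' rule, dropping the state-independent P(features) term:
# P(features | state) / P(features) = P(state | features) / P(state).
scaled_likelihoods = posteriors / priors

# Decoders work in the log domain, so these become emission scores.
log_emission = np.log(scaled_likelihoods)
```

The division by the priors matters: without it, frequent states would be systematically favored during Viterbi decoding regardless of the acoustics.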
“Working with Morgan is always fun. When you come up with a new idea, he often disagrees and argues with you,” said Bourlard, who now sits on ICSI’s board of trustees and leads Idiap in Martigny, Switzerland. “That’s when you know that you may have got something interesting, and that there may also be more to it.”
ICSI Meeting Corpus
By the late 1990s, the Speech Group was looking for more difficult problems. Morgan said, “We were mostly looking at robustness in some sense – why are speech recognition systems breaking down? How do you make them less sensitive?”
A student suggested that Morgan, who was on his way to a meeting in Europe, keep notes about when a handheld speech recognition system – something like today’s Siri – would have been useful. Morgan realized he needed, not a personal electronic assistant, but an easy way of recording and retrieving notes from the meeting.
“All the sudden it struck me: that’s the key application. You want to be able to have access to information from some extended meeting or meetings by querying for it,” he said.
From this idea emerged the ICSI Meeting Corpus, a collection of recorded audio from meetings held at the Institute, along with transcriptions to aid in training speech recognition systems. At the time, it was the largest corpus of publicly available transcribed meetings.
It was important that these recordings were of spontaneous speech. They included laughter, speech from multiple people talking at the same time, and vocalized pauses – “ums” and so forth. These elements, said Morgan, made for interesting problems in speech recognition, which the team set about solving.
The Next Challenge
While the Speech Group was looking for challenges in the late 1990s, the Institute had its own. Jerry Feldman, the Institute’s first director, was planning to step down from the position and, at around the same time, the funding agreement with Germany that had established the Institute in 1986 was about to run out. There were discussions about whether the Institute would close its doors.
“That just seemed like such a waste to me,” Morgan said. “It just felt like there was so much here that was good.”
Morgan volunteered to take over directorship of the Institute, but the financial situation was grim. “Morgan did not take this job out of ambition, but out of duty,” said Scott Shenker, director of Research Initiatives and ICSI’s chief scientist.
With the reduction in international funding, the Institute had to find industrial and U.S. federal support. A major source of revenue was AT&T, which funded a new center at ICSI focused on Internet research. Shenker helped draw the center to the Institute.
Over the next few years, ICSI had a balanced budget, with funding from industrial, U.S. federal, and some international partners. The outlook got even brighter when Richard Karp, formerly the Algorithms Group leader and a Turing Award winner, returned from a four-year visit to the University of Washington.
But the dot-com bust of the early 2000s led to significant reductions in industrial funding. Since then, the Institute has come to rely mainly on U.S. federal support, particularly from the National Science Foundation. Additional support comes from industry and international partners.
A major accomplishment was the establishment of a new German visiting agreement. The Institute’s original ten-year agreement with the German Federal Laboratory for Computer Science expired in 1999. Morgan negotiated new agreements with Germany organized through the German Academic Exchange Service; in its most recent form, the agreement supports the hosting of about ten German postdoctoral fellows each year. The Institute currently also has agreements with Finland and Singapore, and Morgan was instrumental in all of them. Under his leadership, the Institute also recently received several large U.S. federal grants.
Since 1992, Morgan has also held a faculty position in UC Berkeley’s Electrical Engineering and Computer Science Department. He has advised 20 doctoral students in that time.
Last year, Morgan stepped down as director and now serves as deputy director. He will continue, as he has since the Institute’s founding, to work on topics in speech, and his focus is gradually shifting back to research.
“Morgan’s first love is research, but he sacrificed the pursuit of his own intellectual agenda in order to provide financial stability for the rest of us,” said Shenker. “He did so with quiet grace and relentless energy, and we are all in his debt.”