Profile: Nelson Morgan - Part 2
In 1988, ICSI ramped up to full staff, and Morgan began in his role as leader of the Realization Group. The group would focus both on building massively parallel systems and on applications in speech recognition.
The group’s early successes were in designing and building machines powerful enough to do speech recognition. In 1989, the group designed an array of digital signal processing chips in a ring topology that used programmable gate arrays to interconnect processors. The Ring Array Processor (RAP) had a simple architecture and could be built from off-the-shelf materials. “It was way faster for what we were doing than anything you could buy for any reasonable amount of money,” said Morgan.
The RAP, as well as other hardware designed by the group, was used by ICSI’s research partners around the world. Sharing of hardware is common now, with inexpensive and universal components readily available, but it was unusual in the late 1980s and 1990s.
The group designed computer architectures (including the first single chip vector microprocessor, designed by then-student Krste Asanović), and built hardware and software. Still, Morgan said, “We had this undercurrent of speech work as being the end goal.”
By the early 1990s, Morgan’s work focused on speech recognition algorithms, rather than work on the devices to implement the algorithms. While his efforts at ICSI began with neural network approaches to speech recognition, he began to also work seriously on front-end speech processing, that is, research on the audio features that are fed to the statistical engine. “I became a real advocate of the idea that you should pay a lot more attention to the front end than automatic speech recognition researchers usually do,” he said.
In 1990, the Institute hosted the Speech Recognition Front End Workshop. There, Jordan Cohen, who would later become a frequent ICSI collaborator, presented the “Problem of the Inverse E”: if you build a system to filter out the spectrum of the sound “E” from a speech data set, a human listener can still hear the “E’s.” Morgan realized that human hearing must be sensitive to the transitions between sounds, so that fixed spectral changes might not eliminate the perception of speech categories. He and his colleague Hynek Hermansky figured that speech recognition systems would do well to process features relatively. They eventually developed this idea into the relative spectral processing technique (RASTA). This kind of processing helps machines handle changes in the audio spectrum. At the time, most speech systems had difficulty, for example, dealing with audio recorded on microphones different from those used to record their training data. This became particularly important later, when cell phones were ubiquitous and RASTA was designed into millions of phones.
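The core of the idea can be sketched in a few lines. In the log-spectral domain, a fixed channel effect such as a different microphone shows up as a constant offset in each frequency band, so a band-pass filter with zero gain at DC, applied along time, removes it. The sketch below is illustrative only and is not ICSI's implementation: the filter taps follow the commonly cited RASTA filter, the pole value of 0.94 is an assumption (0.98 also appears in the literature), and the name `rasta_filter` and the toy data are made up for this example.

```python
import math

# Numerator of the commonly cited RASTA band-pass filter. The taps sum to
# zero, so a constant (DC) input produces no steady-state output.
NUMERATOR = [0.2, 0.1, 0.0, -0.1, -0.2]
POLE = 0.94  # assumed value; 0.98 also appears in the literature


def rasta_filter(trajectory, pole=POLE):
    """Filter one band's log-energy trajectory along time (causal form)."""
    out = []
    prev = 0.0
    for n in range(len(trajectory)):
        # FIR part: weighted difference of current and past frames.
        fir = sum(b * trajectory[n - k]
                  for k, b in enumerate(NUMERATOR) if n - k >= 0)
        # IIR part: a single pole that smooths the result over time.
        prev = fir + pole * prev
        out.append(prev)
    return out


# Toy log-energy trajectory for a single frequency band (made-up data).
clean = [math.sin(0.3 * n) for n in range(300)]
# A different microphone modelled as a fixed gain, i.e. a constant log offset.
offset = 3.0
shifted = [x + offset for x in clean]

filtered_clean = rasta_filter(clean)
filtered_shifted = rasta_filter(shifted)

# After the start-up transient, the two filtered trajectories nearly coincide.
max_late_diff = max(abs(a - b)
                    for a, b in zip(filtered_clean[200:], filtered_shifted[200:]))
print(max_late_diff)
```

The two inputs differ by a large constant, yet the filtered outputs agree once the start-up transient has decayed, which is the sense in which the features are processed "relatively."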
This technique and other algorithmic developments at ICSI were used in ICSI’s Berkeley Restaurant Project, a spoken dialog system that gave restaurant recommendations. The system was unusual in that both the system and its users could direct the next step in the dialog, and the system could continue a conversation even when users did not respond directly to its questions.
Importantly, said Morgan, the work on RASTA features, as well as more recent successes, stressed the importance of front-end processing. “We woke people up to the fact that training-test spectral mismatch was a problem,” he said. “We weren’t the first people to suggest that, but we may have been the first ones to talk about it so loudly.”
RASTA is also an example of technology emulating human systems, a theme throughout much of Morgan’s work. “It’s really important to pay attention to what mechanisms we can discover from biological systems,” he said.
Starting in 1988, Morgan also collaborated with Hervé Bourlard, the Institute’s first visiting scholar, on developing the hybrid approach to speech processing. In this approach, the acoustic probabilities of Hidden Markov Models (HMMs), which have long been used in speech recognition, are estimated by artificial neural networks, connectionist models built from simple interconnected nodes. Bourlard and Morgan’s paper on the approach won a best paper award from the IEEE Signal Processing Magazine in 1996, and their work together inspired other research directions throughout the 1990s. The hybrid approach is experiencing a comeback with the growing popularity of work on deep learning.
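In the hybrid approach as it is usually described, the network estimates the posterior probability of each HMM state given the acoustic features, and dividing each posterior by the state's prior gives a scaled likelihood that can stand in for the HMM's emission probability. A minimal sketch of that conversion follows; the numbers and the name `scaled_likelihoods` are hypothetical and chosen only for illustration.

```python
def scaled_likelihoods(posteriors, priors):
    """Turn P(state | features) into values proportional to p(features | state).

    By Bayes' rule, p(x | q) = P(q | x) * p(x) / P(q). The term p(x) is the
    same for every state at a given frame, so it can be dropped for decoding.
    """
    return [post / prior for post, prior in zip(posteriors, priors)]


# Made-up network outputs for one frame over three HMM states.
posteriors = [0.7, 0.2, 0.1]
# Made-up state priors, e.g. relative state frequencies in the training data.
priors = [0.5, 0.25, 0.25]

scaled = scaled_likelihoods(posteriors, priors)
print(scaled)
```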
"Working with Morgan is always fun. When you come up with a new idea, he often disagrees and argues with you," said Bourlard, who now sits on ICSI's board of trustees. "That's when you know that you may have got something interesting, and that there may also be more to it."
By the late 1990s, the Speech Group was looking for more difficult problems. Morgan said, “We were mostly looking at robustness in some sense – why are speech recognition systems breaking down? How do you make them less sensitive?”
A student suggested that Morgan, who was on his way to a meeting in Europe, keep notes about when a handheld speech recognition system, such as Siri, would have been useful. Morgan realized that what he needed was not a personal electronic assistant but an easy way of recording and retrieving notes from the meeting.
“All the sudden it struck me: that’s the key application. You want to be able to have access to information from some extended meeting or meetings by querying for it,” he said.
From this idea emerged the ICSI Meeting Corpus, a collection of recorded audio from meetings held at the Institute, along with transcriptions to aid in training speech recognition systems. At the time, it was the largest corpus of transcribed meetings that was publicly available.
It was important that these recordings were of spontaneous speech. They included laughter, speech from multiple people talking at the same time, and vocalized pauses – “ums” and so forth. These elements, said Morgan, made for interesting problems in speech recognition, which the team set about solving.
While the Speech Group was looking for challenges in the late 1990s, the Institute had its own. Jerry Feldman, the Institute’s first director, was planning to step down from the position and, at around the same time, the funding agreement with Germany that had established the Institute in 1986 was about to run out. There were discussions about whether the Institute would close its doors.
“That just seemed like such a waste to me,” Morgan said. “It just felt like there was so much here that was good.”
Morgan volunteered to take over directorship of the Institute, but the financial situation was grim. “Morgan did not take this job out of ambition, but out of duty,” said Scott Shenker, leader of the Networking Group and ICSI’s chief scientist.
With the reduction in international funding, the Institute had to find industrial and U.S. federal support. A major source of revenue was AT&T, which funded a new center at ICSI focused on Internet research. Shenker helped draw the center to the Institute.
Over the next few years, ICSI had a balanced budget, with funding from industrial, U.S. federal, and some international partners. The outlook got even brighter when Richard Karp, formerly the Algorithms Group leader and a Turing Award winner, returned from a four-year visit to the University of Washington.
But the dot-com bust of the early 2000s led to significant reductions in industrial funding. Since then, the Institute has come to rely mainly on federal support, particularly from the National Science Foundation. Additional support comes from industry and international partners.
A major accomplishment was the establishment of a new German visiting agreement. The Institute’s original ten-year agreement with the German Federal Laboratory for Computer Science expired in 1999. Morgan negotiated new agreements with Germany, organized through the German Academic Exchange Service. In its most recent form, the agreement supports the hosting of about ten postdoctoral fellows every year from Germany. The Institute currently also has agreements with Finland and Singapore. Morgan was instrumental in all of them. Under Morgan’s leadership, the Institute also recently received several large U.S. federal grants.
During nearly all of this time, since 1992, Morgan has also held a faculty position in the Electrical Engineering and Computer Science Department. With Morgan as an advisor, 20 PhDs have graduated from Berkeley during this time.
Earlier this year, Morgan stepped down from the position as director and now serves as deputy director. He will continue, as he has done since the Institute’s foundation, to lead speech research efforts, and his focus is gradually switching back to research.
“Morgan’s first love is research, but he sacrificed the pursuit of his own intellectual agenda in order to provide financial stability for the rest of us,” said Shenker. “He did so with quiet grace and relentless energy, and we are all in his debt.”