Dominic Massaro's research benefits school for deaf: 01-19-98

January 19, 1998

UCSC psychologist teams up with Oregon school to help deaf children

Dominic Massaro (left) with deaf child

Speech is the birthright of every child. It is the deaf child's one fair chance to keep in touch with his fellows.
--Helen Keller

After years of working in a dark, windowless laboratory to understand speech perception and how speech can be communicated by machines, UCSC psychology professor Dominic Massaro is realizing his long-held dream of using advanced technology to help hearing-impaired youngsters learn to speak.

Massaro and his former student and research associate Michael Cohen have been at the forefront of the development of synthetic speech for years. Their latest creation, a three-dimensional computerized talking head nicknamed "Baldi," is now being used by deaf children and their teachers at the Tucker-Maxon Oral School in Portland, Oregon.

The image on the computer screen resembles an animated mannequin, with moving eyes, brows, and mouth. When Massaro types in text, Baldi "talks." When Massaro speaks, Baldi "listens" and responds. An underlying grid allows researchers to manipulate the jaw, lips, and tongue to mimic human speech.

"It's like the face is a puppet and we've got about 60 strings we're controlling it with," said Massaro. Texture mapping allows Massaro and Cohen to wrap any still video picture over the framework to produce a more natural or familiar image.

The value of animated synthetic speech is that it provides the visual cues that are a critical part of speech comprehension. The same principle is at work when the hearing impaired read lips to follow conversations.

Massaro and Cohen's technology gives students at Tucker-Maxon the opportunity to observe closely the facial movements that are used in producing spoken words, and even to strip away the "skin" of the face and study the wire framework that mimics the speech organs underlying the production of human speech. In its transparent form, Baldi shows students the precise position of the tongue during the formation of sounds and words they've never spoken. A half-sagittal view reveals the movement of the lips, lower jaw, and tongue.

"Hearing individuals take speech for granted, but for hearing-impaired and profoundly deaf children, being able to watch how we produce words is a valuable tool for developing those skills," said Massaro.

For 40 years, Tucker-Maxon has been teaching profoundly deaf children to speak. All 55 students at the school have powerful hearing aids or cochlear implants. The school works closely with students and their families, beginning as early as the preschool level. Tucker-Maxon offers a standard elementary school curriculum and strives to help students transfer to a school for hearing children as soon as they're ready.

Teachers and administrators at Tucker-Maxon introduced students to Baldi this past fall and are very pleased so far. Eleven students aged 9 to 12 are using the four workstations, said Tucker-Maxon executive director Pat Stone.

"The kids love it," said Stone. "They really like Baldi, and they work hard at it. I think it's going to have tremendous potential as a teaching tool. It's like getting a really good new teacher's aide in the classroom."

The workstations greatly expand the children's opportunities to practice speech and will be an important tool for building vocabulary, said Stone. "The key is that the kids need good feedback, and the computer can give them really good feedback," he noted.

Massaro and Cohen's innovations will improve the speed with which children acquire speech as well as the quality of their speech, said Stone. "The more opportunities the kids have to practice, the faster they are going to learn," he said. "At the same time, controlled feedback will improve their accuracy. The two go hand-in-hand."

Baldi is particularly appealing for children because he can be programmed to speak in any voice, including the teacher's or the student's own, and his image can be replaced with any face--the child's, the teacher's, even a favorite celebrity or sports hero.

"The ear can instruct the tongue really well," said Massaro. "Blind children learn to speak quite well, because somehow the ear gives us the information we need to articulate language appropriately. It remains to be seen how well we can instruct deaf children with visual information."

UCSC is one of three institutions that are teaming up to help the students of Tucker-Maxon. The Center for Spoken Language Understanding at the Oregon Graduate Institute of Science and Technology is supplying a speech toolkit that generates synthetic speech and recognizes spoken words--providing the auditory complement to Massaro's visual technology. Carnegie Mellon University is using video cameras to track the faces of children and monitor their visible speech. Ultimately, the tapes will offer valuable feedback on each child's speech mechanics.

Through a program that supports the discovery and application of new technologies to "real-world" problems, the National Science Foundation has provided approximately $600,000 per year to fund the three-year collaboration. Intel Corporation has donated five 266 MHz Pentium® II processor-based systems with MMX™ technology, along with Intel's new Create & Share™ Camera Pack, which enables online sharing of photos and videos as well as access to the Intel Video Phone. The project is one of seven selected by the NSF for funding from a pool of approximately 200 applicants.

Massaro, a cognitive psychologist, has spent years probing the mystery of speech perception and comprehension. His lab is one of only a handful of facilities around the world that are using facial animation in the quest for understanding. Years of pure research have fostered the applied work now taking place at Tucker-Maxon. His new book, Perceiving Talking Faces: From Speech Perception to a Behavioral Principle (Cambridge, MA: MIT Press, 1998), details his extensive series of experiments on the use of bimodal cues in speech perception; an accompanying CD-ROM allows the reader to explore the phenomena directly.
