Our brains have a remarkable knack for picking out individual voices in a noisy environment, like a crowded coffee shop or a busy city street. This is something that even the most advanced hearing aids struggle to do. But now Columbia engineers are announcing an experimental technology that mimics the brain’s natural aptitude for detecting and amplifying any one voice from many. Powered by artificial intelligence, this brain-controlled hearing aid acts as an automatic filter, monitoring wearers’ brain waves and boosting the voice they want to focus on.
Though still in early stages of development, the technology is a significant step toward better hearing aids that would enable wearers to converse with the people around them seamlessly and efficiently. This achievement is described in Science Advances and a press release describing the findings is posted on the Columbia Zuckerman Institute website.
“The brain area that processes sound is extraordinarily sensitive and powerful; it can amplify one voice over others, seemingly effortlessly, while today’s hearings aids still pale in comparison,” said Nima Mesgarani, PhD, a principal investigator at Columbia’s Mortimer B. Zuckerman Mind Brain Behavior Institute and the paper’s senior author. “By creating a device that harnesses the power of the brain itself, we hope our work will lead to technological improvements that enable the hundreds of millions of hearing-impaired people worldwide to communicate just as easily as their friends and family do.”
Modern hearing aids are excellent at amplifying speech while suppressing certain types of background noise, such as traffic. But they struggle to boost the volume of an individual voice over others. Scientists calls this the cocktail party problem, named after the cacophony of voices that blend together during loud parties.
“In crowded places, like parties, hearing aids tend to amplify all speakers at once,” said Mesgarani, who is also an associate professor of electrical engineering at Columbia Engineering. “This severely hinders a wearer’s ability to converse effectively, essentially isolating them from the people around them.”
The Columbia team’s brain-controlled hearing aid is different. Instead of relying solely on external sound-amplifiers, like microphones, it also monitors the listener’s own brain waves.
This technology works by mimicking what the brain would normally do. First, the device automatically separates out multiple speakers into separate streams, and then compares each speaker with the neural data from the user’s brain. The speaker that best matches a user’s neural data is then amplified above the others (Credit: Nima Mesgarani/Columbia Zuckerman Institute).
“Previously, we had discovered that when two people talk to each other, the brain waves of the speaker begin to resemble the brain waves of the listener,” said Mesgarani.
Using this knowledge the team combined powerful speech-separation algorithms with neural networks, complex mathematical models that imitate the brain’s natural computational abilities. They created a system that first separates out the voices of individual speakers from a group, and then compares the voices of each speaker to the brain waves of the person listening. The speaker whose voice pattern most closely matches the listener’s brain waves is then amplified over the rest.
The researchers published an earlier version of this system in 2017 that, while promising, had a key limitation: It had to be pretrained to recognize specific speakers.
“If you’re in a restaurant with your family, that device would recognize and decode those voices for you,” explained Mesgarani. “But as soon as a new person, such as the waiter, arrived, the system would fail.”
Today’s advance largely solves that issue. With funding from Columbia Technology Ventures to improve their original algorithm, Mesgarani and first authors Cong Han and James O’Sullivan, PhD, again harnessed the power of deep neural networks to build a more sophisticated model that could be generalized to any potential speaker that the listener encountered.
“Our end result was a speech-separation algorithm that performed similarly to previous versions but with an important improvement,” said Mesgarani. “It could recognize and decode a voice—any voice—right off the bat.”
To test the algorithm’s effectiveness, the researchers teamed up with Ashesh Dinesh Mehta, MD, PhD, a neurosurgeon at the Northwell Health Institute for Neurology and Neurosurgery and coauthor of today’s paper. Mehta treats epilepsy patients, some of whom must undergo regular surgeries.
“These patients volunteered to listen to different speakers while we monitored their brain waves directly via electrodes implanted in the patients’ brains,” said Mesgarani. “We then applied the newly developed algorithm to that data.”
The team’s algorithm tracked the patients’ attention as they listened to different speakers that they had not previously heard. When a patient focused on one speaker, the system automatically amplified that voice. When their attention shifted to a different speaker, the volume levels changed to reflect that shift.
Encouraged by their results, the researchers are now investigating how to transform this prototype into a noninvasive device that can be placed externally on the scalp or around the ear. They also hope to further improve and refine the algorithm so that it can function in a broader range of environments.
“So far, we’ve only tested it in an indoor environment,” said Mesgarani. “But we want to ensure that it can work just as well on a busy city street or a noisy restaurant, so that wherever wearers go, they can fully experience the world and people around them.”
This research was supported by the National Institutes of Health (NIDCD-DC014279), the National Institute of Mental Health (R21MH114166), the Pew Charitable Trusts, the Pew Scholars Program in the Biomedical Sciences, and Columbia Technology Ventures.
The authors report no financial or other conflicts of interest.
Original Paper: Han C, O’Sullivan J, Luo Y, Herraro J, Mehta AD, Mesgarani N. Speaker-independent auditory attention decoding without access to clean speech sources. Science Advances. May 15, 2019;5(5):eaav6134.
Source: Columbia Zuckerman Institute, Science Advances
Images: Nima Mesgarani/Columbia Zuckerman Institute