Tuesday, September 27

An AI can decode speech from brain activity with startling accuracy

An artificial intelligence can decode words and phrases from brain activity with surprising, though still limited, accuracy. Using just a few seconds of brain activity data, the AI guesses what a person heard, listing the correct answer among its top 10 possibilities up to 73% of the time, researchers found in a preliminary study.

The AI’s performance “was better than many people thought was possible at this point,” says Giovanni Di Liberto, a computer scientist at Trinity College Dublin who was not involved in the research.

Developed within Facebook’s parent company Meta, the AI could eventually be used to help thousands of people around the world who are unable to communicate through speech, typing or gestures, researchers report Aug. 25 on arXiv.org. This includes many patients in minimally conscious, locked-in or “vegetative” states – now generally known as unresponsive wakefulness syndrome (SN: 08/02/19).

Most existing technologies to help these patients communicate require risky brain surgeries to implant electrodes. This new approach “could provide a viable route to help patients with communication deficits…without resorting to invasive methods,” says neuroscientist Jean-Rémi King, a Meta AI researcher currently at the École Normale Supérieure in Paris.

King and his colleagues trained a computer tool to detect words and phrases from 56,000 hours of speech recordings in 53 languages. The tool, also known as a language model, learned to recognize specific features of language both at a fine level – think letters or syllables – and at a broader level, such as a word or phrase.

The team applied an AI with this language model to databases from four institutions that included the brain activity of 169 volunteers. In these databases, participants listened to various stories and phrases taken, for example, from Ernest Hemingway’s The Old Man and the Sea and Lewis Carroll’s Alice’s Adventures in Wonderland, while their brains were scanned using magnetoencephalography or electroencephalography. These techniques measure the magnetic or electrical component of brain signals.

Then, using a computational method that helps account for physical differences between real brains, the team attempted to decode what participants heard using just three seconds of brain activity data from each person. The team asked the AI to align speech sounds from the story recordings with the patterns of brain activity that the AI calculated matched what people were hearing. It then predicted what the person might have heard during that short window, given more than 1,000 possibilities.
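The matching step described above can be illustrated with a toy sketch. This is an assumption for illustration only, not the researchers’ actual model: it supposes the brain-activity window and each candidate speech segment have already been mapped into a shared embedding space, then ranks candidates by cosine similarity and checks whether the true segment lands in the top 10.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: one 3-second brain-activity window and
# 1,000 candidate speech segments, all in a shared 64-dimensional space.
brain = rng.normal(size=64)
candidates = rng.normal(size=(1000, 64))

# Make one candidate (the "true" segment) resemble the brain window,
# plus a little noise; the rest are unrelated.
true_index = 42
candidates[true_index] = brain + 0.1 * rng.normal(size=64)

def cosine(query, pool):
    """Cosine similarity between a query vector and each row of pool."""
    return (pool @ query) / (np.linalg.norm(pool, axis=1) * np.linalg.norm(query))

scores = cosine(brain, candidates)

# Rank candidates by similarity; count a success ("top-10 accuracy")
# if the true segment appears among the 10 highest-scoring guesses.
top10 = np.argsort(scores)[::-1][:10]
hit = true_index in top10
print("true segment in top 10:", hit)
```

Run over many brain-activity windows, the fraction of such hits is the top-10 accuracy figure the study reports.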

With magnetoencephalography, or MEG, the correct answer was in the AI’s top 10 guesses up to 73% of the time, the researchers found. With electroencephalography, that value fell to no more than 30%. “That [MEG] performance is very good,” says Di Liberto, but he is less optimistic about its practical use. “What can we do with it? Nothing. Absolutely nothing.”

The reason, he says, is that MEG requires a large and expensive machine. Bringing this technology into clinics will require scientific innovations that will make the machines cheaper and easier to use.

It’s also important to understand what “decoding” really means in this study, says Jonathan Brennan, a linguist at the University of Michigan at Ann Arbor. The word is often used to describe the process of deciphering information directly from a source – in this case, speech from brain activity. But the AI could only do that because it had a finite list of possible correct answers to draw its guesses from.

“With language, that won’t be enough if we want to move into practical use, because language is infinite,” Brennan says.

Additionally, says Di Liberto, the AI decoded information from participants passively listening to audio, which is not directly relevant to nonverbal patients. For it to become a meaningful communication tool, scientists will need to learn to decipher from brain activity what these patients intend to say, including expressions of hunger, discomfort, or a simple “yes” or “no.”

The new study is “the decoding of speech perception, not production,” agrees King. Although speech production is the ultimate goal, for now, “we’re pretty far from that.”