Voice Security

People have learned to preserve their privacy by safeguarding computer passwords. But with the rise of voice authentication systems, they also need to protect unique voice characteristics.

Researchers at Carnegie Mellon University's Language Technologies Institute (LTI) say that is possible with a system they developed. The CMU system converts a user's voiceprint into alphanumeric strings that can serve as passwords.

This system would enable people to register or check in on a voice authentication system — without their actual voice ever leaving their smartphone. This reduces the risk that a fraudster will obtain the person's voice biometric data, which could be used to access bank, healthcare or other personal accounts.

"When you use a speaker authentication system, you're placing a lot of faith in the system," said Bhiksha Raj, CMU associate professor of language technologies. "It's not just that your voiceprint might be stolen from the system and used to impersonate you elsewhere. Your voice also carries a lot of information — your gender, your emotional state, your nationality."

"To preserve privacy, we need systems that can identify you without actually hearing your voice or even keeping an encrypted record of your voice."

Raj and Manas Pathak (CS'09/CS'12), a recent Ph.D. graduate of LTI, have devised a method for converting a voiceprint, a spectrogram that represents the acoustic qualities of speech, into alphanumeric strings that can serve as passwords. On Sept. 21 they will present their work as a keynote address at the Information Security Conference in Passau, Germany.

Because a person's voice never produces exactly the same signal twice, even when repeating the same word or phrase, converting the voiceprint into a single password won't do.

So, the CMU system uses different mathematical functions to generate hundreds of alphanumeric strings. To authenticate the user, the system compares all of the strings with those that the system has on file from the initial registration; if enough of the strings match, the user is authenticated.
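
The matching step described above can be illustrated with a minimal sketch. It is not the researchers' implementation: the article does not say which mathematical functions are used, so this example substitutes random-projection sign hashing, and every name and constant in it (NUM_FUNCTIONS, BITS_PER_KEY, MATCH_THRESHOLD, make_functions, to_strings, authenticate) is a hypothetical chosen for illustration. The voiceprint is assumed to have already been reduced to a list of numbers.

```python
import hashlib
import random

# All constants are illustrative assumptions, not values from the CMU system.
NUM_FUNCTIONS = 200    # "hundreds" of strings per voiceprint
BITS_PER_KEY = 8       # sign bits combined into each string
MATCH_THRESHOLD = 20   # how many strings must agree to accept the user


def make_functions(dim, seed=0):
    """Build NUM_FUNCTIONS functions, each a set of random hyperplanes."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(BITS_PER_KEY)]
        for _ in range(NUM_FUNCTIONS)
    ]


def to_strings(features, functions):
    """Turn one feature vector into one alphanumeric string per function."""
    strings = []
    for index, planes in enumerate(functions):
        # One bit per hyperplane: which side of the plane the vector falls on.
        bits = "".join(
            "1" if sum(w * x for w, x in zip(plane, features)) >= 0 else "0"
            for plane in planes
        )
        # Hash the bit pattern so the stored string looks like a password.
        digest = hashlib.sha256(f"{index}:{bits}".encode("utf-8")).hexdigest()
        strings.append(digest[:16])
    return strings


def count_matches(enrolled, candidate):
    """Count positions where the enrolled and candidate strings are equal."""
    return sum(a == b for a, b in zip(enrolled, candidate))


def authenticate(enrolled, candidate, threshold=MATCH_THRESHOLD):
    """Accept the user if enough strings match the registered ones."""
    return count_matches(enrolled, candidate) >= threshold
```

In this sketch, a slightly different recording of the same voice changes only some of the strings, which is why the decision rests on the number of matches rather than on any single string. The threshold would have to be tuned on real speech data.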

The system also adds what the researchers call "salt" — a random string of digits unique to each smartphone — to the alphanumeric strings, providing an additional level of security.
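
A rough sketch of how a per-device salt might be combined with the strings follows. The article says only that a random string unique to each smartphone is added, so the use of SHA-256 here, the salt length, and the function names new_device_salt and salt_strings are all assumptions made for illustration.

```python
import hashlib
import secrets


def new_device_salt(num_bytes=16):
    """Generate a random salt once per device (hypothetical helper)."""
    return secrets.token_hex(num_bytes)


def salt_strings(strings, device_salt):
    """Mix the device salt into every string before it leaves the phone."""
    return [
        hashlib.sha256(f"{device_salt}:{s}".encode("utf-8")).hexdigest()
        for s in strings
    ]
```

Under these assumptions, the same voice-derived string produces unrelated values on two phones with different salts, so strings stolen from one service could not simply be replayed from another device.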

In tests using standardized speech datasets, Raj and Pathak found that their system was accurate 95 percent of the time. The privacy-preserving method is computationally efficient, so it could be used with most smartphones.

But Raj also warned that improving the security of voice authentication systems would be just a first step toward protecting privacy overall.

"With increasing use of speech-based services, such as the iPhone's Siri assistant or the personal videos uploaded to YouTube, the issue of the privacy of users' speech data is only just beginning to be considered," he said.

In addition to Raj and Pathak, Jose Portelo and Isabel Trancoso of INESC-ID in Lisbon, Portugal, contributed to this research. This work was supported by the National Science Foundation and Portugal's Foundation for Science and Technology (FCT).
