Would you like to hear your favorite celebrity wish you a happy birthday? Would you like to sing anything and have the result be indistinguishable from your favorite singer, living or dead? Would you like to hear the voice of a deceased loved one speak to you? These things may become possible as voice cloning technology improves.
Artificial intelligence is making human speech as malleable and replicable as pixels. Today, a Canadian AI startup named Lyrebird unveiled its first product: a set of algorithms the company claims can clone anyone’s voice by listening to just a single minute of sample audio.
A few years ago this would have been impossible, but the analytic prowess of machine learning has proven to be a perfect fit for the idiosyncrasies of human speech. Using artificial intelligence, companies like Google have been able to create incredibly life-like synthesized voices, while Adobe has unveiled its own prototype software called Project VoCo that can edit human speech like Photoshop tweaks digital images.
But while Project VoCo requires at least 20 minutes of sample audio before it can mimic a voice, Lyrebird cuts this requirement down to just 60 seconds. The results certainly aren’t indistinguishable from human speech, but they’re impressive all the same, and will no doubt improve over time. Below you can hear the synthesized voices of Donald Trump, Barack Obama, and Hillary Clinton discussing the startup:
Lyrebird says its algorithms can also infuse the speech it creates with emotion, letting customers make voices sound angry, sympathetic, or stressed out. The resulting speech can be put to a wide range of uses, says Lyrebird, including “reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios.”
There are more troubling uses as well. We already know that synthetic voice generators can trick biometric software used to verify identity. And, given enough source material, AI programs can generate pretty convincing fake pictures and video of anyone you like. For example, this research from 2016 uses 3D mapping to turn videos of famous politicians, including George W. Bush and Vladimir Putin, into real-time “puppets” controlled by engineers. Combine this with a realistic voice synthesizer and you could have a Facebook video of Donald Trump announcing that the US is bombing North Korea going viral before you know it. That said, while Lyrebird does do a good Trump impression, its other voices are noticeably more robotic:
Lyrebird is aware of these problems, but its suggested fix feels far from adequate. In an “Ethics” section on the company’s website, Lyrebird’s founders (three students from the University of Montréal) acknowledge that their technology “raises important societal issues,” including bringing into question the veracity of audio recordings used in court. “This could potentially have dangerous consequences such as misleading diplomats, fraud, and more generally any other problem caused by stealing the identity of someone else,” they write.
Their solution is to release the technology publicly and make it “available to anyone.” That way the damage will be lessened because “everyone will soon be aware that such technology exists.” This, though, seems like a willfully naive answer. The uptake of any technology is uneven, and the popular understanding of it even more so. No one thinks their government has a grounded understanding of contemporary tech.
We’ve reached out to Lyrebird to find out more about the company’s plans, but for the moment its technology is still under development. There’s no explanation of what the sample audio has to sound like, or how much computing power is needed to generate a fake voice, and the company’s website says its speech APIs are still “in beta,” with no mention of future pricing plans or availability either. We’ll update this story if and when we hear more. In the meantime, just don’t trust any phone calls from oddly robotic-sounding family members asking you to transfer all your money to a certain bank account.
Keep this in mind as you watch the news.
Sound samples here. They sound a bit artificial, but you can tell they are on the right track.
If you’ve never heard a real lyrebird, you should know that they can imitate almost any sound. Watch this video of a bird imitating the sound of saws it heard in the forest, including a chainsaw.
In this one, you can even hear the voice of one of the workers among the construction sounds a lyrebird is imitating. Amazing.
You can see why a voice cloning company chose this name.
As a side note, several species of birds can imitate human speech. Did you know a raven can talk?
Do you think Edgar Allan Poe really heard a raven say the word “Nevermore”? It seems possible.