
AI’s human voices sound a little too real for many


Do you book appointments by phone because you prefer to speak to a real human? Well, as you’ve probably noticed by now, most “real humans” on the front end have been replaced by cold, annoying voice-activated response systems. That alone is frustrating, to say nothing of the on-hold music and the eternal wait.

Wouldn’t it be nice if you had your own personal assistant to call and book appointments for you?

If only you could afford to hire one.

Good news: you don’t have to hire a person. Google Duplex is standing by. Your own personal artificial intelligence stands at the ready. And here’s the cool part: your AI will sound nearly human. It can sound like you, or you can choose from a variety of voices. It’ll book reservations and appointments in a voice nearly indistinguishable from a human’s.

But wait. There’s more.

What are AI assistants?

Personal assistants like Google Duplex are very different from computerized voice-activated response systems. They’re relatable, with their own personalities … right down to a sense of humor. They stammer with the occasional “ummm.” They have a vocabulary they can learn from, and with experience, they expand their “knowledge.”

They’re intuitive, and can sense what kind of mood you’re in. Google Duplex will even thank your children for saying “please” instead of “hey.” They’ll remind you to take a break if you’re working too hard. They can tell stories and sing songs. If you want to hear Google Duplex in action, I just covered the entire topic in my recent podcast, The Sounds of AI.

Personal AI assistants have come a long way in just a few short years. Siri speaks 21 different languages in both male and female voices. Cortana speaks eight languages, Google Assistant speaks four, and Alexa speaks two. And they’re learning more. But those “few short years” translate into years of research and development.

Think about it. Google’s been listening to us for years. They know what we want. They know what makes us tick. They know what makes us more productive.

In all my years of research, writing and producing tech-related media, I’ve rarely seen a technology grow, evolve and improve as fast as AI. The look and the feel of commercial artificial intelligence is suddenly very, very human. But the thing that impresses me most is the SOUND.

I have to admit, I’m a little excited that I can have an AI assistant book appointments for me. And that’s not all: it can speak using my own voice. But there’s a flip side. Being an on-air celebrity means my voice may be accessible to anyone who wants to use it. That concerns me, and I plan to cover the legal and security issues ASAP.

How can AI assistants sound human?

So, how does Google Duplex manage to sound so human? We could talk about dilated convolutions, convolutional neural networks, formants, output amplitudes and receptive fields, or I can shoot it to you straight. For the purposes of this article, let’s stick to language we all understand! If you want to go deeper, just listen to my podcast, The Sounds of AI. You won’t believe your ears!

How are AI voices generated? Siri, Alexa and other personal assistants use a library of recorded sounds that are linked together to create recognizable words. The more human-sounding Google Duplex uses WaveNet technology, a deep neural network that generates audio from scratch.
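Here’s a minimal sketch of that library-of-sounds approach, often called concatenative synthesis. The sound “units” below are random noise standing in for real recorded syllables, and the crossfade length is my own toy choice, not any vendor’s actual code:

```python
import numpy as np

# Toy illustration of concatenative synthesis: stitch short prerecorded
# sound units together, crossfading each join so the seams are less
# audible. The library contents and unit names here are hypothetical.

SAMPLE_RATE = 16_000  # samples per second

def crossfade_join(units, fade_ms=10):
    """Concatenate audio units, blending each boundary with a linear fade."""
    fade = int(SAMPLE_RATE * fade_ms / 1000)
    out = units[0]
    for unit in units[1:]:
        ramp = np.linspace(0.0, 1.0, fade)
        blended = out[-fade:] * (1 - ramp) + unit[:fade] * ramp
        out = np.concatenate([out[:-fade], blended, unit[fade:]])
    return out

# Stand-in sound library: each "unit" is a fifth of a second of audio.
library = {name: np.random.randn(SAMPLE_RATE // 5) for name in ("hel", "lo")}
speech = crossfade_join([library["hel"], library["lo"]])
```

The catch with this approach is exactly what you hear in Siri and Alexa: no matter how carefully the joins are smoothed, the result can still sound stitched together.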

WaveNet is distinctly different from the concatenative “speech synthesis” technology à la Siri, Alexa and others. WaveNet’s neural network consists of layers of interconnected nodes, sort of like the structure neurons form in your brain. Rather than stitching recordings together, the network builds the audio waveform one tiny sample at a time, predicting each new sample from the ones that came before it. Trained networks can generate new speech-like waveforms at an amazing 16,000 samples per second, complete with human-sounding breaths and lip smacks.
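For the curious, here’s a tiny illustrative sketch (my own simplification, not DeepMind’s code) of the dilated, causal convolutions that give a WaveNet-style network its long “memory” of past samples:

```python
import numpy as np

# "Causal" means each output sample depends only on the current and
# past samples, never future ones. Doubling the dilation at each layer
# lets the top of the stack "see" a long stretch of past audio.
# Weights here are random; a real model learns them from recordings.

def causal_dilated_conv(signal, weights, dilation):
    k = len(weights)
    pad = (k - 1) * dilation                      # left-pad to stay causal
    padded = np.concatenate([np.zeros(pad), signal])
    return np.array([
        sum(weights[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ])

rng = np.random.default_rng(0)
audio = rng.standard_normal(16_000)               # one second at 16 kHz
out = audio
for dilation in (1, 2, 4, 8):                     # receptive field grows per layer
    taps = rng.standard_normal(2)
    out = np.tanh(causal_dilated_conv(out, taps, dilation))
```

That doubling dilation is why a fairly short stack of layers can take thousands of past samples into account at once, which is what those “receptive fields” I mentioned earlier are all about.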

Think about it: 16,000 samples for every single second of audio, just to sound human-generated! WaveNet currently models a variety of human voices spanning different ethnicities, countries, accents and languages, even celebrities. But do they express their own thoughts? No.

Even the AI robot Sophia, while outspoken on many topics, is not actually speaking for herself. Her responses are assembled from stored “knowledge,” not self-generated. AI can learn the words, but the learning is always based on HUMAN thinking. That’s an important distinction.

Intuitive AI is complicated

Getting AI to SOUND intuitive is a whole other area of expertise. In an article for Wired, Apple executive Alex Acero told David Pierce that the team listened to literally hundreds of voices before finding the right one for Siri. The winning voice conveyed helpfulness and camaraderie. But as Google Duplex raises the bar, I think “natural interaction” is going to be the key goal for developers.

In the music world, innovative neural network programmers and composers like Aäron van den Oord of DeepMind in London and Andrew Huang from Google’s Magenta team are tapping into the “creative minds” of AI. In a nutshell, they’ve figured out how to translate sounds and musical pieces into complex network representations, resulting in new music genres composed primarily by artificial intelligence. By analyzing and regenerating sound sources that have been crossed with each other, AI can indeed generate original compositions.
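If you want a feel for that “crossing” trick, here’s a deliberately tiny sketch. The encoder and decoder below are toy stand-ins for the learned models Magenta actually uses; only the blend-in-the-middle idea carries over:

```python
import numpy as np

# The idea in a nutshell: encode two sounds into compact numeric
# summaries, blend those numbers, then decode the blend back into a
# waveform "between" the two originals. toy_encode and toy_decode are
# hypothetical stand-ins, NOT the real Magenta/NSynth models.

def toy_encode(audio):
    # Stand-in for a learned encoder: squeeze audio into three numbers.
    return np.array([audio.mean(), audio.std(), np.abs(audio).max()])

def toy_decode(latent, length=16_000, seed=42):
    # Stand-in for a learned decoder: expand the summary back into audio.
    rng = np.random.default_rng(seed)
    return latent[1] * rng.standard_normal(length) + latent[0]

rng = np.random.default_rng(0)
flute = rng.standard_normal(16_000)   # pretend recording #1
drum = rng.standard_normal(16_000)    # pretend recording #2

blend = 0.5 * toy_encode(flute) + 0.5 * toy_encode(drum)  # cross the sources
hybrid = toy_decode(blend)            # a new sound between the originals
```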

It’s complicated, but the goal of Magenta, among other things, is to use machine learning to develop new avenues of human expression. So it’s still really all about humans.

But the question remains, is it GOOD?

I’ve heard a few pieces, and they were interesting! I don’t know that I would have chosen those particular chord progressions, but that doesn’t mean the music is bad. In my opinion, it’s really all a matter of what the culture is attuned to.

Listen to the entire podcast now

Want to hear audio samples? Want more information? You’ll get an earful of voice and music in my new podcast, The Sounds of AI! You’ll learn about the development process from an Algorithm R&D Specialist, a tech & gaming voice-over artist and an AI music programmer from the Google Magenta Team.


I have plenty of reviews, articles, podcasts and news reports about artificial intelligence on my website. And I’ll have more with each new update, so make sure you subscribe to my podcast and newsletter. I’m Kim Komando, America’s Number One Digital Pro and voice of reason in an AI world. Make it a day worth remembering!

Bonus: Subscribe and listen to our free Komando on Demand podcasts

These informative podcasts from Kim Komando delve into the fast-moving world of technology and the tech issues and topics that are continuously transforming our lives.

It's easy to subscribe! Just tap or click the iTunes, Google Play or Spotify link to subscribe on your smartphone, or download individual episodes on your computer by clicking the “Download” link at the bottom right of each episode’s summary.
