
This AI System Lets Google Assistant Sound More Human

November 7, 2017


Google Assistant — Source: Close-up Engineering Systems

Thanks to Google and the artificial intelligence (AI) research company DeepMind, your phone will no longer sound like a robot when reading out requested information. Google Assistant now uses an improved version of DeepMind’s WaveNet, a deep neural network that can synthesize realistic human speech.

WaveNet improves on conventional speech synthesis, or text-to-speech (TTS). Conventional TTS relies on two techniques: concatenative and parametric. To closely mimic human speech, concatenative TTS stitches together fragments of a voice actor’s recordings to construct the desired sentence. Upgrading a concatenative system is cumbersome because it involves replacing the audio libraries. Parametric TTS generates speech entirely by computer, and the result tends to sound robotic and artificial.
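The concatenative approach described above can be pictured with a small sketch. It assumes a hypothetical library of pre-recorded fragments (the fragment names and sample values here are invented for illustration); real systems hold far larger libraries and smooth the joins between fragments.

```python
from typing import Dict, List

# Hypothetical unit library: each key names a recorded fragment,
# each value holds that fragment's audio samples (made-up numbers).
UNIT_LIBRARY: Dict[str, List[float]] = {
    "hel": [0.1, 0.2, 0.1],
    "lo": [0.3, 0.0],
    "world": [-0.1, -0.2, 0.0, 0.1],
}


def concatenate_units(units: List[str], library: Dict[str, List[float]]) -> List[float]:
    """Join pre-recorded fragments end to end to form one waveform."""
    waveform: List[float] = []
    for unit in units:
        if unit not in library:
            # A missing fragment means a new recording is needed,
            # which is why upgrading such a system is cumbersome.
            raise KeyError(f"no recording for unit {unit!r}")
        waveform.extend(library[unit])
    return waveform


sentence = concatenate_units(["hel", "lo", "world"], UNIT_LIBRARY)
print(len(sentence))  # total number of samples in the joined waveform
```

Anything not already in the library cannot be spoken, so changing the voice or adding vocabulary means recording again.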

How Does WaveNet Work?

Unlike these two approaches, WaveNet uses a model built on a convolutional neural network to produce waveforms from scratch. Recorded speech samples are used to train the model to synthesize voices, so it learns which waveforms sound like human speech and which do not. This lets the synthesizer reproduce human intonation and incidental sounds such as lip smacks. The system can even come up with its own accent based on the samples it is given.
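To make "waveforms from scratch" concrete, here is a toy sketch of generating audio one sample at a time, with each new sample chosen from probabilities conditioned on the samples produced so far. The predictor below is a hypothetical stand-in written for illustration, not WaveNet's trained network, and the function names and the default of 256 amplitude levels are assumptions of this sketch.

```python
import random
from typing import List


def toy_predictor(history: List[int], levels: int) -> List[float]:
    """Hypothetical stand-in for a trained network.

    Returns one probability per amplitude level, favouring levels
    close to the most recent sample so the waveform changes smoothly.
    """
    previous = history[-1] if history else levels // 2
    weights = [1.0 / (1 + abs(level - previous)) for level in range(levels)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples: int, levels: int = 256, seed: int = 0) -> List[int]:
    """Build a waveform sample by sample from the predictor's output."""
    rng = random.Random(seed)
    waveform: List[int] = []
    for _ in range(num_samples):
        probs = toy_predictor(waveform, levels)
        # Draw the next amplitude level according to the predicted probabilities.
        next_sample = rng.choices(range(levels), weights=probs, k=1)[0]
        waveform.append(next_sample)
    return waveform


audio = generate(100)
print(len(audio))  # number of generated samples
```

In a real system the stand-in would be replaced by a network trained on recorded speech, which is where the human-sounding quality comes from.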

Early on, the amount of computing power needed to generate audio was a severe limitation for WaveNet: it took at least one second to produce 0.02 seconds of audio. DeepMind’s engineers fixed the problem, and the system can now produce a one-second waveform in 50 milliseconds. The resolution of each sample has also doubled, from 8 bits to 16 bits. The higher resolution translates directly into audio that scores much higher in human listening tests.
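The figures above can be checked with a little arithmetic. The sketch below works in whole milliseconds to avoid rounding issues; the variable names are chosen for illustration, and because the earlier figure is "at least one second", the resulting speedup is a lower bound.

```python
# Earlier system: 1000 ms of compute for 20 ms (0.02 s) of audio.
old_compute_ms = 1000
old_audio_ms = 20

# Improved system: 50 ms of compute for 1000 ms (1 s) of audio.
new_compute_ms = 50
new_audio_ms = 1000

# Compute time needed per second of audio, in milliseconds.
old_ms_per_audio_second = old_compute_ms * 1000 // old_audio_ms
new_ms_per_audio_second = new_compute_ms * 1000 // new_audio_ms

# How many times faster the improved system is (a lower bound).
speedup = old_ms_per_audio_second // new_ms_per_audio_second

# Doubling the bit count squares the number of distinct amplitude levels.
levels_8_bit = 2 ** 8
levels_16_bit = 2 ** 16

print(old_ms_per_audio_second, new_ms_per_audio_second, speedup)
print(levels_8_bit, levels_16_bit)
```

So the earlier system ran about 50 times slower than real time, the improved one runs 20 times faster than real time, and going from 8 to 16 bits raises the number of amplitude levels per sample from 256 to 65,536.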

The improvements make it possible to integrate the system into Google Assistant and other consumer products. As of this writing, Google Assistant can produce Japanese and U.S. English voices. In time, Google can use WaveNet to synthesize speech in other languages and dialects, and computer-generated speech will sound more human, right down to the peculiar regional accent.
