Converting text into natural-sounding speech is becoming increasingly important across a range of industries, from autonomous cars to smart home assistants, and Google has entered the field with one of the most advanced APIs available, Cloud Text-to-Speech (cloud.google.com/text-to-speech/).
Converting text to speech with the Google API
Dan Aharon, who presented the product, said that developers had been asking for a way to add text-to-speech capabilities to their applications, and that Google has used its machine learning engines to provide a cost-effective solution.
The new service makes it possible to build voice interactions between users and applications or devices. Cloud Text-to-Speech works with any application or device that can send a REST or gRPC request, including phones, PCs, tablets, and IoT devices (e.g., cars, TVs, and speakers).
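As a rough illustration of the REST path, the Python sketch below builds a synthesis request and posts it using only the standard library. It assumes the v1 `text:synthesize` endpoint, an OAuth access token (for example one printed by `gcloud auth print-access-token`), and the example voice name `en-US-Wavenet-D`; the helper names `build_synthesize_request` and `synthesize` are hypothetical. Check the current API reference before relying on any of these details.

```python
import base64
import json
import urllib.request

# Assumed v1 REST endpoint for Cloud Text-to-Speech synthesis.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US",
                             voice_name="en-US-Wavenet-D",
                             audio_encoding="MP3"):
    """Return the JSON body for a text:synthesize call (hypothetical helper)."""
    # The voice name is an assumed example; list the available voices
    # through the API before choosing one.
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


def synthesize(text, access_token, **voice_options):
    """POST the request and return the decoded audio bytes (hypothetical helper)."""
    body = json.dumps(build_synthesize_request(text, **voice_options)).encode("utf-8")
    request = urllib.request.Request(
        SYNTHESIZE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The audio comes back as a base64-encoded string in "audioContent".
    return base64.b64decode(payload["audioContent"])
```

For example, `open("hello.mp3", "wb").write(synthesize("Hello!", token))` would save the result as an MP3 file. Keeping the request builder separate from the network call makes the payload easy to inspect, and the same fields map onto the gRPC interface.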
From call center automation systems to interactive responses from IoT devices, the solution is already being used by customers such as Cisco and Dolphin ONE.
The service is part of Google Cloud Platform, a set of cloud computing services running on the same infrastructure that Google uses internally for products such as Google Search and YouTube, and it relies on speech technology developed by DeepMind.
Cloud Text-to-Speech also includes a selection of high-fidelity voices built with WaveNet, a generative model for raw audio developed by DeepMind. WaveNet produces more natural-sounding speech, and on average people prefer its audio over that of other text-to-speech technologies. Rather than stitching together a collection of short recorded speech fragments, an approach that tends to sound robotic, WaveNet models the raw audio waveform directly with a machine learning model, which yields much more natural speech.
Pricing is $16 per million characters for WaveNet voices and $4 per million characters for standard voices.
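Because pricing is per character, a quick cost estimate is simple arithmetic. The sketch below uses the two rates quoted above and a hypothetical `voice_type_for` helper that assumes WaveNet voice names contain the string "Wavenet" (as in `en-US-Wavenet-D`). It ignores any free tier or billing rounding the actual service may apply, so treat the result as a rough figure only.

```python
# Per-million-character rates (USD) as quoted in the article; check the
# current pricing page before relying on them.
PRICE_PER_MILLION = {"wavenet": 16.0, "standard": 4.0}


def voice_type_for(voice_name):
    """Classify a voice by name (hypothetical helper; assumes WaveNet
    voice names contain "Wavenet")."""
    return "wavenet" if "Wavenet" in voice_name else "standard"


def estimate_cost(num_characters, voice_type="standard"):
    """Estimate the USD charge for synthesizing num_characters.

    Ignores any free tier or rounding the real billing may apply.
    """
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    return num_characters / 1_000_000 * PRICE_PER_MILLION[voice_type]
```

At these rates, 2.5 million characters would cost $40 with WaveNet voices and $10 with standard voices.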