After ChatGPT and DALL-E, meet VALL-E - artificial intelligence that can imitate the voice of any person

11.01.2023

#text-to-sound

Last year saw the emergence of artificial intelligence (AI) tools that can create images, artwork, and even videos from a text prompt.

AI has also made significant strides in writing, with OpenAI's ChatGPT sparking widespread excitement, and fear, about the future of writing.

Now, just days into 2023, another powerful use case for AI is coming to the fore: a text-to-speech tool that can convincingly imitate a human voice.

Developed by Microsoft, VALL-E can take a three-second recording of someone's voice and reproduce that voice, turning written text into speech with realistic intonation and emotion that depend on the text's context.

Trained on 60,000 hours of recordings of English speech, it can generate speech in a "zero-shot" situation, that is, without any prior examples or training in a specific context or situation.

Introducing VALL-E in a paper posted on arXiv, the preprint server hosted by Cornell University, the developers explained that the recording data came from more than 7,000 unique speakers.

According to the team, their text-to-speech (TTS) system used hundreds of times more data than existing TTS systems, which helped them overcome the "zero-shot" problem.

The tool is not currently available for public use, but it raises security questions because it could be used to generate any text in any person's voice.

Demo page (hosted on GitHub): https://valle-demo.github.io/