Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

Via Ars Technica:

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.

The possibilities for this technology are nearly endless, both good and bad.

As we move forward deploying and using these products, I don’t think we place enough emphasis on what the bad could be. I’m reminded of that quote from Jurassic Park:

Yeah, but your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should.

I’m convinced there’s plenty of good to come from this and similar technologies like ChatGPT and Stable Diffusion, but I’m equally convinced that the bad stuff might be so bad that its effects on society could be disastrous.

This is why I believe regulating the use of these tools should be one of the first priorities for the world’s governments.

Regulation is coming, but it will be too little, too late. Governments need to work from internet-era assumptions, not the 20th-century ones they currently rely on.

14 January 2023 — French West Indies