Microsoft Unveils VASA-1, an Image-to-Video AI Model That Generates Eerily Realistic Results


Microsoft has introduced a new artificial intelligence (AI) model that can generate hyper-realistic videos of talking human faces. The AI image-to-video model, called VASA-1, can create a video from just a single photo and a speech audio clip. The company says that lip movements, facial expressions, and head movements in the generated video are synchronized with the audio so that the result looks natural. Notably, the tech giant does not intend to release a product or API based on VASA-1 and says the model will instead be used to create realistic virtual characters.

In a post on its research announcement page, Microsoft detailed the workings of its under-development AI model and highlighted its capabilities. The company claims that VASA-1 can generate videos at a resolution of 512 x 512 pixels at up to 40 FPS. The model is also said to support online video generation with negligible starting latency. X (formerly known as Twitter) user Kaio Ken shared a video of the AI model in action.

While VASA-1's biggest achievement is rendering high-quality, one-minute-long videos (as per the demo) from a single still image, the company has also highlighted its ability to generate lip movements that match the audio, along with facial expressions to go with them. The AI video generation model also gives the user granular control over various aspects of the video, such as main eye gaze direction, head distance, emotion offset, and more. These attribute controls over disentangled appearance, 3D head pose, and facial dynamics can help fine-tune the output according to the user's instructions.

Additionally, the AI model was also able to create videos using artistic photos, singing audio, and non-English speech. Microsoft researchers point out that these types of input were not present in its training data, which points to its ability to generalize beyond that data.

The AI model's generation of hyper-realistic videos of real people from nothing but a photo and an audio clip is impressive, but it also raises questions about unethical use, especially the creation of deepfakes. The company emphasized that it does not intend to release the AI model to the public and wants to use it to create interactive virtual characters.

Microsoft also said that this technology can be used to advance forgery detection. “While acknowledging the potential for misuse, it is essential to recognize the substantial positive potential of our technology. The benefits – ranging from increasing educational equity, improving access for individuals with communication challenges, and providing companionship or therapeutic support to those in need – underscore the importance of our research and other related explorations. We are dedicated to developing AI responsibly with the goal of advancing human well-being,” the company said.
