Generate Realistic Virtual Characters with Microsoft Research’s VASA-1 Architecture

2024-04-18 11:34:52

Microsoft Research has released VASA-1, an architecture that takes a single static image, an audio clip, and optional control signals to generate videos with precise lip-audio synchronization, realistic facial expressions, and natural head movements. VASA-1 not only produces high-quality video but also supports online generation of 512 × 512 video, laying the foundation for future real-time interaction and communication with lifelike virtual characters.

Microsoft Research Launches VASA-1 Architecture: Generate Realistic Virtual Characters From a Single Photo and an Audio Clip


As an architecture for generating lifelike talking faces with appealing visual affective skills (VAS), VASA-1 produces lip movements finely synchronized with the audio and captures a wide range of subtle facial expressions and natural head motions, enhancing the realism and liveliness of virtual characters.


In addition to generating realistic and vivid videos, VASA-1 is also controllable: its diffusion model can accept signals such as the character’s gaze direction, head-to-camera distance, and emotional state as conditions for generation.

Different eye gaze directions (front, left, right, upward):

Different distances of the head from the lens:

Different emotional changes such as neutral, happy, angry and surprised:
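The conditioning idea above can be sketched in a few lines. VASA-1’s actual conditioning interface is not public, so the names, value ranges, and one-hot encoding below are illustrative assumptions, not the real API:

```python
from dataclasses import dataclass

EMOTIONS = ["neutral", "happiness", "anger", "surprise"]

@dataclass
class ControlSignals:
    gaze_yaw: float       # radians; negative = left, positive = right (assumed convention)
    gaze_pitch: float     # radians; positive = upward (assumed convention)
    head_distance: float  # normalized distance from the camera, 1.0 = default
    emotion: str          # one of the demo categories above

def to_condition_vector(sig: ControlSignals) -> list[float]:
    """Pack the signals into a flat vector, one-hot encoding the emotion,
    roughly the shape a diffusion model's conditioning input might take."""
    if sig.emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {sig.emotion}")
    one_hot = [1.0 if e == sig.emotion else 0.0 for e in EMOTIONS]
    return [sig.gaze_yaw, sig.gaze_pitch, sig.head_distance] + one_hot

# Example: gaze slightly right, default distance, happy expression.
cond = to_condition_vector(ControlSignals(0.3, 0.0, 1.0, "happiness"))
print(cond)  # [0.3, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

The point of the sketch is only that each control is an independent input: changing the emotion entry leaves the gaze and distance entries untouched.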

VASA-1 can also handle photos and audio clips outside its training distribution; for example, it can generate video from singing or non-English speech.

VASA-1 disentangles appearance, 3D head pose, and facial dynamics from a single image, allowing each attribute of the generated content to be controlled and edited individually. For example, the same motion sequence can be applied to three different photos.
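Structurally, this disentanglement means the appearance latent of one image can be recombined with the pose and dynamics latents of any motion sequence. A minimal sketch, with stand-in types since VASA-1’s actual latent space is not public:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    appearance: str  # identity/appearance latent (stand-in: a label)
    pose: float      # 3D head pose latent (stand-in: a yaw angle)
    dynamics: str    # facial-dynamics latent (stand-in: a viseme label)

def reenact(appearance: str, motion: list[tuple[float, str]]) -> list[Frame]:
    """Drive one appearance latent with a shared pose/dynamics sequence."""
    return [Frame(appearance, pose, dyn) for pose, dyn in motion]

# One motion sequence applied to three different photos:
motion = [(0.0, "AA"), (0.1, "OO"), (0.2, "MM")]
clips = {photo: reenact(photo, motion) for photo in ("photo_a", "photo_b", "photo_c")}
```

Because the factors are separate, the three clips share identical pose and dynamics while differing only in appearance, which is exactly what per-attribute editing requires.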

Editing poses and expressions (original generation results, pose-only results, expression-only results, and expressions with rotated poses)

On a desktop computer with a single NVIDIA RTX 4090 GPU, VASA-1 can generate 512 × 512 video at 45 frames per second in offline batch mode and up to 40 frames per second in online streaming mode, with a preceding latency of only 170 milliseconds. https://vasavatar.github.io/VASA-1/video/realtime_demo.mp4
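A quick back-of-the-envelope check puts these numbers in perspective: at 40 fps the per-frame budget is 25 ms, so a 170 ms preceding latency corresponds to buffering roughly seven frames before playback begins (the frame-buffering interpretation is my inference, not a stated detail):

```python
# Reported figures from the VASA-1 demo on a single RTX 4090.
offline_fps = 45
streaming_fps = 40
latency_ms = 170

offline_frame_ms = 1000 / offline_fps      # ~22.2 ms per frame offline
streaming_frame_ms = 1000 / streaming_fps  # 25 ms per frame streaming
frames_buffered = latency_ms / streaming_frame_ms  # 6.8 frames of lead

print(f"{offline_frame_ms:.1f} ms/frame offline, "
      f"{streaming_frame_ms:.1f} ms/frame streaming, "
      f"~{frames_buffered:.1f} frames of preceding latency")
```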

The team behind VASA-1 said that while they are aware of the risk of this technology being misused, they strongly believe it can have far more positive impacts: VASA-1 can help improve educational equity, improve the quality of life of people with communication impairments, and provide companionship and therapeutic support to those who need it. These potential benefits underscore the importance of this research and related explorations. The technology will also be actively applied to forgery detection to combat misleading or deceptive use. Although currently generated videos still contain identifiable artifacts, the team believes that continued effort will eventually make them indistinguishable from real video, and pledges to apply the technology’s potential responsibly and ethically for the benefit of society. Readers interested in VASA-1 can visit the project page to find out more.

