Researchers from Alibaba Group's Institute for Intelligent Computing have developed a new video generation technology called "Animate Anyone". It significantly surpasses previous systems, such as DisCo and DreamPose, at converting still images into videos.
"Animate Anyone" lets users create realistic videos from static images, reaching a level of quality that can deceive the human eye. Similar fidelity has already been achieved for still images and text generation; this technology brings it to video, further blurring our perception of reality.
The process begins by extracting details from the original image, such as facial features, patterns, and pose. These details are then mapped onto a sequence of slightly different poses, captured from motion data or extracted from another video, to produce a series of frames.
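The pipeline described above can be summarized as: one reference image supplies the appearance, while a driving pose sequence supplies the motion, and each output frame pairs the two. The following is a deliberately schematic sketch of that idea, not the authors' actual architecture; all names (`Frame`, `animate`, the feature dictionary) are hypothetical placeholders for what are, in the real system, learned neural representations.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    appearance: dict   # details extracted from the single reference image
    pose: tuple        # pose keypoints for this moment of the motion

def animate(reference_appearance, pose_sequence):
    """Pair the fixed reference appearance with each target pose.

    In the real model this pairing is done by a generative network;
    here it just illustrates the structure of the computation.
    """
    return [Frame(appearance=reference_appearance, pose=p) for p in pose_sequence]

# Hypothetical inputs: features from one photo, poses from a driving video.
reference = {"face": "features...", "clothing": "features..."}
poses = [(0, 0), (1, 0), (2, 1)]
frames = animate(reference, poses)
```

The key property the sketch captures is that every frame shares the same appearance source, which is why keeping that appearance consistent across poses is the hard part.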
Earlier models struggled to invent believable details from scratch, producing strange, uncanny images. "Animate Anyone" improves this process significantly, though it is not yet perfect.
While the technical details of the new model are complex, one important addition stands out: a new intermediate stage that lets the model thoroughly learn the relationship between the original image and the features of the generated frames, leading to much better preservation of appearance details.
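One common way such an intermediate stage can work, shown here purely as an assumption-laden sketch rather than the paper's method, is attention: features of the frame being generated query the reference image's features and pull matching appearance detail back in. The function name and the residual formulation below are illustrative choices, not taken from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inject_reference(gen_feats, ref_feats):
    """Hypothetical appearance injection via scaled dot-product attention.

    gen_feats: (n, d) features of the frame being generated
    ref_feats: (m, d) features extracted from the reference image
    """
    d = gen_feats.shape[1]
    attn = softmax(gen_feats @ ref_feats.T / np.sqrt(d))  # (n, m) weights
    # Residual update: each generated feature absorbs matching reference detail.
    return gen_feats + attn @ ref_feats

gen = np.random.randn(4, 8)   # toy generated-frame features
ref = np.random.randn(16, 8)  # toy reference-image features
out = inject_reference(gen, ref)
```

Whatever the exact mechanism, the point is the same: the generated features are conditioned directly on the reference image rather than being synthesized from scratch, which is what preserves fine appearance detail.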
The results are demonstrated in various contexts: fashion models posing without deformation, 2D anime characters coming to life and dancing convincingly, and even Lionel Messi performing everyday movements. However, the model still struggles with eyes, hands, and poses that differ greatly from the original image.
One concern is the potential for misuse by bad actors, who could fabricate videos from nothing more than a single high-quality image. For now, the technology is too complex and unstable for widespread use, but advances in AI are rapidly changing that.
The development team does not plan to release the code to the public at this time. Their GitHub page states that they are actively working on preparing a demonstration and making the code publicly accessible, but no release date has been set.
As the possibility of fake videos spreading across the internet looms, the question remains: what will be the consequences? The answer may come sooner than we expect.