Einstein Spoke: Chinese OmniHuman-1 Hits New Realism

The Chinese technology company ByteDance, the developer of TikTok, has unveiled its latest creation, the neural network OmniHuman-1. This multimodal artificial intelligence model can generate detailed videos of people from photos and audio inputs. Its developers claim that the model surpasses existing video generation methods by seamlessly synchronizing movements, facial expressions, and voice.

OmniHuman-1 can create videos in which a person appears to speak, sing, or move as if they were physically present in the frame. Demo videos showcasing the technology have captured significant attention online. One notable example is a 23-second video in which Albert Einstein appears to be speaking. Experts have described the results as “shockingly realistic” and say the technology is pushing deepfake videos to a level where they are hard to distinguish from real recordings.

Although ByteDance has not disclosed details about a potential public release of OmniHuman-1, the technology shows promise for various applications, from creating digital avatars to automating video dubbing. In a technical publication, the company’s researchers outlined a new method of training the model that combines textual, audio, and visual data to improve the scalability of generative algorithms. This approach allows the model to create videos with different body proportions and frame formats, ranging from close-ups to larger scenes.

The model excels at accurately capturing facial expressions, lip movements, and gestures, and at synchronizing them with audio files. Test videos, such as one featuring a person giving a TED Talk-style lecture, showcase the precision of hand movements and articulation, giving the appearance of a live performance.

The development of OmniHuman-1 comes amid escalating restrictions from the United States targeting Chinese artificial intelligence technologies. Nevertheless, Chinese companies like ByteDance continue to make strides in generative models. Alongside OmniHuman-1, ByteDance has introduced platforms like Jimeng AI, which incorporates models such as PixelDance and Seaweed. Recent updates to Jimeng AI have enhanced the synchronization of images with videos, making generated scenes more dynamic.

Other Chinese companies, such as Kuaishou Technology with its Kling app and the startups Zhipu AI, Shengshu Tech, and MiniMax, are also actively working in this space. These developments highlight China’s continued growth in generative artificial intelligence despite technological sanctions.

Sources: reports, release notes, official announcements.