Open implementation of a machine learning system for synthesizing images from a text description

An open implementation of the DALL-E 2 machine learning system proposed by OpenAI has been published. It synthesizes realistic images and paintings from a text description in natural language, and it also accepts natural-language commands for editing images (for example, adding, deleting or moving objects in the image). The original DALL-E 2 models from OpenAI have not been published, but a detailed description of the method is available. Based on that description, independent researchers prepared an alternative implementation written in Python, using the PyTorch framework and distributed under the license.

Compared to the previously published implementation of the first generation of DALL-E, the new version follows the description more accurately, achieves greater photorealism and can generate images at higher resolutions. Training the model requires substantial resources: for example, training the original version of DALL-E 2 takes 100-200 thousand GPU-hours, i.e. about 2-4 weeks of computation on 256 NVIDIA Tesla V100 GPUs.
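The relationship between the GPU-hour estimate and the wall-clock time can be checked with simple arithmetic. The sketch below is only an illustration: the function name `wall_clock_weeks` is hypothetical, and it assumes the work is split evenly across all 256 GPUs with no scaling overhead, which real training runs rarely achieve.

```python
def wall_clock_weeks(gpu_hours: float, num_gpus: int) -> float:
    """Convert a total GPU-hour budget into wall-clock weeks.

    Assumes ideal linear scaling: every GPU is busy for the whole run.
    """
    hours_per_gpu = gpu_hours / num_gpus
    return hours_per_gpu / (24 * 7)  # 168 hours in a week


# Figures quoted in the article: 100-200 thousand GPU-hours on 256 GPUs.
low = wall_clock_weeks(100_000, 256)
high = wall_clock_weeks(200_000, 256)
print(f"{low:.1f} to {high:.1f} weeks")  # prints "2.3 to 4.7 weeks"
```

Under these assumptions the result is roughly 2.3 to 4.7 weeks, which is in line with the approximate 2-4 week figure quoted above.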


The same author has also begun developing an extended version, DALLE2 Video, aimed at synthesizing video from a text description. Also worth noting is ru-dalle, a project developed by Sberbank: an open implementation of the first generation of DALL-E, adapted to recognize descriptions in Russian.
