The DALL-E 2 AI generates images from text, and the result is fabulous

OpenAI is back in business with a new version of DALL-E, and the result is simply phenomenal: it generates vastly better images than its predecessor.

OpenAI, one of the world leaders in the artificial intelligence field, regularly delivers groundbreaking work. Some of its most impressive projects are built on GPT-n, a series of deep-learning language models capable of generating anything directly or indirectly related to language, from conversation to computer code.

The system is so powerful that it even worries some observers, but it is not GPT-3 that OpenAI has shown off recently. Instead, the firm unveiled the new version of DALL-E, another AI-based system. Rather than generating text, it creates images from a natural-language description… and the result is to die for.

From text to image, there is only one AI

The first version, whose name borrows from the illustrious painter Salvador Dalí and the Pixar robot WALL-E, was already stunning: it could visually represent objects and people from a simple textual description, and with remarkable agility.

It worked with a conceptual approach quite similar to GPT-3's, which gave it great flexibility. It handled very down-to-earth descriptions, but also – and this is quite exceptional for an AI of this type – much more outlandish prompts such as "illustration of a radish walking his dog".

The first DALL-E was never offered as such to the general public. But if the concept sounds familiar, it may be because it reminds you of WomboDream, an application based on the first version of the system (see our article here).

That app lets users generate stylized images from a piece of text and a few predefined stylistic options. The result was often visually striking, if not perfectly coherent – already enough to be amazed. But this second version simply moves into another dimension.

Two images generated from the prompts "An otter in the style of the Girl with a Pearl Earring" and "Shiba Inu with a beret and turtleneck". © DALL-E 2 / OpenAI

A second version to die for

Since then, DALL-E has been greatly reinforced by another AI-based system, derived from a computer-vision model called CLIP, which analyzes an image and describes it the way a human would. OpenAI reversed this process: instead of producing a description from an image, the inverted model guides the composition of an image from words.
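The core idea behind CLIP is a shared embedding space where a text and an image can be compared directly. The toy sketch below illustrates only that scoring principle – all embeddings and names here are hypothetical stand-ins, not OpenAI's actual model:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: how aligned two embedding vectors are."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-computed embeddings in a shared text/image space.
text_embedding = np.array([0.9, 0.1, 0.0])   # e.g. "an otter painting"
candidates = {
    "otter_painting": np.array([0.8, 0.2, 0.1]),
    "random_noise":   np.array([0.0, 0.1, 0.9]),
}

# CLIP-style scoring: rank candidate images by similarity to the text.
best = max(candidates, key=lambda k: cosine(text_embedding, candidates[k]))
print(best)  # prints "otter_painting"
```

Running this scoring in the other direction – nudging an image so its embedding moves closer to the text's – is, in spirit, how a description can guide image composition.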

Thanks to this procedure, DALL-E 2 composes images through a so-called "diffusion" process: it starts from an anarchic field of noise, then fills in the image as it goes, gradually refining the level of detail.
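The loop below is a deliberately simplified sketch of that idea: start from pure noise and refine step by step. In a real diffusion model a trained network predicts the noise to remove at each step; here we cheat and blend toward a known target purely for illustration:

```python
import numpy as np

def toy_reverse_diffusion(target, steps=50, seed=0):
    """Toy illustration of the 'diffusion' idea: begin with random
    noise and progressively refine it. A real model uses a learned
    denoiser; this sketch blends toward a known target instead."""
    rng = np.random.default_rng(seed)
    img = rng.standard_normal(target.shape)       # anarchic field of noise
    for t in range(steps):
        alpha = (t + 1) / steps                   # refinement schedule
        img = (1 - alpha) * img + alpha * target  # remove a little noise
    return img

# Toy 8x8 "image" as the generation target.
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)
result = toy_reverse_diffusion(target)
print(np.allclose(result, target))  # prints True: the last step lands on the target
```

The point is the shape of the computation – many small refinement steps from noise to image – not the blending rule itself, which stands in for the learned denoising network.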

Faster, less resource-hungry, generally better optimized and above all even more powerful, DALL-E 2 produces images that will undoubtedly leave certain artists stunned. It now has an incredible arsenal of subsystems that let it handle a simply bewildering range of scenarios.

DALL-E 2 is also capable of editing existing images with a level of precision that is simply unbelievable. It is thus possible to add or remove elements while taking into account the color, the reflections, the shadows… It can even do so while respecting the style of the original image.

The system can also generate several variants of the same output in different styles, like the astronauts above, or keep the same overall style, as in the case of the bears.

DALL-E 2 can also fuse two images, taking characteristic elements from both sources. Best of all, it does all this at a resolution of 1024 × 1024 pixels, four times that of its predecessor in each dimension!

Built-in safeguards to prevent abuse

OpenAI has also added safeguards intended to prevent malicious hijacking of the system. For example, DALL-E 2 will simply refuse to produce any images based on a real name. The objective is obviously to prevent the generation of deepfakes.

DALL-E 2 can also create coherent variants of the same image thanks to style transfer. Here, Vermeer’s “Girl with a Pearl Earring”. © DALL-E 2 / OpenAI

The same goes for all adult imagery, as well as anything closely or remotely related to "conspiracies". Nor is there any question of generating content related "to major geopolitical events currently underway". In theory, the system therefore cannot be used to produce ambiguous content; we will not, for instance, see fake images of the Russian-Ukrainian conflict generated this way.

But even with all these barriers, OpenAI remains aware of the potential for harm in its superb technology. Future users will have to prove themselves: a lengthy verification process is required to obtain partner status. It will therefore be some time before this second version arrives in a consumer app like WomboDream. Too bad for those already hoping to play with it… but probably more reasonable.

You can find the research paper here and the OpenAI Instagram here.
