The artificial intelligence program Midjourney is capable of some amazing results, as we saw in my previous post ‘Exploring imagination with Midjourney’. Not long ago, all the fuss was about its rival Dall-E 2. So naturally, I was curious how the two would compare when given the same text prompts. I registered an account for Dall-E 2 and ran exactly the same prompts as for the pictures in the Midjourney post.

Even though I hoped to get some amazing results from Dall-E 2, the reality was different. As you can see in the images below, the “intelligence” of Dall-E 2 is on another, lower level than Midjourney’s. I feel like the latter has been specifically tuned by artists to present a few types of style and composition that are always pleasing to other artists: a central composition with close-up shots, or a distant composition with small significant objects placed at the golden-ratio points of the image. In that regard, I believe the results obtained from Dall-E 2 are more honest and naive at the same time. Let’s see how this is reflected in the generated images.

Impressionist futuristic architecture

First, we will compare the results for the Midjourney prompt with the levitating houses of the future, surrounded by lush vegetation and an alien planet landscape. The results from this experiment show the early age of Dall-E 2 and its very primitive understanding of more complicated concepts such as art, the future and composition. Here is the prompt I ran:

white futuristic flying houses over a green and yellow forest on a new habitable planet with strange plants and flying cars and people around monet
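For anyone who prefers scripting over the web interface (which is what I used), here is a minimal sketch of how the same prompt could be sent through OpenAI's Python client. The model name, image size and number of images are my assumptions, not part of the original experiment.

# Minimal sketch: send the prompt to Dall-E 2 through OpenAI's Images API.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "white futuristic flying houses over a green and yellow forest "
    "on a new habitable planet with strange plants and flying cars "
    "and people around monet"
)

# Ask for four candidate images, as the web interface shows.
response = client.images.generate(
    model="dall-e-2",  # assumption: targeting the Dall-E 2 model
    prompt=prompt,
    n=4,
    size="1024x1024",
)

for i, image in enumerate(response.data):
    print(f"image {i}: {image.url}")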

The colours that this artificial intelligence used are very harsh, and definitely not in the style of Monet. Also, I cannot clearly tell where the levitating houses are; I mostly see objects that look like UFOs. The blue also feels very intense, even though it was never specified in the text prompt. Next, I tried to iterate on two of these images to generate more versions of them:

Not a great improvement by any means: the blue got even more intense, and the second try was no better.
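As a side note, the “iterate” button in the web interface corresponds to the variations endpoint in the API. A minimal sketch, assuming one of the generated images has been saved locally as a square PNG (the file name is a placeholder):

# Sketch: generate variations of an existing image, like the web UI's
# iterate feature. "candidate.png" is a placeholder for one of the
# previously generated images (a square PNG under 4 MB).
from openai import OpenAI

client = OpenAI()

with open("candidate.png", "rb") as image_file:
    response = client.images.create_variation(
        image=image_file,
        n=2,
        size="1024x1024",
    )

for image in response.data:
    print(image.url)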

Space girl

Then I gave up and proceeded to the second input, which tried to depict an astronaut girl on the red planet. For this prompt, I also used an artistic reference to Rembrandt, to give the scene a more intense feel. The result I got honestly scared me. The faces were distorted in pain, and the background was a combination of the Giza pyramids and the way Mars looks from Earth, placed in the sky. The prompt I used is the same as the one for Midjourney:

beautiful young woman with a futuristic spacesuite on mars, show habitat in backgorund  Rembrandt van Rijn

Generating variations felt meaningless, but I nevertheless tried it for one of the pictures, and as you can see below, the results are not any different. You can definitely see the suffering in the picture of the girl holding her chest, is that right, Dall-E? Then again, maybe my spelling mistakes are contributing a little to the confusion of the AI. Nevertheless, Midjourney guessed my intent despite the mistakes.

Urban sprawl

The last scene I tested in Midjourney was an urban scene showing a future city, filled with lush vegetation and many people, but with fewer cars, which fly in the air. I also tried to challenge the AI’s ability to generate dynamic scenes by adding a weather description:

detailed future city with very tall and thin buildings connected by bridges and highways with a forest between the buildings and on top of the buildings, and a lot flying vehicles and drones, show many people in foreground in a lawn with floweres, half sunny half stormy weather

The result here really surprised me, because I was already very disappointed and did not expect anything interesting. However, Dall-E 2 showed some wit and presented me with fun and vibrant images that look more like illustrations than photo-realistic or painted artwork.

Then I realised that maybe Dall-E works best with a different type of input: the order of the words seems to impact the generated images to a great extent. With Midjourney, on the other hand, I feel the effect is smaller, but I don’t have the data to support this yet. I tried one more time with an input that sounds more like natural language, a sentence I would say to someone:

a hyper detailed image of drone and robot city in the future where humans enjoy life without working
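If you want to test this word-order hypothesis more systematically, one way would be to loop over several phrasings of the same idea and compare the outputs side by side. A rough sketch; the phrasings below are just illustrative:

# Sketch: compare how different phrasings of the same idea affect the output.
# The phrasings list is illustrative; substitute your own experiments.
from openai import OpenAI

client = OpenAI()

phrasings = [
    "detailed future city with very tall and thin buildings, forests "
    "and flying vehicles, many people in the foreground",
    "a hyper detailed image of drone and robot city in the future "
    "where humans enjoy life without working",
]

for prompt in phrasings:
    response = client.images.generate(
        model="dall-e-2",
        prompt=prompt,
        n=1,
        size="1024x1024",
    )
    print(prompt[:50], "->", response.data[0].url)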

I also have to admit that these images were generated a month and a half ago, in August 2022, right after the images from Midjourney. I wrote the article with these older images because they more accurately reflect the progress of both platforms at that time. Since then, both platforms have received updates and deliver results that are even more interesting and engaging. Here is an example of the first prompt of this post, which I ran again today:

white futuristic flying houses over a green and yellow forest on a new habitable planet with strange plants and flying cars and people around monet

We can clearly see an improvement over the original results: the flying buildings are better defined and the colour palette is more accurate. Still, they are not as good as the original Midjourney ones, so Midjourney keeps its technological advantage. However, there are some amazing tools available for Dall-E, such as inpainting and outpainting. I am looking forward to seeing more cool Dall-E results that can rival Midjourney.
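For the curious, inpainting and outpainting live in the Dall-E web editor, but the API exposes the same idea through the edit endpoint: you upload an image plus a mask whose transparent pixels mark the area to repaint. A minimal sketch with placeholder file names and an illustrative prompt:

# Sketch: inpainting through the images edit endpoint. Transparent areas
# of "mask.png" tell the model where to repaint "city.png". Both must be
# square PNGs of the same dimensions; the file names are placeholders.
from openai import OpenAI

client = OpenAI()

with open("city.png", "rb") as image_file, open("mask.png", "rb") as mask_file:
    response = client.images.edit(
        image=image_file,
        mask=mask_file,
        prompt="more flying vehicles between the buildings",
        n=1,
        size="1024x1024",
    )

print(response.data[0].url)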

What I expect in the future is even more competition in the field and the adoption of AI by designers and architects on various levels. This could bring even better generation algorithms and results for everyone, as well as allow for new design workflows that might pose a challenge to some existing professions. But with every challenge comes a great new opportunity!
