
With or Without You

Updated: Jun 29

Let me start by stating one of my conclusions here: our current text-to-image AI technology, while impressive, is still immature. It has the potential to go in two opposite directions - one of them will give artists ultimate control, the other will take them completely out of the equation. Both of these options are interesting and probably worth pursuing, but only one of them is not existentially alarming.

Two years ago, seemingly out of nowhere, we were suddenly able to produce extremely visually appealing images by using text-to-image AIs such as DALL-E, Midjourney and Stable Diffusion. There are other types of text-to-media AIs, including text-to-video algorithms, which have seen some recent advancements, but for the purposes of this post, and even though I believe my conclusions here apply more broadly, I would like to focus on text-to-image AIs.

We all tried it back in the day. We opened a browser with one of those text-to-image services for the first time and typed in something ridiculous like "a robot painting Van Gogh's Starry Night". A few sips of coffee later we got our next existential crisis: how can this "thing" produce such appealing results in seconds whereas trained professionals achieve similar results in several hours at best?

"A robot painting Van Gogh's Starry Night", AI image created using Midjourney

But, as remarkable as these results are, and regardless of the many legal issues piling up, which I am not going to address in this post, artists who try to use AI tools face some difficult problems. At the end of the day, and even though these technologies are game-changers in some disciplines, all we as artists can control right now using a text-to-image AI, by its very nature, is the concept. We cannot satisfactorily control the colors, the lighting, the composition, the little details.

Here is a list of problems with the above image that I, as an artist, would like to solve:

  • The blue and yellow colors of Van Gogh's painting blend too much with the rest of the colors in the scene, which reduces the overall color contrast that we should thematically require between the painting and its robotic surroundings. Same goes for the colors of the robot.

  • The lights near the robot's head are a bit blinding; they shift the contrast (and therefore the focus) to the wrong focal point. The contrast should be between the robot's head and hand and the painting.

  • The geeky science lab lights are not suitable for painting. They create a disturbing feeling of inconsistency and make the robot's act of painting feel less believable, which doesn't fit the narrative. The lighting setup should be partly natural daylight from windows and partly small, scattered studio lights.

  • The conceptual contrast is also wrong. I would have liked the robot to paint inside some futuristic room with a view of nature. So, imagine the left part of the room is futuristic, and the right part behind the painting is a window onto a very natural scene. This would increase the contrast between the robot's world and Van Gogh's world. It is also possible to have the robot not in a futuristic room but in an ancient studio, thus increasing the contrast between the robot and its own surroundings.

  • We also need to zoom out a little bit. Feeling the environment in this type of work, seeing more of the robot's home, would have made a great contribution to the theme. We could have seen, for instance, some other paintings by the robot lying in the background: is it only Van Gogh or only impressionists or only great masters? This sort of conceptual consistency is especially important for a cinematic effect.

  • I like the glass on the robot's head, but the elements inside are either invisible or unreadable. Ideally we want to have some mechanical set of eyes or some other robotic representation of sight that will confront us with the question of whether the robot is in a way human or not, or whether we are in a way robots or not.

  • The brush doesn't make any sense, and the oils on the painting look flat instead of having their usual oily bumps. Additionally, the painting seems to glow a bit, as if it is not affected by the room lights, which makes it feel like it doesn't belong in the scene.

  • No one paints inside a frame; you add the frame later (I admit, this could be some cool robotic ability, to not worry about messing up the frame, but it still looks weird).

  • The left leg of the easel is hovering above the table, and the elements on top of the table don't make sense.

  • Trust me, I could go on and on, and the only thing preventing me from doing so is my consideration for your time. I usually make hundreds of these and other adjustments in my own works, and many other artists do the same. You don't get a good piece by accident; it's usually a function of how much time you put into it.

And while it is possible that I could have made a few of these changes by slightly altering my original prompt or using more advanced AI methods or making small touch-ups in Photoshop, it is not possible as of this moment to achieve my desired result - not even close to it.

And in a way, this is precisely the question: can we get our desired results using AI methods? Or, rather, perhaps there is a question preceding this one: why do we even need to get our desired results to begin with? These two questions define, in my opinion, the future of these technologies. Because we can have both.

Even if we cannot do it today, we will be able to use AIs as completely functional artist apprentices in the future. We will be able, using a more sophisticated interface than a text prompt, to engage with AIs in a way that will enable us to materialize our visions. Some decades from now, we will be able to speak to an AI and point at things, perhaps with some sophisticated controlling device (whatever the evolution of the mouse-keyboard interface turns out to be), and make all of the above adjustments and more. Let us call these types of AIs human-oriented AIs.

But there will also be independent AIs. Why is it necessarily more interesting to see my own artistic vision materialize rather than experiencing some future AI's vision? What happens if independent AIs come up with concepts no less interesting than our own, and are also able to execute them consistently and interestingly? What if they can even explain these concepts to us so that we can understand their value? Maybe, some day in the not too distant future, an AI will be able to come up, completely on its own, with a better concept and better execution for an AI robot who paints Van Gogh than the one I tried to sketch above.

We will be able, one day, to use human-oriented AIs to fully materialize our own visions and also have independent AIs with interesting visions of their own coming about. This is the fork in the road that we are currently at: we can use human-oriented AIs to give us ultimate control with less technical proficiency, or we can give independent AIs ultimate control with no need for us. We will probably do both, simply because we always choose to advance technologies regardless of the consequences. This is not to say that I think the consequences will be bad, just that people worship technology.

One of the reasons I myself am drawn to technology is that it confronts us with philosophical questions. When the photography revolution began, painters weren't so happy initially, since up until then pretty much everyone was a realist painter. But even a modest camera is in some sense more realist than the greatest painter. Hence, one of the previous artistic existential threats came into existence. Artists were forced to ask themselves, given the new technology, questions about the purpose of realism and the purpose of art, which gave birth to some of the most profound movements in art, such as Cubism. Reality became, in this paradigm, the origami unfolding of 3D elements into abstract 2D shapes. And this - showing us more than 3 faces of a cube at the same moment - a camera cannot do.

It is hard to imagine what our current existential threat will amount to. Like I said, I believe that in the future we will be able, with minimal technical skills, to create more freely, more quickly, and with more control. This is an existential threat removed. But independent AIs will be able to create very interesting art without the need for humans. It is possible that we will have both human art and AI art, and that we will be interested in these two different art categories for different reasons, one emphasizing the artist's story and our empathy for the artist's history, the other drawing more attention to the alien nature, or godly nature, or what have you, of future AI art.

Since Barthes's The Death of the Author there have been many scholars claiming that the author's intentions are not important, that the observer's experience and interpretation are more relevant. Indeed, this is the most widespread view to this day. Perhaps in the future we will take this theory to a whole new level - not only will the author's intentions no longer be important, there will be no (human) author.

This text was written by an AI. Nahhhh, kidding, it's still just me :)

