New Video: Why Design Language Matters for Multimodal Models like DALL-E [GPT-X, DALL-E, and our Multimodal Future]
Today’s video covers a lesson that I believe will be really important for our multimodal future.
Right now, the models are getting better at generating images from our prompts, but over the next year or so, I believe these models will also get better at interpreting the edits/changes we want made once an image has already been generated. I think we will have various tools for communicating which changes we want (i.e. some kind of brush tool to highlight specific areas on the canvas) … but for the creatives of the future, I believe natural language will be the dominant way to command the edits/changes they want made.
This video will walk you through the importance of art theory/design language, explain what it means in a multimodal context, and share tips on how you can improve your vocabulary.
YouTube Transcript (SPOILER WARNING)
Yes, multimodal AI models like DALL-E can generate images for you just with text.
But say you like the image so far and want to change it in a really specific way. How do you do this? DALL-E may be able to create your initial image from the simple language you give it, but that doesn’t mean it can also follow along with vague instructions you give later for the changes you want made.
It can’t read your mind. And to be honest, it’s your job anyway to clearly communicate these changes to get the exact vision you have in your head … just like you would with a human worker. The problem is that many people do not know basic art and design vocabulary.
For example, most non-creative people I’ve spoken to don’t even know what opacity means. They don’t know the difference between transparent and opaque, yet this is something any designer with basic Photoshop experience could explain to you.
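If the term is new to you, here’s a minimal sketch of what opacity means in practice, using Python’s Pillow library (the filenames are placeholders): an opacity of 1.0 is fully opaque, 0.0 is fully transparent, and values in between let the background show through.

```python
from PIL import Image

# Load a background and an overlay of the same size (placeholder filenames).
background = Image.open("background.png").convert("RGBA")
overlay = Image.open("overlay.png").convert("RGBA").resize(background.size)

# alpha=0.5 composites the overlay at 50% opacity:
# 1.0 would be fully opaque (overlay hides the background),
# 0.0 fully transparent (background shows through untouched).
blended = Image.blend(background, overlay, alpha=0.5)
blended.save("blended.png")
```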
What about colour palettes? Are you familiar with the hexadecimal-based colour system? Let me tell DALL-E right now that I want this background specifically in colour code #0099FF.
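If hex codes are unfamiliar, here’s a short Python sketch showing how a code like #0099FF breaks down into its red, green, and blue channels:

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert a hex colour code like '#0099FF' into an (R, G, B) tuple."""
    code = code.lstrip("#")
    # Each pair of hex digits encodes one channel from 0 to 255.
    return tuple(int(code[i:i + 2], 16) for i in range(0, 6, 2))

print(hex_to_rgb("#0099FF"))  # (0, 153, 255): no red, mid green, full blue
```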
You may have already subconsciously noticed these kinds of creative details throughout your life, but if you want to work together with an AI model, you’ll need to actually apply the specific terms and theories to describe them exactly.
I already made a video on composition and phrasing, but there are other art and design concepts too that can help you not just describe but enhance your creative work, such as contrast, irony, perspective, texture, and more. For example, Sinix Design has a great video on conceptual contrast, which I really believe could elevate your work. I’ve put the link in the description below.
Start by expanding your graphic design vocabulary. There are many ways to do this, but I like to watch online video tutorials and absorb design language that way. Then I try to apply these words organically, on my own, in the course of my daily life.
So, what about our changes? Going back to the multimodal photo editor app concept, let’s edit our image and make our changes in a more precise way, using the appropriate design theory language. Here’s how I’m going to make my first change on the list: I’m going to write it with the appropriate design language description. You can see that the multimodal engine instantly made the change we wanted on the image. Jumping to the end … our final image, with all the markup changes we wanted, ended up looking like this. Much better!
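To make the contrast concrete, here’s a hypothetical sketch of the kind of edit instructions I mean. The edit_image function and its parameters are invented for illustration and do not correspond to any real API; the point is the difference in prompt precision, not the plumbing.

```python
def edit_image(image_path: str, instruction: str) -> str:
    """Stand-in for a future multimodal editing call; returns an output path."""
    raise NotImplementedError("hypothetical placeholder, not a real API")

# A vague instruction forces the model to guess what you mean:
vague = "make the background nicer"

# A precise instruction uses design language: colour codes, opacity, contrast.
precise = (
    "Fill the background with solid #0099FF, lower the drop shadow to "
    "30% opacity, and increase the contrast between subject and backdrop."
)

# edit_image("portrait.png", precise)  # hypothetical call
```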
The Key Idea
Multimodal models may be able to help you initialize a creative work, but it’s your job as a director to communicate the changes you want made in order to get them done. If your goal is to use AI to bring your visions to life, take time to learn the essential design and art theory language so that you can communicate these changes in precise ways. It’s no different from working with a human team in the real world, and your work will greatly benefit overall.