New Video: Composition & Phrasing with Multimodal AI [GPT-X, DALL-E, and our Multimodal Future]

Sep 02, 2021

Composition & Phrasing was the first video I ever made in this series. I believe it was also the first script I ever wrote when I was formulating the larger ideas … I almost don’t want to let go of it. A lot of the series’ art style, themes, and formatting were crystallized in this video.

I released a “preview video” of this last month. This is the improved version, with some necessary touch-ups.

It’s on the topic of “Composition and Phrasing”. It deals with forcing yourself, as a multimodal creative, to mentally let go of the tiny details. The mindset of a “skilled creative labourer” is deeply ingrained in our psyche as artists nowadays, and we must let go of it at times if we wish to leverage the multimodal AI models of the future to their full potential. This video is about looking at “the big picture” and asking the important questions about your work. Don’t belittle your ideas by making little of them!

I hope you love it,

YouTube Transcript

In a world where multimodal AI models can generate entire graphics, stories, poems, or videos with just a simple text description, the role of a creator changes dramatically.

Say you’re working as a graphic designer, drawing a simple image of a lamp. In the past, you may have spent the bulk of your time focusing on each individual element and design decision - getting it “just right”. Since you’re the one who has to make everything from scratch, you would have been focused on the tiniest details of your asset, like shadows, typography, colour choices and more. But this attention to detail comes at a great cost. How often have you got so caught up in the “little things” that your final asset doesn’t get the “big things” right?

This is nothing new in the programming and startup world. Many developers suffer from something called “tunnel vision”, where you get so absorbed in your code that you end up building products nobody needs or even knows how to use. You may even lose your human social skills altogether.

Image: Mark Zuckerberg at South by Southwest 2008 (Wikimedia Commons)

In artists’ terms, there’s something called “composition”. Composition is about looking at the “big picture” of your work and how its different elements relate. It is found in every creative discipline: photography, graphic design, dance, and more. Musicians have a related sub-idea called “phrasing”, which is about not just writing bars or simple loops, but organizing musical sequences into structured “musical sentences” that communicate ideas to audiences at a high level.

One technique I currently use to improve my work, compositionally speaking, is to zoom out on images. It lets me see fewer details and forces me to look at the picture overall. Going back to the Multimodal Photo Editor app concept - in the future, I can imagine using a program not only to generate an image, but also to analyze my composition and give feedback on it, dramatically improving the quality of my work.

By the way, one of my best moments for creative improvement is when I finish something and show it to a mentor. A good one will not spend their time on the tiny details, but will look at my work overall and give high-level compositional suggestions, which often lead to a bigger idea, a wider potential audience, or greater emotional appeal.

In our multimodal future, since AI has the skills and can generate work for us, we may not need to spend our time sweating the small stuff. Instead, we may actually get to spend more time on “the big picture stuff”, which I think is a fundamentally good thing for the creative process.

The Key Idea

Don’t miss the forest for the trees. With GPT-3, DALL-E and other multimodal AI models of the future, your work can have better systems design, flow and transition more smoothly, and have a heavier emotional impact on the world. Get comfortable having AI do the skilled work. Learn to sit back, make iterative suggestions, and focus your attention on the big picture.

Multimodal by Bakz T. Future
