New Video: What is DALL-E? (Series Intro) [GPT-X, DALL-E, and our Multimodal Future]

Aug 31, 2021

What will the next 10 years of creativity look like? This is a question which first popped into my head the moment I used GPT-3 back in July 2020. What followed was a series of bike rides and long walks where I formulated and conceptualized what GPT-3 could mean for creatives everywhere. It was no surprise that GPT-3 had immediately begun to see serious adoption amongst copywriters, science fiction authors, poets, and more. I felt a personal connection to this topic and felt the line of thinking I was already on could be something much greater than anyone else had realized.

Earlier this year, OpenAI announced DALL-E, which could generate images just with a text description … this further got my mental gears going as I began to formulate a clear vision of what the future of creativity could look like thanks to GPT-3 and Multimodal AI models like DALL-E. Eventually, all the ideas that had been bubbling up in my head reached some kind of equilibrium where the future (the years 2023-2031) felt very visceral and obvious to me. I could foresee some kind of larger system, new kinds of opportunities that don’t exist for creatives today, and I noticed the patterns of today that could indicate much larger trends and outcomes in the future.

Not long after that, I found these ideas had a magnetism of their own and were calling me to make something greater of them. GPT-X, DALL-E, and our Multimodal Future is my attempt to crystallize my ideas about the future of AI and Creativity.

I chose to turn the ideas into a YouTube series because I felt it would give them the greatest opportunity to spread and enter the public zeitgeist. I want not only creatives but also AI researchers, company executives, startup employees, and technology builders everywhere to watch this series. What has always seemed obvious to me is that technology follows the direction and vision set by artists. Reality mirrors art. GPT-X, DALL-E, and our Multimodal Future is my attempt to set the bar for creative software and creative life in the future. By creating a series which feels real, possible, and tangible, my hope is that others will take its ideas to greater lengths and make the series our actual shared reality, which would benefit creatives forever.

At the same time, I’m also excited about tech culture influencing artists. We all know about how Moore’s law influences the tech industry, but I actually believe this series is also about accelerating the rate of artistic output and innovation. The spirit of the artist, combined with better AI technology, could accelerate new kinds of artistic experiences we simply cannot conceptualize today. I also have faith that this idea I have - of artistic acceleration - can create the kind of futuristic, societal Utopia we all seek as a species.

This series attempts to visit the future and bring you a sampling of the important, relevant ideas to help you succeed in all of your creative endeavours this decade and beyond. It has also changed me as a person. The spirit of the artist is an essential ingredient to creating a new, better world. I’ve begun to identify myself more as an artist than a technologist. This series has also helped me realize that our current existence is quite limiting. What great works would I be creating, if I already had access to these tools? What great stories have never been told? How many versions of DALL-E are we away from the true masterpieces of our era?

I took four months off of work to create this series full time. I’ve never heard of anyone taking such a drastic amount of time off to make YouTube videos, and I didn’t even know how long it would take when I got started. I told everyone around me, “I’m taking time off and I’m not sure when or if I’ll be back.” It was kind of like a creative sabbatical, but a lot less relaxing … it ended up being a very intense, painful, but still highly productive period in my life. I don’t know anybody else on YouTube even doing series/playlists either … this is an entirely new format (I think). But I just could not come up with any other way to share these ideas that suited their content and also worked for me.

I’m excited to be some kind of a liaison or ambassador to these ideas. I still don’t know the source of inspiration or where ideas come from, but I understand my role as someone driving and distributing them to anyone who will listen. I’m just as excited to hear what others think and connect with others over these important topics. That is why, in the last video, I will share details of a public Clubhouse event that anyone can join, so that we can discuss the ideas in the series further in a live format.

This intro video will introduce you to the surface-level ideas we will be exploring in the series. It’s also a teaser of what’s to come this month.

I’d like to personally welcome you to GPT-X, DALL-E, and our Multimodal Future!

YouTube Transcript (Spoilers ahead)

What will the next 10 years of creativity look like? Is it possible to create your best work ever and be creatively fulfilled? Do you wish to stay relevant and continue creating as the world changes dramatically over the next few years?

Well, I have spent the last 4 months creating 19 videos to address these questions.

I think creativity will be heavily driven by Artificial Intelligence. You may have heard of GPT-3, made by a company called OpenAI, which can generate text … but, they also announced another model called DALL-E, which can generate images just with text descriptions. Take a look at some of its results. You don’t need to code or teach it, you just describe the image, and it makes it for you. It’s really that simple.

DALL-E is what’s known as a multimodal model. It deals with two modalities: images and text. But multimodal models can theoretically be trained on anything. Imagine a single model which can take in images, video, audio, any kind of text, sensory data, and produce any kind of media you could imagine.

Just with a text description of what you want, Multimodal models could theoretically generate a painting, your music album cover, product design, an entire movie, your next great song, or even the architectural design for a building. At the same time, you could even cross between them. Imagine creating a song by only giving an AI model a specific painting. Imagine generating a building architecture design just based on a movie you liked. The creative possibilities through Multimodal AI are endless. Which is why I made this series.

This series is specifically about future versions of GPT-3, multimodal AI models like DALL-E, and where these models converge and intersect with the dimension of human creativity.

In this series, I will share important creative lessons for multimodal creatives of the future and describe some of the characteristics I imagine they will have. I will be sharing strategies I can foresee to help you compete and stand out from every other creative in the future. I’ll talk about how you could make money as a creative through multimodal AI and the industrial shifts we’ll see as a result of these models. This series also attempts to predict the impacts that multimodal AI will have on our broader society, and I will try to answer some of the larger philosophical questions you may have about the ethics of multimodal creativity and even the nature of creativity itself.

Creativity will for sure evolve this decade, because AI is accelerating.

This series is different from anything I’ve ever made before. It is a departure from the here and now. We are leaving today and exploring the next 2-10 years of Creativity & AI. Today, I’m asking you to take a leap of faith with me and please keep an open mind as we are discussing the future.

I’m not saying everything will happen right away or even at all. There are commercial limitations which will need to be addressed, like superresolution, inference speed and latency, the computational demands of generating intensive things like videos, copyright infringement, AI alignment, and of course making access to the AI tech available to everyone in the world.

I want to be clear. DALL-E is currently not available to anyone … even though I have been making this series, I haven’t even tried it myself. But that’s not the point here. The point is, every once in a while, you get a taste of the future, you can look around the corner and see what’s roughly on the other side. It’s so real you can viscerally feel it, that’s what all of this is about.

Throughout this series, I will be using a program called “The Multimodal Photo Editor”. This is a theoretical Windows program that is possible even with today’s technology ... think of it as a futuristic version of Adobe Photoshop. I created it just for this series to help teach you some of the creative lessons I can imagine for the future. Even though it's an image program I'll be using, no matter your creative discipline, whether it's architecture or product design, this series is relevant, applicable, and made with all of the creative disciplines in mind.

Anyways, for the next 18 days, I will be releasing a video every day. So, I know it’s common for YouTubers to ask, but I really mean it this time. If any of this stuff interests you, I’m asking you to please make sure you are subscribed and have notifications on so you can know when a new video comes out every day over the next month or so.

In the very last video, I will even be sharing the link to an exclusive Clubhouse event where you can join me and others who watched the series. We can discuss all the ideas in it, and I will try to answer any remaining questions you may have.

With all that out of the way … buckle in, I hope you’re ready, because this series is an experience of its own. I’d like to personally welcome you to GPT-X, DALL-E, and our Multimodal Future. See you on the other side!

Multimodal by Bakz T. Future
