art - Dave Schumaker

OpenAI’s new image generation models are… insane

March 26, 2025

You can probably repeat this blog post headline for any given service every week at this point…

Anyway! I’ve been on board the generative AI train for a few years now and it’s amazing to see how far it’s come. In October 2023, I got access to DALL-E 3 and was pretty impressed with its ability to render text.

Yesterday, OpenAI announced 4o Image Generation and boy does it kick things up a notch or two!

It’s ability to generate images and render text according to your exact prompt is incredible. We can now have full on automated AI memebots.

A four panel cartoon strip

first panel: a software engineer sitting in front of a computer screen on a Zoom meeting

second panel: the software engineer tells the participants (with a speech bubble): “I’m telling you, AI is coming for our jobs!”

third panel: we just see a slight closeup of the software engineer (the computer monitor isn’t visible)

fourth panel: same as the first panel except all the participants are now robots

Same angle and setup in every panel, reduced art style, broad outlines

Or, how about:

Cartoon drawing of a bored computer programmer sitting in front of a computer just pressed “enter” over and over. He is sarcastically excited and says, “Vibe coding. Wooooo.”

You can also feed it source images and it will run with it as well. So, obviously we need to use the Canine Calibration System.

I even gave it an image of me and told it to make a movie poster:

Create a dramatic cyberpunk 1980s horror movie poster image featuring a Computer Monster (We see an LCD screen with evil eyes and fangs and it has robotic legs) in a dark alley. In front of the monster, we see the man in this source image passed out on the ground, broken glasses lay next to him. At the top of the poster is the title of the movie in digital writing: “BUFFER OVERFLOW” at the bottom in the billing area, we see text that says, “Some bugs were never meant to be fixed.”

Or rewrite history…

Or really, really rewrite history…

It’s just wild. It’s coming for us as engineers, as musicians, as artists, as writers. This 2024 post on Twitter sums it up:

You know what the biggest problem with pushing all-things-AI is? Wrong direction. I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.

– Joanna Maciejewska on Twitter

Hmm, this sounds like a 4-panel comic to me!

“Paint me like one of your French… err… humans.”

November 18, 2024

Earlier this year, OpenAI added memory to ChatGPT. If it thought something you discussed was notable, it would add that to its memory bank in order to make future conversations better.

If you’re a ChatGPT user and haven’t looked through what ChatGPT has added to its memory yet, do it! It’s wild. Here is a small sample of the things that it has remembered from various conversations.

Anyway, I decided to ask it: “Based on everything we’ve chatted about, can you create an image of what you think I look like? I want you to use your own creativity and insight into things we’ve talked about that might give you an idea of what I look like. And then create an image based on this.”

And it went to work!

Haha! I wish. I asked what the prompt for that image was:

“A creative depiction of a senior software engineer with a thoughtful and analytical demeanor. The person is sitting at a modern workspace, surrounded by elements representing their interests such as a laptop with coding screens, a bookshelf with a mix of technical books and sci-fi novels, a Betta fish tank on the desk, and a coffee mug. The individual has an approachable and intellectual vibe, with a relaxed yet focused expression. Their workspace is neat but personalized, reflecting creativity and problem-solving. The background includes subtle references to hobbies like running gear, a small display of game design sketches, and tools for crafting AI-related projects. Natural lighting with a modern and clean aesthetic.”

“Their workspace is neat…” — oh boy, if only you knew. Regardless, I’ll take it. Sure!

Ever changing communication

March 18, 2024

There was a time (really, the past 15 years or so) where responding to things with an animated GIF was so perfect and encapsulated so much (e.g., if a picture is worth 1,000 words, what is a series of pixelated images moving a 8 frames per second worth?).

For example. see the rise of services like Giphy. I even have a random 10 year old project myself that involves animated GIFs!

Now though, it’s becoming generative AI all the way down.

For example, I just received a meeting invite that increases the frequency of meetings I’m having related to a certain project to… every single day.

Me: Hey, robot! Please create a meme image of a programmer jumping up on a desk and excitedly cheering “MOAR MEETINGS!”

Robot:

Now to figure out a way to send it in my place…

Upgrading Mr. RossBot’s image model and prompt template

January 18, 2024

My Mastodon landscape painting bot, Mr. RossBot keeps kicking along, generating some fun landscape art. It’s been powered by the AI Horde (the open source project behind ArtBot) and has tried to utilize whatever image models provided by the API to the best of its abilities.

For the most part, the code behind it is a bunch of spaghetti that looks like this:

An update to the AI Horde late last year added support for SDXL. However, the SDXL model on the Horde did not use a refiner. Because of this, images tended to come out a bit soft and lacked texture.

You can see examples of this in my announcement post about Mr. RossBot being back, here. See also:

More recently, the Horde added support for a new image model: AlbedoBaseXL. It’s an SDXL model that has a refiner baked in. Now images will come out a lot sharper looking.

Coincidentally, I was also playing around with various prompts and discovered I could get much better image results that look more painterly (rather than simple digital renderings) by utilizing the following prompt:

A beautiful oil painting of [LITERALLY_ANYTHING], with thick messy brush strokes.

And that is it! No more messy appending various junk to the end of the prompt to attempt to get what I want. The results speak for themselves and are pretty awesome, I think!

Implementing and testing a “poor man’s prompt expansion” model for Stable Diffusion

January 12, 2024

Various Stable Diffusion models massively benefit from verbose prompt descriptions that contain a variety of additional descriptors. Much recent research has gone into training text generation models for expanding existing Stable Diffusion prompts with relevant and context appropriate descriptors.

Since it isn’t feasible to run LLMs and text generation models inside most users’ web browsers at this time, I present my “Poor Man’s Prompt Expansion Model“. It uses a number of examples I’ve acquired from Fooocus and Hugging Face to generate completely random (and absolutely not context appropriate) prompt expansions.

(For those interested in following along at home, you can checkout the gist for this script on GitHub).

How does it work?

We iterate through a list of an absolute crap ton of prompt descriptors that I’ve sourced from other (smarter) systems that tokenize user prompts and attempt to come up with context appropriate responses. We’re not going to do that, because we’re going to go into full chaos mode:

Iterate through a list of source material and split up everything separated by a comma.
Add the resulting list to a new 1-dimensional array.
Now, build a new descriptive prompt by looping through the list until we get a random string of descriptors that are between 175 and 220 characters long.
Once that’s done, return the result to the user.
Create a new prompt.

For our experiment, we’re going to lock all image generation parameters and seed, so we theoretically get the same image given the exact same parameters.

Ready?

Here is our base prompt and the result:

Happy penguins having a beer

Not bad! Now, let’s go full chaos mode with a new prompt using the above rules and check out the result:

Happy penguins having a beer, silent, 4K UHD image, 8k, professional photography, clouds, gold, dramatic light, cinematic lighting, creative, pretty, artstation, award winning, pure, trending on artstation, airbrush, cgsociety, glowing

That’s fun! (I’m not sure what the “silent” descriptor means, but hey!) Let’s try another:

Happy penguins having a beer, 8k, redshift, illuminated, clear, elegant, creative, black and white, masterpiece, great power, pinterest, photorealistic, award winning, vray, enchanted, complex, excellent composition, beautiful composition

I think we just created an advertisement for a new type of beverage! It nailed the “black and white”, though I’m not sure how that penguin turned into a bottle. What else can we make?

Happy penguins having a beer, volumetric lighting, Digital, intricate, awesome, futuristic, cartoon artstyle, vector, solid, detailed, dramatic light, realistic photograph, wonderful colors, dramatic atmosphere

The dude in the middle is planning on having a good night. Definitely some “wonderful colors”. Not so much realistic photo or vector, but fun! One last try:

Happy penguins having a beer, 35mm, surreal, amazing, Trending on Artstation HQ, matte painting hyperrealistic, full focus, very inspirational, pixta.jp, aesthetic, 8k, black and white, reflected on the matrix studio background, awesome

As you can see, you can get a wide variety of image styles by simply mixing a bunch of descriptive elements to an image prompt.

I’ve wanted to implement a feature like this on ArtBot for a long time. (Essentially, if the user allows it, automatically append these descriptions behind the scenes when an image is requested). Perhaps this will come soon.

Laughing donkeys and grumpy elephants: investigating opaque and changing content policies with ChatGPT

October 12, 2023

OpenAI’s censorship is fairly opaque and seems to change daily.

Yesterday, I could generate a political cartoon using the following prompt:

“Wide image in the style of a political cartoon. Two elephants wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A donkey is pointing and laughing.”

Today, that exact same prompt yields an error:

Interesting! Let’s do some experimentation, shall we? Maybe it’s the phrase “I’m the worst“?

Weird! Maybe it’s related to elephants and donkeys being in the same phrase? There’s no way, right? Let’s change the subjects…

“Wide image in the style of a political cartoon. Two elephants wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A donkey is pointing and laughing.”

Hah! Okay, now we’re getting somewhere. Let’s push things further and slightly change the subjects from my original prompt:

Wide image in the style of a political cartoon. Two mammoths wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A burro is pointing and laughing.

Okay, let’s bring it back home and just drop the pretense of creating a political cartoon.

WHAT! Okay. Maybe OpenAI prohibits donkeys and elephants interacting with each other (METAPHOR ALERT: just like in real life, eh?).

Alright. So donkeys and elephants CAN hang out with each other, according to OpenAI. Maybe it’s the phrase “laughing donkey”?

Hmmm. So, laughing donkeys can still hang out with elephants. What the heck? Is it the specific term “political cartoon”? Let’s change it to a comic book instead.

Sweet sassy molassy, it worked! So, creating a political cartoon featuring the mascots of prominent political parties seems to be prohibited (at least today… but not yesterday and who knows about tomorrow).

Mr. RossBot is back!

August 22, 2023

Alrighty, I updated the logic this weekend and have Mr. RossBot operating on the hairy elephant website (Mastodon). (It’s also posting on Threads, if you’re into that sort of thing.)

I also updated the image model to use Stability.ai’s swanky new SDXL model. I’m pretty impressed with the results.

10,000,000 images created with ArtBot!

August 9, 2023

Alright, this is pretty awesome! 🎉

Last night at 11:27PM, my little generative AI side project, ArtBot, crossed 10,000,000 images created. People have created some pretty cool stuff.

Some random stats from Google Analytics since it launched on October 9th, 2022 to now:

71,000 unique users
8.3 million page views.

Punk Rock ‘Yellen

July 10, 2023

I should just start a new series entitled, “Punk Rock Politicos”. This is a follow up to a previous image: Punk Rock Obama.

Here is our treasury secretary, Janet “Yelling” Yellen.

ArtBot mentioned again in PC World!

April 6, 2023

ArtBot got another callout in PC World in the article: “The best AI art generators: Bring your wildest dreams to life.”

Though a bit of (fair) criticism at the end of the blurb though:

Why use Artbot? The vast number of AI models, and the variance in style those images produce. Otherwise, generating images via Artbot can be a bit of a crapshoot, and you may expend a great number of kudos simply exploring all the options. Since there’s no real setup besides figuring out the API key, Stable Horde (Artbot) can be worth a try.

Hey, I’ll take it!

Category: art