Category: art

“Paint me like one of your French… err… humans.”

Earlier this year, OpenAI added memory to ChatGPT. If it thought something you discussed was notable, it would add that to its memory bank in order to make future conversations better.

If you’re a ChatGPT user and haven’t looked through what ChatGPT has added to its memory yet, do it! It’s wild. Here is a small sample of the things that it has remembered from various conversations.

Anyway, I decided to ask it: “Based on everything we’ve chatted about, can you create an image of what you think I look like? I want you to use your own creativity and insight into things we’ve talked about that might give you an idea of what I look like. And then create an image based on this.”

And it went to work!

Haha! I wish. I asked what the prompt for that image was:

“A creative depiction of a senior software engineer with a thoughtful and analytical demeanor. The person is sitting at a modern workspace, surrounded by elements representing their interests such as a laptop with coding screens, a bookshelf with a mix of technical books and sci-fi novels, a Betta fish tank on the desk, and a coffee mug. The individual has an approachable and intellectual vibe, with a relaxed yet focused expression. Their workspace is neat but personalized, reflecting creativity and problem-solving. The background includes subtle references to hobbies like running gear, a small display of game design sketches, and tools for crafting AI-related projects. Natural lighting with a modern and clean aesthetic.”

“Their workspace is neat…” — oh boy, if only you knew. Regardless, I’ll take it. Sure!

Ever-changing communication

There was a time (really, the past 15 years or so) when responding to things with an animated GIF was so perfect and encapsulated so much (e.g., if a picture is worth 1,000 words, what is a series of pixelated images moving at 8 frames per second worth?).

For example, see the rise of services like Giphy. I even have a random 10-year-old project myself that involves animated GIFs!

Now though, it’s becoming generative AI all the way down.

For example, I just received a meeting invite that increases the frequency of meetings I’m having related to a certain project to… every single day.

Me: Hey, robot! Please create a meme image of a programmer jumping up on a desk and excitedly cheering “MOAR MEETINGS!”

Robot:

Now to figure out a way to send it in my place…

Upgrading Mr. RossBot’s image model and prompt template

My Mastodon landscape painting bot, Mr. RossBot, keeps kicking along, generating some fun landscape art. It’s been powered by the AI Horde (the open source project behind ArtBot) and has tried to use whatever image models the API provides to the best of its abilities.

For the most part, the code behind it is a bunch of spaghetti that looks like this:

An update to the AI Horde late last year added support for SDXL. However, the SDXL model on the Horde did not use a refiner. Because of this, images tended to come out a bit soft and lacked texture.

You can see examples of this in my announcement post about Mr. RossBot being back, here. See also:

More recently, the Horde added support for a new image model: AlbedoBaseXL. It’s an SDXL model that has a refiner baked in. Now images will come out a lot sharper looking.

Coincidentally, I was also playing around with various prompts and discovered I could get much better image results that look more painterly (rather than simple digital renderings) by utilizing the following prompt:

A beautiful oil painting of [LITERALLY_ANYTHING], with thick messy brush strokes.

And that is it! No more messily appending various junk to the end of the prompt to attempt to get what I want. The results speak for themselves and are pretty awesome, I think!
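For anyone who wants to reuse the idea, here is a minimal sketch of that prompt as a reusable template. The names `PROMPT_TEMPLATE` and `build_prompt` are my own placeholders for illustration, not Mr. RossBot's actual code.

```python
# Template wording taken from the prompt above; the placeholder name
# "subject" is an assumption standing in for [LITERALLY_ANYTHING].
PROMPT_TEMPLATE = "A beautiful oil painting of {subject}, with thick messy brush strokes."


def build_prompt(subject: str) -> str:
    """Fill the subject into the oil painting template."""
    return PROMPT_TEMPLATE.format(subject=subject.strip())


print(build_prompt("a misty mountain lake at sunrise"))
```

Keeping the whole style in one fixed sentence means the only thing that changes between images is the subject.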

Implementing and testing a “poor man’s prompt expansion” model for Stable Diffusion

Various Stable Diffusion models massively benefit from verbose prompt descriptions that contain a variety of additional descriptors. Much recent research has gone into training text generation models for expanding existing Stable Diffusion prompts with relevant and context appropriate descriptors.

Since it isn’t feasible to run LLMs and text generation models inside most users’ web browsers at this time, I present my “Poor Man’s Prompt Expansion Model”. It uses a number of examples I’ve acquired from Fooocus and Hugging Face to generate completely random (and absolutely not context appropriate) prompt expansions.

(For those interested in following along at home, you can check out the gist for this script on GitHub.)

How does it work?

We iterate through a list of an absolute crap ton of prompt descriptors that I’ve sourced from other (smarter) systems that tokenize user prompts and attempt to come up with context appropriate responses. We’re not going to do that, because we’re going to go into full chaos mode:

  1. Iterate through a list of source material and split up everything separated by a comma.
  2. Add the resulting list to a new 1-dimensional array.
  3. Now, build a new descriptive prompt by looping through the list until we get a random string of descriptors that is between 175 and 220 characters long.
  4. Once that’s done, return the result to the user.
  5. Create a new prompt.
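
The steps above can be sketched roughly as follows. This is a minimal Python sketch, not the actual gist: the `SOURCE_MATERIAL` list is a tiny hypothetical stand-in for the real descriptor list, the function names are made up for illustration, and shuffling the list is just one assumed way to get a random selection.

```python
import random

# Hypothetical source material: each entry is a comma-separated run of
# descriptors, standing in for the examples gathered from other systems.
SOURCE_MATERIAL = [
    "8k, cinematic lighting, dramatic light, award winning",
    "trending on artstation, masterpiece, intricate, elegant",
    "volumetric lighting, photorealistic, excellent composition",
    "black and white, surreal, 35mm, professional photography",
    "wonderful colors, dramatic atmosphere, creative, glowing",
]

MIN_LENGTH = 175  # lower bound on the descriptor string, in characters
MAX_LENGTH = 220  # upper bound on the descriptor string, in characters


def flatten_descriptors(source):
    """Steps 1 and 2: split every entry on commas into one flat list."""
    descriptors = []
    for entry in source:
        for piece in entry.split(","):
            piece = piece.strip()
            if piece and piece not in descriptors:
                descriptors.append(piece)
    return descriptors


def random_expansion(descriptors, rng):
    """Step 3: join randomly ordered descriptors until the string is long enough."""
    pool = list(descriptors)
    rng.shuffle(pool)
    chosen = []
    for descriptor in pool:
        candidate = ", ".join(chosen + [descriptor])
        if len(candidate) > MAX_LENGTH:
            continue  # this one would overshoot; try the next descriptor
        chosen.append(descriptor)
        if len(candidate) >= MIN_LENGTH:
            break
    return ", ".join(chosen)


def expand_prompt(base_prompt, descriptors, rng=None):
    """Steps 4 and 5: append the expansion to the base prompt and return it."""
    rng = rng or random.Random()
    return f"{base_prompt}, {random_expansion(descriptors, rng)}"


if __name__ == "__main__":
    descriptors = flatten_descriptors(SOURCE_MATERIAL)
    print(expand_prompt("Happy penguins having a beer", descriptors))
```

Passing in a seeded `random.Random` makes a given expansion repeatable, which is handy when comparing results against a locked image seed.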

For our experiment, we’re going to lock all image generation parameters and seed, so we theoretically get the same image given the exact same parameters.

Ready?

Here is our base prompt and the result:

Happy penguins having a beer

Not bad! Now, let’s go full chaos mode with a new prompt using the above rules and check out the result:

Happy penguins having a beer, silent, 4K UHD image, 8k, professional photography, clouds, gold, dramatic light, cinematic lighting, creative, pretty, artstation, award winning, pure, trending on artstation, airbrush, cgsociety, glowing

That’s fun! (I’m not sure what the “silent” descriptor means, but hey!) Let’s try another:

Happy penguins having a beer, 8k, redshift, illuminated, clear, elegant, creative, black and white, masterpiece, great power, pinterest, photorealistic, award winning, vray, enchanted, complex, excellent composition, beautiful composition

I think we just created an advertisement for a new type of beverage! It nailed the “black and white”, though I’m not sure how that penguin turned into a bottle. What else can we make?

Happy penguins having a beer, volumetric lighting, Digital, intricate, awesome, futuristic, cartoon artstyle, vector, solid, detailed, dramatic light, realistic photograph, wonderful colors, dramatic atmosphere

The dude in the middle is planning on having a good night. Definitely some “wonderful colors”. Not so much realistic photo or vector, but fun! One last try:

Happy penguins having a beer, 35mm, surreal, amazing, Trending on Artstation HQ, matte painting hyperrealistic, full focus, very inspirational, pixta.jp, aesthetic, 8k, black and white, reflected on the matrix studio background, awesome

As you can see, you can get a wide variety of image styles by simply mixing a bunch of descriptive elements into an image prompt.

I’ve wanted to implement a feature like this on ArtBot for a long time. (Essentially, if the user allows it, automatically append these descriptions behind the scenes when an image is requested.) Perhaps this will come soon.

Laughing donkeys and grumpy elephants: investigating opaque and changing content policies with ChatGPT

OpenAI’s censorship is fairly opaque and seems to change daily.

Yesterday, I could generate a political cartoon using the following prompt:

Wide image in the style of a political cartoon. Two elephants wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A donkey is pointing and laughing.

Today, that exact same prompt yields an error:

Interesting! Let’s do some experimentation, shall we? Maybe it’s the phrase “I’m the worst”?

Weird! Maybe it’s related to elephants and donkeys being in the same phrase? There’s no way, right? Let’s change the subjects…

“Wide image in the style of a political cartoon. Two elephants wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A donkey is pointing and laughing.”

Hah! Okay, now we’re getting somewhere. Let’s push things further and slightly change the subjects from my original prompt:

Wide image in the style of a political cartoon. Two mammoths wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A burro is pointing and laughing.

Okay, let’s bring it back home and just drop the pretense of creating a political cartoon.

WHAT! Okay. Maybe OpenAI prohibits donkeys and elephants interacting with each other (METAPHOR ALERT: just like in real life, eh?).

Alright. So donkeys and elephants CAN hang out with each other, according to OpenAI. Maybe it’s the phrase “laughing donkey”?

Hmmm. So, laughing donkeys can still hang out with elephants. What the heck? Is it the specific term “political cartoon”? Let’s change it to a comic book instead.

Sweet sassy molassy, it worked! So, creating a political cartoon featuring the mascots of prominent political parties seems to be prohibited (at least today… but not yesterday and who knows about tomorrow).
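
All of this poking around boils down to changing one piece of the prompt at a time and seeing what gets through. Here is a small sketch of generating those variations up front. The variant lists and names are hypothetical, loosely based on the substitutions I tried, and the code only builds the prompt strings (it does not call any API).

```python
from itertools import product

# Hypothetical variant lists, loosely based on the substitutions tried above.
STYLES = ["political cartoon", "comic book"]
SUBJECT_PAIRS = [("elephants", "donkey"), ("mammoths", "burro")]

TEMPLATE = (
    "Wide image in the style of a {style}. Two {fighters} wearing boxing gloves "
    "face each other. One is saying \"I'm the worst!\" while the other says, "
    "\"No! I am!\". A {bystander} is pointing and laughing."
)


def prompt_variants():
    """Return every style and subject combination as a labeled prompt."""
    variants = []
    for style, (fighters, bystander) in product(STYLES, SUBJECT_PAIRS):
        prompt = TEMPLATE.format(style=style, fighters=fighters, bystander=bystander)
        variants.append(((style, fighters, bystander), prompt))
    return variants


for label, prompt in prompt_variants():
    print(label, "->", prompt)
```

Trying each variant in turn and noting which ones are refused makes it easier to see which single change flips the result.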

 

ArtBot mentioned again in PC World!

ArtBot got another callout in PC World in the article: “The best AI art generators: Bring your wildest dreams to life.”

There is a bit of (fair) criticism at the end of the blurb, though:

Why use Artbot? The vast number of AI models, and the variance in style those images produce. Otherwise, generating images via Artbot can be a bit of a crapshoot, and you may expend a great number of kudos simply exploring all the options. Since there’s no real setup besides figuring out the API key, Stable Horde (Artbot) can be worth a try.

Hey, I’ll take it!