Implementing and testing a “poor man’s prompt expansion” model for Stable Diffusion

Various Stable Diffusion models massively benefit from verbose prompt descriptions that contain a variety of additional descriptors. Much recent research has gone into training text generation models for expanding existing Stable Diffusion prompts with relevant and context appropriate descriptors.

Since it isn’t feasible to run LLMs and text generation models inside most users’ web browsers at this time, I present my “Poor Man’s Prompt Expansion Model“. It uses a number of examples I’ve acquired from Fooocus and Hugging Face to generate completely random (and absolutely not context appropriate) prompt expansions.

(For those interested in following along at home, you can checkout the gist for this script on GitHub).

How does it work?

We iterate through a list of an absolute crap ton of prompt descriptors that I’ve sourced from other (smarter) systems that tokenize user prompts and attempt to come up with context appropriate responses. We’re not going to do that, because we’re going to go into full chaos mode:

  1. Iterate through a list of source material and split up everything separated by a comma.
  2. Add the resulting list to a new 1-dimensional array.
  3. Now, build a new descriptive prompt by looping through the list until we get a random string of descriptors that are between 175 and 220 characters long.
  4. Once that’s done, return the result to the user.
  5. Create a new prompt.

For our experiment, we’re going to lock all image generation parameters and seed, so we theoretically get the same image given the exact same parameters.

Ready?

Here is our base prompt and the result:

Happy penguins having a beer

Not bad! Now, let’s go full chaos mode with a new prompt using the above rules and check out the result:

Happy penguins having a beer, silent, 4K UHD image, 8k, professional photography, clouds, gold, dramatic light, cinematic lighting, creative, pretty, artstation, award winning, pure, trending on artstation, airbrush, cgsociety, glowing

That’s fun! (I’m not sure what the “silent” descriptor means, but hey!) Let’s try another:

Happy penguins having a beer, 8k, redshift, illuminated, clear, elegant, creative, black and white, masterpiece, great power, pinterest, photorealistic, award winning, vray, enchanted, complex, excellent composition, beautiful composition

I think we just created an advertisement for a new type of beverage! It nailed the “black and white”, though I’m not sure how that penguin turned into a bottle. What else can we make?

Happy penguins having a beer, volumetric lighting, Digital, intricate, awesome, futuristic, cartoon artstyle, vector, solid, detailed, dramatic light, realistic photograph, wonderful colors, dramatic atmosphere

The dude in the middle is planning on having a good night. Definitely some “wonderful colors”. Not so much realistic photo or vector, but fun! One last try:

Happy penguins having a beer, 35mm, surreal, amazing, Trending on Artstation HQ, matte painting hyperrealistic, full focus, very inspirational, pixta.jp, aesthetic, 8k, black and white, reflected on the matrix studio background, awesome

As you can see, you can get a wide variety of image styles by simply mixing a bunch of descriptive elements to an image prompt.

I’ve wanted to implement a feature like this on ArtBot for a long time. (Essentially, if the user allows it, automatically append these descriptions behind the scenes when an image is requested). Perhaps this will come soon.

Banned from Facebook Marketplace without a reason and without recourse

As much as technology improves our lives (and is integrated into literally everything we do), it really fucking sucks when the algorithm gets it wrong.

Earlier this summer, I posted a shop vac for sale, as I’ve done a number of times before (err, posting things for sale, not specifically shop vacs).

Soon after, I was banned for “violating community standards.” I have literally no idea what happened. But! Apparently you could appeal the decision if you felt it was incorrect.

So I did.

And was rejected.

So I appealed again.

And was rejected.

I appealed again. And now it looks like I am permanently banned from Facebook Marketplace. And there’s no way to appeal the decision. No way to contact customer support. Cool.

 

Anyway, here’s an image of Mark Zuckerberg wearing clown makeup, created using Stable Diffusion.

ArtBot mentioned again in PC World!

ArtBot got another callout in PC World in the article: “The best AI art generators: Bring your wildest dreams to life.”

Though a bit of (fair) criticism at the end of the blurb though:

Why use Artbot? The vast number of AI models, and the variance in style those images produce. Otherwise, generating images via Artbot can be a bit of a crapshoot, and you may expend a great number of kudos simply exploring all the options. Since there’s no real setup besides figuring out the API key, Stable Horde (Artbot) can be worth a try.

Hey, I’ll take it!

ArtBot written up in PC World!

Hah! This is pretty awesome. My nifty side project, ArtBot, has been written up in PC World as part of a larger article about Stable Horde (the open source backend that powers my web app):

Stable Horde has a few front-end interfaces to use to create AI art, but my preferred choice is ArtBot, which taps into the Horde. (There’s also a separate client interface, with either a Web version or downloadable software.)

Interestingly enough, ArtBot just passed 2,000,000 images generated!

Woe is Twitter…

To the tune of R.E.M’s “End of the World”:

“It’s the end of the (Twitter) as we know it, and I feel fiiiiiiinnnnneeee!” via… me.

I don’t have high hopes for the future of Twitter, pending Elon’s acquisition. It’s a service I’ve long loved, been frustrated with, but also found immense value in.

I’ve gotten jobs because of it, made new friends because of it, learned a lot because of it. Granted, it’s gotten much more toxic and I long for the days when it was fun.

But I don’t think having this service in control of a self-proclaimed internet troll who has lurched evermore rightward is going to improve things. Alas.

Punk Rock Obama

I think it’s time to end my AI art career on this high note. Generated with Stable Diffusion, running on my local machine.

The prompt:
“beautiful portrait painting of Barack Obama with a purple mohawk on top of his head shredding on an electric guitar at a punk rock show, concept art, makoto shinkai, takashi takeuchi, trending on artstation, 8k, very sharp, extremely detailed, volumetric, beautiful lighting, wet-on-wet”

Punk Rock Obama

MidJourney – AI Art Madness

A few short weeks ago, I had downloaded a simplified model for generating AI-created images on your local machine. The internet (myself included) had a lot of fun with it, but the quality was definitely lacking, especially when compared to the more serious AI image platforms being created by some big companies.

I recently received my invite to the MidJourney beta and I am just blown away!

For now, I’ve just been putting in ridiculous prompts that simulate styles for various artists (oh, man. I have a feeling this is going to piss off a lot of artists in the future…)

For example: “Apocalyptic wasteland with crumbling buildings and debris, thomas kinkade painting”

The potential here is pretty crazy — for people who aren’t artistically inclined, they can start generating images and scenes based on what they come up with. Some people can probably use this as a base to get to rapidly start iterating on new ideas. And of course, others are going to be mad.

A lot of the detail in creating these images is how you create the prompt. You’re already seeing the phrase “prompt engineering” being used in various places — check out this Twitter search.

For me though, I’m excited about this new technology and it’s something I’ve been eager to play with.