It’s AI all the way down

Back in November, I went with some friends to play paintball. We had booked a 3-hour session that would feature multiple matches. It was the first time any of us had ever played, and we were all pretty nervous about getting hit.

Lo and behold, within the first 30 seconds of the game, I took a paintball to the knee (cue the “I used to be an adventurer like you…” meme from Skyrim). Somehow, I twisted my leg as I ragdolled to the ground.

Of course, you can’t just give up after 30 seconds, right? So, on I played. The result: a torn ACL (the doc said he had no idea how that could have happened), a bone contusion, and, most likely, reconstructive surgery at some point. Fun!

Anyway, the point of all of this: for funsies, I tried to create a song about the situation using Suno’s generative music service (see previously). I used ChatGPT to come up with some initial lyrics and then did some work to refine them.

Then! I decided to use OpenAI’s generative video tool, Sora, to create a bunch of clips. I strung everything together in iMovie, and the result is this rowdy music video: “This is What I Get.”

It’s Friday afternoon, so let’s write a song

My latest generative AI obsession: Suno. You provide it some lyrics, give it a musical style to emulate, and hit the create button. It’s pretty wild.

I wrote some fun lyrics about deploying code on Fridays, set to some catchy ’80s pop. The result is pretty crazy.

[Verse]
Testing in production (oh yeah)
That is how we roll (whoa)
Testing in production
using my flawless code

[Bridge]
Why should I write tests (what?)
My code is never a mess (oh no)
Did I just rhyme,
Tests and a mess (yeah he did)

[Chorus]
It’s Friday afternoon.
It’s time to deploy my code. (whoa yeah)
The weekend is almost here.
It’s time to deploy my code. (watch out)

[Verse]
It’s Friday afternoon.
I don’t have anything to fear
It’s time to deploy my code.
The weekend is almost here.

[Bridge]
Why should I write tests (what?)
My code is never a mess (oh no)
Did I just rhyme,
Tests and a mess (yeah he did)

[Verse]
It’s Friday afternoon. (Whoa)
It’s Friday afternoon. (Whoaaa)
It’s Friday afternoon. (Yeah!)
It’s time to deploy my code. (WAIT WHAT)

[Bridge]
Why should I write tests (what?)
My code is never a mess (oh no)
Did I just rhyme,
Tests and a mess (yeah he did)

[Chorus]
It’s Friday afternoon.
It’s time to deploy my code. (whoa yeah)
The weekend is almost here.
It’s time to deploy my code. (watch out)

[Chorus]
It’s Friday afternoon.
It’s time to deploy my code. (whoa yeah)
The weekend is almost here.
It’s time to deploy my code. (watch out)

Ever-changing communication

There was a time (really, the past 15 years or so) when responding to things with an animated GIF was so perfect and encapsulated so much (e.g., if a picture is worth 1,000 words, what is a series of pixelated images moving at 8 frames per second worth?).

For example, see the rise of services like Giphy. I even have a random 10-year-old project myself that involves animated GIFs!

Now though, it’s becoming generative AI all the way down.

For example, I just received a meeting invite that increases the frequency of meetings I’m having related to a certain project to… every single day.

Me: Hey, robot! Please create a meme image of a programmer jumping up on a desk and excitedly cheering “MOAR MEETINGS!”

Robot:

Now to figure out a way to send it in my place…

Upgrading Mr. RossBot’s image model and prompt template

My Mastodon landscape painting bot, Mr. RossBot, keeps kicking along, generating some fun landscape art. It’s been powered by the AI Horde (the open source project behind ArtBot) and has tried to utilize whatever image models the API provides to the best of its abilities.

For the most part, the code behind it is a bunch of spaghetti that looks like this:
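
The general shape of it is something like the sketch below. To be clear, this is purely illustrative: the helper names, model list, and payload fields are made up for the example, not the bot’s real code.

```typescript
// Purely illustrative sketch of "pick whatever model the Horde has workers
// for, then append model-specific junk to the prompt." Helper names and the
// preferred-model list are hypothetical, not Mr. RossBot's actual code.
interface HordeModel {
  name: string;
  count: number; // workers currently serving this model
}

const FALLBACK_MODEL = 'stable_diffusion';
const PREFERRED_MODELS = ['AlbedoBaseXL', 'SDXL 1.0', 'Deliberate'];

async function pickModel(): Promise<string> {
  // The AI Horde exposes currently available models via its status endpoint.
  const res = await fetch('https://aihorde.net/api/v2/status/models');
  const models = (await res.json()) as HordeModel[];

  for (const name of PREFERRED_MODELS) {
    if (models.some((m) => m.name === name && m.count > 0)) return name;
  }
  return FALLBACK_MODEL;
}

function buildPayload(subject: string, model: string) {
  let prompt = `${subject}, landscape painting`;

  // ...and here is where the per-model junk gets bolted on.
  if (model !== 'AlbedoBaseXL') {
    prompt += ', highly detailed, oil on canvas, trending on artstation';
  }

  return { prompt, models: [model], params: { width: 1024, height: 576 } };
}
```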

An update to the AI Horde late last year added support for SDXL. However, the SDXL model on the Horde did not use a refiner. Because of this, images tended to come out a bit soft and lacked texture.

You can see examples of this in my announcement post about Mr. RossBot being back, here. See also:

More recently, the Horde added support for a new image model: AlbedoBaseXL. It’s an SDXL model that has a refiner baked in, so images now come out looking a lot sharper.

Coincidentally, I was also playing around with various prompts and discovered I could get much better image results that look more painterly (rather than like simple digital renderings) by using the following prompt:

A beautiful oil painting of [LITERALLY_ANYTHING], with thick messy brush strokes.

And that’s it! No more messily appending various junk to the end of the prompt to try to get what I want. The results speak for themselves and are pretty awesome, I think!
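
That means the bot’s prompt construction can collapse down to something this simple (again, just a sketch; `randomSubject()` is a made-up stand-in for wherever the subject actually comes from):

```typescript
// The entire prompt template, post-cleanup. `randomSubject()` is a
// hypothetical helper that returns something like
// "a snow-covered mountain lake at dusk".
function buildPrompt(subject: string): string {
  return `A beautiful oil painting of ${subject}, with thick messy brush strokes.`;
}

// e.g. buildPrompt(randomSubject());
```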

Happy Museum Selfie Day

About 2 years ago, I found one of those cheesy sites that lists whatever fake holiday happens to be celebrated that day (e.g., “National Avocado Toast Day”).

I ended up starting every daily standup meeting with a call out to whatever the day was. This went on for about a year before I switched to a different internal team. One that didn’t have much in the way of daily meetings.

A few weeks ago, I made a move back to my original team, only to find that they have kept the tradition alive over the past year!

Amazing.

And with that: Happy Museum Selfie Day!

Created with DALL-E 3

Implementing and testing a “poor man’s prompt expansion” model for Stable Diffusion

Various Stable Diffusion models massively benefit from verbose prompt descriptions that contain a variety of additional descriptors. Much recent research has gone into training text generation models for expanding existing Stable Diffusion prompts with relevant and context-appropriate descriptors.

Since it isn’t feasible to run LLMs and text generation models inside most users’ web browsers at this time, I present my “Poor Man’s Prompt Expansion Model”. It uses a number of examples I’ve acquired from Fooocus and Hugging Face to generate completely random (and absolutely not context-appropriate) prompt expansions.

(For those interested in following along at home, you can check out the gist for this script on GitHub.)

How does it work?

We iterate through a list of an absolute crap ton of prompt descriptors that I’ve sourced from other (smarter) systems that tokenize user prompts and attempt to come up with context-appropriate responses. We’re not going to do that, because we’re going to go into full chaos mode (a rough code sketch follows the list below):

  1. Iterate through a list of source material and split up everything separated by a comma.
  2. Add the resulting list to a new 1-dimensional array.
  3. Now, build a new descriptive prompt by looping through the list until we get a random string of descriptors that is between 175 and 220 characters long.
  4. Once that’s done, return the result to the user.
  5. Create a new prompt.
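
A minimal sketch of those steps might look like this (assuming the source material lives in a plain text file of comma-separated snippets; the file name and function names here are mine, not necessarily what the gist uses):

```typescript
import { readFileSync } from 'node:fs';

// Steps 1 & 2: split the source material on commas and flatten everything
// into a single 1-dimensional array of descriptors. "descriptors.txt" is a
// stand-in for whatever file holds the snippets sourced from Fooocus and
// Hugging Face.
const descriptors: string[] = readFileSync('descriptors.txt', 'utf8')
  .split('\n')
  .flatMap((line) => line.split(','))
  .map((s) => s.trim())
  .filter(Boolean);

// Steps 3 & 4: build a random string of descriptors between `min` and `max`
// characters long, then tack it onto the base prompt.
function expandPrompt(basePrompt: string, min = 175, max = 220): string {
  let expansion = '';
  for (let tries = 0; tries < 1000 && expansion.length < min; tries++) {
    const pick = descriptors[Math.floor(Math.random() * descriptors.length)];
    const next = expansion ? `${expansion}, ${pick}` : pick;
    if (next.length <= max) expansion = next; // too long? skip it and roll again
  }
  return `${basePrompt}, ${expansion}`;
}

// Step 5: create a new prompt.
console.log(expandPrompt('Happy penguins having a beer'));
```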

For our experiment, we’re going to lock all image generation parameters and seed, so we theoretically get the same image given the exact same parameters.
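
In practice, that just means every request below reuses one frozen set of settings, something like this (a sketch; the exact values are arbitrary, not necessarily what I used for these images):

```typescript
// Every generation in this experiment reuses the exact same parameters and
// seed, so the only thing that changes between images is the prompt text.
// (Values are illustrative.)
const lockedParams = {
  sampler_name: 'k_euler_a',
  steps: 30,
  cfg_scale: 7,
  width: 768,
  height: 512,
  seed: '1234567890', // fixed seed: same prompt in, same image out
} as const;
```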

Ready?

Here is our base prompt and the result:

Happy penguins having a beer

Not bad! Now, let’s go full chaos mode with a new prompt using the above rules and check out the result:

Happy penguins having a beer, silent, 4K UHD image, 8k, professional photography, clouds, gold, dramatic light, cinematic lighting, creative, pretty, artstation, award winning, pure, trending on artstation, airbrush, cgsociety, glowing

That’s fun! (I’m not sure what the “silent” descriptor means, but hey!) Let’s try another:

Happy penguins having a beer, 8k, redshift, illuminated, clear, elegant, creative, black and white, masterpiece, great power, pinterest, photorealistic, award winning, vray, enchanted, complex, excellent composition, beautiful composition

I think we just created an advertisement for a new type of beverage! It nailed the “black and white”, though I’m not sure how that penguin turned into a bottle. What else can we make?

Happy penguins having a beer, volumetric lighting, Digital, intricate, awesome, futuristic, cartoon artstyle, vector, solid, detailed, dramatic light, realistic photograph, wonderful colors, dramatic atmosphere

The dude in the middle is planning on having a good night. Definitely some “wonderful colors”. Not so much realistic photo or vector, but fun! One last try:

Happy penguins having a beer, 35mm, surreal, amazing, Trending on Artstation HQ, matte painting hyperrealistic, full focus, very inspirational, pixta.jp, aesthetic, 8k, black and white, reflected on the matrix studio background, awesome

As you can see, you can get a wide variety of image styles simply by mixing a bunch of descriptive elements into an image prompt.

I’ve wanted to implement a feature like this on ArtBot for a long time. (Essentially, if the user allows it, automatically append these descriptors behind the scenes when an image is requested.) Perhaps this will come soon.

DALL-E 3: Adding text to your text-to-image generations

I recently got access to DALL-E 3 through OpenAI’s ChatGPT+ interface. One of the key features and improvements in their image model is the ability to generate coherent text within the image.

Let’s give it a try, based on one of the most popular Stack Overflow questions: How do I exit Vim?

Using the following prompt: Oil painting of a hacker furiously typing commands into an old computer and muttering to himself, “how does one exit vim?”

That… is pretty good!

Laughing donkeys and grumpy elephants: investigating opaque and changing content policies with ChatGPT

OpenAI’s censorship is fairly opaque and seems to change daily.

Yesterday, I could generate a political cartoon using the following prompt:

Wide image in the style of a political cartoon. Two elephants wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A donkey is pointing and laughing.

Today, that exact same prompt yields an error:

Interesting! Let’s do some experimentation, shall we? Maybe it’s the phrase “I’m the worst”?

Weird! Maybe it’s related to elephants and donkeys being in the same phrase? There’s no way, right? Let’s change the subjects…

Hah! Okay, now we’re getting somewhere. Let’s push things further and slightly change the subjects from my original prompt:

Wide image in the style of a political cartoon. Two mammoths wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A burro is pointing and laughing.

Okay, let’s bring it back home and just drop the pretense of creating a political cartoon.

WHAT! Okay. Maybe OpenAI prohibits donkeys and elephants interacting with each other (METAPHOR ALERT: just like in real life, eh?).

Alright. So donkeys and elephants CAN hang out with each other, according to OpenAI. Maybe it’s the phrase “laughing donkey”?

Hmmm. So, laughing donkeys can still hang out with elephants. What the heck? Is it the specific term “political cartoon”? Let’s change it to a comic book instead.

Sweet sassy molassy, it worked! So, creating a political cartoon featuring the mascots of prominent political parties seems to be prohibited (at least today… but not yesterday and who knows about tomorrow).

Mr. RossBot is back!

Alrighty, I updated the logic this weekend and have Mr. RossBot operating on the hairy elephant website (Mastodon). (It’s also posting on Threads, if you’re into that sort of thing.)

I also updated the image model to use Stability.ai’s swanky new SDXL model. I’m pretty impressed with the results.