Ever-changing communication

There was a time (really, the past 15 years or so) when responding to things with an animated GIF was so perfect and encapsulated so much (e.g., if a picture is worth 1,000 words, what is a series of pixelated images moving at 8 frames per second worth?).

For example, see the rise of services like Giphy. I even have a random 10-year-old project of my own that involves animated GIFs!

Now though, it’s becoming generative AI all the way down.

For example, I just received a meeting invite that increases the frequency of meetings I’m having related to a certain project to… every single day.

Me: Hey, robot! Please create a meme image of a programmer jumping up on a desk and excitedly cheering “MOAR MEETINGS!”

Robot:

Now to figure out a way to send it in my place…

Upgrading Mr. RossBot’s image model and prompt template

My Mastodon landscape painting bot, Mr. RossBot, keeps kicking along, generating some fun landscape art. It’s been powered by the AI Horde (the open source project behind ArtBot) and has tried to utilize whatever image models the API provides to the best of its abilities.

For the most part, the code behind it is a bunch of spaghetti that looks something like this (a reconstructed sketch for illustration; the real per-model branches and descriptors differ):
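
    // Hypothetical reconstruction, not the bot's actual code: each image model
    // got its own pile of style descriptors bolted onto the prompt.
    function buildPrompt(subject: string, model: string): string {
      let prompt = `A landscape painting of ${subject}`;
      if (model === 'stable_diffusion') {
        prompt += ', in the style of Bob Ross, detailed, trending on artstation';
      } else if (model === 'SDXL') {
        // No refiner on the Horde's SDXL, so pile on sharpening descriptors
        prompt += ', oil painting, highly detailed, sharp focus, 8k';
      } else {
        prompt += ', masterpiece, award winning, dramatic lighting';
      }
      return prompt;
    }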

An update to the AI Horde late last year added support for SDXL. However, the SDXL model on the Horde did not use a refiner. Because of this, images tended to come out a bit soft and lacked texture.

You can see examples of this in my announcement post about Mr. RossBot being back, here. See also:

More recently, the Horde added support for a new image model: AlbedoBaseXL. It’s an SDXL model that has a refiner baked in. Now images will come out a lot sharper looking.

Coincidentally, I was also playing around with various prompts and discovered I could get much better image results that look more painterly (rather than simple digital renderings) by utilizing the following prompt:

A beautiful oil painting of [LITERALLY_ANYTHING], with thick messy brush strokes.

And that is it! No more messily appending various junk to the end of the prompt to try to get what I want. The results speak for themselves and are pretty awesome, I think!
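
In code, the entire template now boils down to something like this (a sketch; the helper name is my own):

    // One template string replaces all of the per-model descriptor junk.
    const buildPrompt = (subject: string): string =>
      `A beautiful oil painting of ${subject}, with thick messy brush strokes.`;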

Implementing and testing a “poor man’s prompt expansion” model for Stable Diffusion

Various Stable Diffusion models benefit massively from verbose prompt descriptions that contain a variety of additional descriptors. Much recent research has gone into training text generation models for expanding existing Stable Diffusion prompts with relevant, context-appropriate descriptors.

Since it isn’t feasible to run LLMs and text generation models inside most users’ web browsers at this time, I present my “Poor Man’s Prompt Expansion Model”. It uses a number of examples I’ve acquired from Fooocus and Hugging Face to generate completely random (and absolutely not context-appropriate) prompt expansions.

(For those interested in following along at home, you can check out the gist for this script on GitHub.)

How does it work?

We iterate through an absolute crap ton of prompt descriptors that I’ve sourced from other (smarter) systems, which tokenize user prompts and attempt to come up with context-appropriate responses. We’re not going to do that, because we’re going to go into full chaos mode (sketched in code after the list below):

  1. Iterate through a list of source material and split up everything separated by a comma.
  2. Add the resulting list to a new 1-dimensional array.
  3. Now, build a new descriptive prompt by looping through the list until we get a random string of descriptors that is between 175 and 220 characters long.
  4. Once that’s done, append the result to the user’s original prompt.
  5. Return the new, expanded prompt.
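
Here is a minimal sketch of those steps (the two source lines and all names here are illustrative; the real list is enormous):

    // Steps 1 and 2: split the source material on commas and flatten it into
    // a single one-dimensional array of descriptors.
    const RAW_SOURCE: string[] = [
      '4K UHD image, 8k, professional photography, dramatic light',
      'masterpiece, trending on artstation, cinematic lighting, award winning',
    ];
    const descriptors: string[] = RAW_SOURCE.flatMap((line) =>
      line.split(',').map((s) => s.trim()).filter(Boolean)
    );

    // Steps 3 through 5: randomly chain descriptors until the expansion lands
    // between 175 and 220 characters, then bolt it onto the original prompt.
    function expandPrompt(prompt: string): string {
      let expansion = '';
      while (expansion.length < 175) {
        const pick = descriptors[Math.floor(Math.random() * descriptors.length)];
        const next = expansion ? `${expansion}, ${pick}` : pick;
        if (next.length > 220) continue; // overshot the cap; try another pick
        expansion = next;
      }
      return `${prompt}, ${expansion}`;
    }

    console.log(expandPrompt('Happy penguins having a beer'));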

For our experiment, we’re going to lock the seed and all other image generation parameters, so any change in the output theoretically comes from the prompt alone.
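
For illustration, the locked settings might look something like this (the field names mirror the Horde’s generation params; the specific values here are assumptions):

    // With the seed, sampler, steps, guidance scale and dimensions all pinned,
    // the same prompt should (in theory) reproduce the same image every time.
    const lockedParams = {
      sampler_name: 'k_euler_a',
      seed: '1234567890',
      steps: 30,
      cfg_scale: 7,
      width: 512,
      height: 512,
    };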

Ready?

Here is our base prompt and the result:

Happy penguins having a beer

Not bad! Now, let’s go full chaos mode with a new prompt using the above rules and check out the result:

Happy penguins having a beer, silent, 4K UHD image, 8k, professional photography, clouds, gold, dramatic light, cinematic lighting, creative, pretty, artstation, award winning, pure, trending on artstation, airbrush, cgsociety, glowing

That’s fun! (I’m not sure what the “silent” descriptor means, but hey!) Let’s try another:

Happy penguins having a beer, 8k, redshift, illuminated, clear, elegant, creative, black and white, masterpiece, great power, pinterest, photorealistic, award winning, vray, enchanted, complex, excellent composition, beautiful composition

I think we just created an advertisement for a new type of beverage! It nailed the “black and white”, though I’m not sure how that penguin turned into a bottle. What else can we make?

Happy penguins having a beer, volumetric lighting, Digital, intricate, awesome, futuristic, cartoon artstyle, vector, solid, detailed, dramatic light, realistic photograph, wonderful colors, dramatic atmosphere

The dude in the middle is planning on having a good night. Definitely some “wonderful colors”. Not so much realistic photo or vector, but fun! One last try:

Happy penguins having a beer, 35mm, surreal, amazing, Trending on Artstation HQ, matte painting hyperrealistic, full focus, very inspirational, pixta.jp, aesthetic, 8k, black and white, reflected on the matrix studio background, awesome

As you can see, you can get a wide variety of image styles by simply mixing a bunch of descriptive elements into an image prompt.

I’ve wanted to implement a feature like this on ArtBot for a long time. (Essentially, if the user allows it, these descriptors would be appended automatically behind the scenes when an image is requested.) Perhaps this will come soon.

Laughing donkeys and grumpy elephants: investigating opaque and changing content policies with ChatGPT

OpenAI’s censorship is fairly opaque and seems to change daily.

Yesterday, I could generate a political cartoon using the following prompt:

Wide image in the style of a political cartoon. Two elephants wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A donkey is pointing and laughing.

Today, that exact same prompt yields an error:

Interesting! Let’s do some experimentation, shall we? Maybe it’s the phrase “I’m the worst”?

Weird! Maybe it’s related to elephants and donkeys being in the same phrase? There’s no way, right? Let’s change the subjects…

“Wide image in the style of a political cartoon. Two elephants wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A donkey is pointing and laughing.”

Hah! Okay, now we’re getting somewhere. Let’s push things further and slightly change the subjects from my original prompt:

Wide image in the style of a political cartoon. Two mammoths wearing boxing gloves face each other. One is saying “I’m the worst!” while the other says, “No! I am!”. A burro is pointing and laughing.

Okay, let’s bring it back home and just drop the pretense of creating a political cartoon.

WHAT! Okay. Maybe OpenAI prohibits donkeys and elephants interacting with each other (METAPHOR ALERT: just like in real life, eh?).

Alright. So donkeys and elephants CAN hang out with each other, according to OpenAI. Maybe it’s the phrase “laughing donkey”?

Hmmm. So, laughing donkeys can still hang out with elephants. What the heck? Is it the specific term “political cartoon”? Let’s change it to a comic book instead.

Sweet sassy molassy, it worked! So, creating a political cartoon featuring the mascots of prominent political parties seems to be prohibited (at least today… but not yesterday and who knows about tomorrow).

 

Mr. RossBot is back!

Alrighty, I updated the logic this weekend and have Mr. RossBot operating on the hairy elephant website (Mastodon). (It’s also posting on Threads, if you’re into that sort of thing.)

I also updated the image model to use Stability.ai’s swanky new SDXL model. I’m pretty impressed with the results.

ArtBot mentioned again in PC World!

ArtBot got another callout in PC World in the article: “The best AI art generators: Bring your wildest dreams to life.”

There was a bit of (fair) criticism at the end of the blurb, though:

Why use Artbot? The vast number of AI models, and the variance in style those images produce. Otherwise, generating images via Artbot can be a bit of a crapshoot, and you may expend a great number of kudos simply exploring all the options. Since there’s no real setup besides figuring out the API key, Stable Horde (Artbot) can be worth a try.

Hey, I’ll take it!

ArtBot written up in PC World!

Hah! This is pretty awesome. My nifty side project, ArtBot, has been written up in PC World as part of a larger article about Stable Horde (the open source backend that powers my web app):

Stable Horde has a few front-end interfaces to use to create AI art, but my preferred choice is ArtBot, which taps into the Horde. (There’s also a separate client interface, with either a Web version or downloadable software.)

Interestingly enough, ArtBot just passed 2,000,000 images generated!

New side project: ArtBot, a way to create images using Stable Diffusion

Thanks to Reddit, I recently stumbled upon a cool project called Stable Horde. It essentially lets you generate images using a distributed cluster of GPUs donated by community members.

I had been creating my own web interface to remotely interact with a Stable Diffusion instance running on my own machine. I decided to quickly repurpose the web app and connect to the Stable Horde API. The result?

ArtBot, a Stable Diffusion demonstration that allows you to generate images using the power of the Stable Horde. It is awesome!
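
For the curious, the round trip to the Horde looks roughly like this (a simplified sketch of its v2 async endpoints; error handling, kudos, and model selection are omitted, and '0000000000' is the anonymous API key):

    const HORDE_BASE = 'https://stablehorde.net/api/v2';

    // Submit a job to the distributed cluster, poll until a donated GPU has
    // picked it up and finished, then fetch the resulting image.
    async function generateImage(prompt: string): Promise<string> {
      const submit = await fetch(`${HORDE_BASE}/generate/async`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', apikey: '0000000000' },
        body: JSON.stringify({
          prompt,
          params: { width: 512, height: 512, steps: 30 },
        }),
      });
      const { id } = await submit.json();

      let done = false;
      while (!done) {
        await new Promise((resolve) => setTimeout(resolve, 5000));
        const check = await fetch(`${HORDE_BASE}/generate/check/${id}`);
        ({ done } = await check.json());
      }

      // The finished generation's img field holds the image payload/URL.
      const status = await fetch(`${HORDE_BASE}/generate/status/${id}`);
      const { generations } = await status.json();
      return generations[0].img;
    }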

Punk Rock Obama

I think it’s time to end my AI art career on this high note. Generated with Stable Diffusion, running on my local machine.

The prompt:
“beautiful portrait painting of Barack Obama with a purple mohawk on top of his head shredding on an electric guitar at a punk rock show, concept art, makoto shinkai, takashi takeuchi, trending on artstation, 8k, very sharp, extremely detailed, volumetric, beautiful lighting, wet-on-wet”

Punk Rock Obama

MidJourney – AI Art Madness

A few short weeks ago, I downloaded a simplified model for generating AI-created images on your own machine. The internet (myself included) had a lot of fun with it, but the quality was definitely lacking, especially when compared to the more serious AI image platforms being created by some big companies.

I recently received my invite to the MidJourney beta and I am just blown away!

For now, I’ve just been putting in ridiculous prompts that simulate the styles of various artists. (Oh, man. I have a feeling this is going to piss off a lot of artists in the future…)

For example: “Apocalyptic wasteland with crumbling buildings and debris, thomas kinkade painting”

The potential here is pretty crazy: people who aren’t artistically inclined can start generating images and scenes based on what they come up with. Some people can probably use this as a base to rapidly start iterating on new ideas. And of course, others are going to be mad.

A lot of the craft in creating these images is in how you write the prompt. You’re already seeing the phrase “prompt engineering” pop up in various places; check out this Twitter search.

As for me, I’m excited about this new technology, and it’s something I’ve been eager to play with.

Generating art using AI

Earlier this year, OpenAI announced DALL-E 2, the latest version of their AI tool that can generate images by simply providing text input.

For example, give it “people in togas taking a selfie in front of a volcano” and it will get to work attempting to create an image that includes all of those elements.

The Verge has an interesting article with more details. You can see an example of what is possible on the DALL-E 2 subreddit. It’s honestly insane.

For now (sadly), the service is invite only.

More recently, an ambitious engineer named Boris Dayma created an open source version of the service called DALL-E mini. While it isn’t able to generate results as impressive as DALL-E 2, it’s still pretty crazy!

It’s recently taken the internet by storm, and you can see people posting DALL-E mini-generated images and memes everywhere. The official website has been under heavy load, so it’s been pretty tough to try out the service.

Fortunately, you can download the model from GitHub and get the service set up on your local machine (provided you have a graphics card beefy enough to run the models).

Who has two thumbs and a graphics card just begging to be used? Hello.

I was able to get the service set up on my machine and start playing around with it.

In this example, I used a prompt to essentially create a Bob Ross painting generator. “Alpine forest with river running through the middle, snow capped peaks in the background, Bob Ross style painting.”

Dalle mini forest

Pretty neat! The images that services like DALL-E 2 and Midjourney can create are miles better, and I’ve applied to both services.

While I anxiously await my acceptance, I’ll have to continue generating various memes on my own machine.

Monkeys

Fun with GPT-3

Earlier this evening, I was playing around with OpenAI’s GPT-3 language model for generating “human-like text.”

I gave it the following prompt:

Write a plot about a movie that involves rogue unicorns who want to conquer Earth and the planet’s only hope is an Australian Cattle dog named Benson who sleeps a lot.

The response?

The unicorns, led by their king, Uni, have been planning their conquest of Earth for years. They have been observing the humans and have concluded that they are weak and vulnerable. The unicorns have developed a powerful magic that will allow them to take over the planet.

Benson, an Australian Cattle dog, is the only one who can stop them. Benson is a lazy dog who loves to sleep. He is content to spend his days lounging around and eating food. However, when the unicorns begin their attack, Benson is the only one who can stop them. He rallies the other dogs and leads them in a fight against the unicorns.

The unicorns are powerful, but they are no match for Benson and the other dogs. Benson is able to defeat Uni and the other unicorns, and the Earth is saved.

Summer blockbuster coming soon to a theater near you!

Dieter Rams exhibit at SF MOMA

Lobby at SF MOMA

Looking up! The view from the lobby at SF MOMA. Photo by Dave Schumaker

Earlier today, team gdgt took a field trip over to the SF MOMA to check out the Dieter Rams exhibit. Rams is known for his minimalist design aesthetic, and the exhibit showed off timeless-looking products that he has designed and influenced over the past 50 years.

Dieter Rams quote

Numerous quotes from Dieter Rams were posted around SF MOMA. Photo by Dave Schumaker.

One part of the exhibit was dedicated to his 10 principles of good design.

  1. Good design is innovative – The possibilities for innovation are not, by any means, exhausted. Technological development is always offering new opportunities for innovative design. But innovative design always develops in tandem with innovative technology, and can never be an end in itself.
  2. Good design makes a product useful – A product is bought to be used. It has to satisfy certain criteria, not only functional, but also psychological and aesthetic. Good design emphasises the usefulness of a product whilst disregarding anything that could possibly detract from it.
  3. Good design is aesthetic – The aesthetic quality of a product is integral to its usefulness because products are used every day and have an effect on people and their well-being. Only well-executed objects can be beautiful.
  4. Good design makes a product understandable – It clarifies the product’s structure. Better still, it can make the product clearly express its function by making use of the user’s intuition. At best, it is self-explanatory.
  5. Good design is unobtrusive – Products fulfilling a purpose are like tools. They are neither decorative objects nor works of art. Their design should therefore be both neutral and restrained, to leave room for the user’s self-expression.
  6. Good design is honest – It does not make a product more innovative, powerful or valuable than it really is. It does not attempt to manipulate the consumer with promises that cannot be kept.
  7. Good design is long-lasting – It avoids being fashionable and therefore never appears antiquated. Unlike fashionable design, it lasts many years – even in today’s throwaway society.
  8. Good design is thorough down to the last detail – Nothing must be arbitrary or left to chance. Care and accuracy in the design process show respect towards the consumer.
  9. Good design is environmentally friendly – Design makes an important contribution to the preservation of the environment. It conserves resources and minimises physical and visual pollution throughout the lifecycle of the product.
  10. Good design is as little design as possible – Less, but better – because it concentrates on the essential aspects, and the products are not burdened with non-essentials. Back to purity, back to simplicity.

It was pretty awesome and inspiring stuff. Check it out at the SF MOMA. The exhibit runs until February 20, 2012.

Blek le Rat stencil at Cafe Divis

This is a relatively new stencil that popped up at my favorite neighborhood coffee shop. It was created by the French street artist Blek le Rat. Some have speculated that Blek le Rat’s stencil style is the inspiration for Banksy.

Via Wikipedia:

British graffiti artist Banksy has acknowledged Blek’s influence stating "every time I think I’ve painted something slightly original, I find out that Blek Le Rat has done it as well, only twenty years earlier."