The Joy of Exploring an AI Image in Stable Diffusion
Once you have an AI image generator like Stable Diffusion figured out, you find you can generate loads of images. Loads. Thousands. Tens of thousands. And, every once in a while, an image is going to stand out. Maybe it’s the style of the image or the content. You of course want more of that! You want to explore that style or composition and see what Stable Diffusion can do. Let’s go through a couple of straightforward examples.
Note: I’m using Stable Diffusion and Automatic1111. The concepts should apply to any other environment where you have the ability to tweak an existing prompt or do image-to-image transformation.
First, Getting Those Images
If you already have some images that you love then you can skip this part. If you want to generate a whole load of images to sift through to find that diamond in the rough, let me share what I do and some things I’ve learned since writing about wildcard image generation.
For all of this article, I’m using the following Stable Diffusion SDXL model: Copax TimeLessXL — V9 — for me, this model works great because it can produce a wide range of styles (vs. some other models where you’re going to get the same look no matter what medium or artist you ask for):
On to wildcard random image generation…
Important: one big learning before I link to my previous write-ups: I have moved on from the basic Wildcard extension in Automatic1111 (thanks for all the good times) to the Dynamic Prompt extension. It’s a super useful extension. If you have it, you’ll see a Dynamic Prompt region on your main Automatic1111 txt2img page. If it’s not there and you want it:
- In Automatic1111 go to the extensions tab.
- Under Available, click “Load from:” (keeping the default index it loads extensions from).
- In the list that appears, look for sd-dynamic-prompts (the URL is https://github.com/adieyal/sd-dynamic-prompts.git).
- Install and restart.
- Under your Automatic1111 directory structure you’ll now have extensions\sd-dynamic-prompts; fill its wildcards subdirectory with the wildcard files (and directories of files) you want to use.
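For example, after dropping in the files I’ll reference later in this article, the layout looks something like this (the file names are just my own naming, nothing required):
extensions\sd-dynamic-prompts\wildcards\meta.txt
extensions\sd-dynamic-prompts\wildcards\subject.txt
extensions\sd-dynamic-prompts\wildcards\artStatement.txt
extensions\sd-dynamic-prompts\wildcards\expression.txt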
Dynamic Prompt is a superset of the Wildcard extension. One big advantage is that it will recursively expand the wildcards in your prompt, meaning that a wildcard can contain another wildcard, so you can create a top-level wildcard that expands and expands and expands. More on that below.
Oh, one last piece of advice on the Settings tab for the Dynamic Prompt extension: I always turn on the “Save template to metadata: Write prompt template into the PNG metadata” checkbox so that the starting template is saved in my PNG. That way, if I look at the metadata later (like with PNG Info in Automatic1111) I can see (and reuse) the template.
From Wildcard Prompt to Wildcards of Wildcards
So after having big prompts full of wildcards, I eventually decided to break down my big prompt into a structure something like this:
__prelude__ __subject__ BREAK __artStatement__ BREAK __qualityKeywords__
In that file (let’s call it meta.txt), I have additional lines moving those elements around, with and without BREAKs, just to see what happens and to get some variety.
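For illustration, besides the line above, meta.txt might also carry lines like these (illustrative only, not my exact file):
__subject__ __prelude__ BREAK __qualityKeywords__ BREAK __artStatement__
__prelude__ __subject__ __artStatement__ __qualityKeywords__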
Each of those (like subject.txt) is its own wildcard file structured to hold variations of what I had been typing into my big long prompt.
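As a sketch of what I mean, a subject.txt could hold entries like these (hypothetical examples; __female3__ and __male__ are themselves nested wildcards, which is where Dynamic Prompt’s recursive expansion pays off):
__female3__ standing in a ruined cathedral at dusk
__male__ reading an old map by lantern light
a lone traveler crossing a rope bridge in a storm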
I can now have my prompt just be:
__meta__
When it runs, it will pull a random line out of meta.txt and keep expanding wildcards from the additional files until it has its final big random prompt. Sometimes I’ll add some LoRAs onto the beginning of the prompt, like:
__meta__ <lora:Perfect Hands v2:1.4> perfect hands <lora:InkArtXL_1.2:0.2> ink art
One thing I did in the artStatement.txt file is experiment with having just one artist, or weighting heavily toward composition artists I like, or combining a whole bunch of random artists together at various strengths. You can create a bunch of experimental possibilities and then kick a big ole job off before you go to sleep. Wake up, and look for diamonds.
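To make that concrete, artStatement.txt lines might look something like this (hypothetical entries; __artist__ would be yet another wildcard file full of artist names):
art by __artist__
(art by __artist__:1.3), alcohol ink
art by __artist__ and __artist__, (art by __artist__:0.6)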
Whoo! I Love That Style! Exploring a Style.
Sometimes a picture gets kicked out where, even if you don’t like the rendering itself, you really like the style. I had this one recently:
That image is not a winner at all, but the style is intriguing. To see if that style is really something vs. a random one-off, first do a quick experiment like this:
- In Automatic1111, go to the PNG Info tab.
- Drag that image into the tab.
- Send to txt2img.
- Add one to the Seed value (we don’t want to generate this again).
- Increase the batch count to at least 4.
- Generate.
Remember: a key reason the above works is that important information about how the image was generated is stored in the .PNG file. If you’re not storing that information then you’re going to have to figure out how that winning image was generated some other way.
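If you’ve never peeked at it, the parameters PNG Info pulls out of a generated image look roughly like this (everything below is invented for illustration), plus, with that “Save template to metadata” setting from earlier turned on, the original prompt template as well:
a lone traveler crossing a rope bridge in a storm, dramatic lighting, alcohol ink
Negative prompt: blurry, low quality
Steps: 50, Sampler: DPM++ 2M Karras, CFG scale: 6, Seed: 1234567890, Size: 1024x1024, Model: copaxTimelessxlSDXL_v9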
For the images generated, do they maintain the style in a way you like? If so, you’ve got a keeper. An example run for the above image:
Re-Randomizing the Prompt and Keeping the Style
Okay, the style is good but, eh, I want to change this prompt. The guns typically don’t turn out well, so I’m going to remove that… and wait a second. Since I know the smaller wildcard atoms my meta prompt was built from, I can swap pieces of the final big prompt back to their wildcards, keeping the style parts I like, so I get some variety again and (hopefully) keep the style. I’m going to do things like:
- “(face looking Tired:1.4)”? Let the __expression__ wildcard set the facial expression instead.
- Change up the specific female character by using one of my wildcards like __female3__.
- Likewise, change the male text back into the __male__ wildcard.
- “Olivia”? Set her name by using the __nameHer__ wildcard.
- Ah, it has “gloating” as the verb. Let’s change that back to just the __verb__ wildcard.
Yeah, and let’s send the seed back to -1. Let’s try a new run and see if we broke the style!
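Putting a couple of those together, a chunk of the prompt goes from something like this (paraphrasing the generated text rather than quoting it exactly):
Olivia, (face looking Tired:1.4), gloating …
back to wildcards:
__nameHer__, (face looking __expression__:1.4), __verb__ …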
There’s a lot more you can now start experimenting with for this style: using the same model, bring up the X/Y/Z Plot script and vary things like:
- Sampler — try some of your favorite samplers. Don’t forget to try Restart. It’s new and interesting.
- CFG Scale — try a range from acceptably low (e.g., 5) up to, say, 8 (or more, but watch out for CFG burn). So something like: 5,5.5,6,6.5,7,7.5,8
- Steps — you can get radically different results for lower steps up to higher steps.
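For example, one grid I might set up to cover two of those at once (these values are just a starting point, not magic numbers):
X type: Sampler; X values: Euler a, DPM++ 2M Karras, Restart
Y type: CFG Scale; Y values: 5,6,7,8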
Break the Style
Do you truly understand why the style you like in this prompt works? Try to break it and find out. Perhaps it’s the combination of artists. Change or remove an artist. Perhaps it’s the art medium (e.g., “alcohol ink”). Perhaps it’s a LoRA you added — change the strength of the LoRA or remove it altogether.
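For instance, working from the earlier prompt (illustrative edits only): drop <lora:InkArtXL_1.2:0.2> ink art entirely, or bump it up to <lora:InkArtXL_1.2:0.6>, or swap “alcohol ink” for a different medium like “watercolor,” and see whether the look you love survives each change.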
Break it to understand it better. And if you don’t find something more interesting, you’ll at least have an appreciation for what you have.
Whoo! Interesting Scene! Exploring a Composition.
Okay, one morning I wandered down to my big overnight random run to sift through the results and this one stood out:
Not the cleanest image, but the content intrigued me (I liked the lighting highlights, too). It looks like some Dungeons & Dragons adventurers are deep down someplace they are not supposed to be and a large entity is coming from below to confront them. I like my tropes (more on Instagram at Untold Tales). Maybe a game of riddles will ensue next, à la Gollum and Bilbo. So, I was curious: can I start with this image and explore the space around it, tweaking various parameters, to see if I end up with something I maybe like better? Let’s try.
Some basic steps:
- Go to the PNG Info tab and drag that image in there.
- Send to txt2img.
- Enable the X/Y/Z Plot script.
- What I wanted to explore was varying the CFG and the number of steps, below and above the given image’s generation values. So I typed those into the appropriate selections in the X/Y/Z Plot parameters:
- CFG Scale: 5.0,5.25,5.5,5.75,6.0,6.5
- Steps: 51,60,70,80,100
- Generate.
- Go off and do something else for a while.
It then results in the following grid, CFG across and steps downwards:
Hmm, okay, yes, I see one that I like better. It has a CFG of 5.75 / 100 steps and is a bit different from the original. Can we explore more? First, I’m going to do a simple X/Y/Z plot varying just Steps, and then choose between the one from the grid that caught my eye and the series of additional step counts.
Yeah, I don’t like any of those. Now I’m going to do one more thing with the one I liked, which is to scale it up for more details.
Scaling and Switching Samplers
I’m going to load the image I like into the img2img tab. One way to do this is to put it into the PNG Info tab and then send it over to img2img. This preserves the settings and prompt. Here’s what I did next:
- Kept most everything the same.
- Changed it to resize by 2 times.
- Dropped the steps down to 25.
- Changed denoising strength to be 0.20 so it doesn’t go and create something totally new.
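For scale: assuming a typical 1024x1024 SDXL render, a 2x resize lands at 2048x2048, and the low denoising strength is what keeps that larger image faithful to the original rather than a reinvention.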
I did that and I liked the results, but I’m like, yeah, I’m digging the rough inky aspects of this, but what if I did want it smoother and more detailed? Ah, I’ve got an idea.
Assuming your original sampler wasn’t “Euler A,” go ahead and switch to “Euler A.” I don’t know about you, but I find that “Euler A” usually gives smoother results, especially for resizing. Maybe that bugs you sometimes (e.g., skin looks like plastic), but let’s try it here.
Oh, that actually looks much better to me. Can I improve it a bit more?
What I’m going to try next is a double upscale.
- Set resize to 1.5 times.
- Resize the original w/ Euler A as the sampler again.
- Drag the resized output image into img2img as the new source.
- Resize again at 1.5 times.
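If you’re doing the math: two 1.5x passes work out to 2.25x overall, so that same (assumed) 1024x1024 original ends up around 2304x2304, just in two gentler jumps.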
Even smoother. Here are all three, in sequence: original, first Euler A resize, and then double Euler A resize:
Nice. I’m going to share that to Threads and Instagram.
Recap
If you generate an image with a style you really like, see if it’s truly a style you can generate more images with, and then deconstruct it as much as you can to create a new variety of compositions in that style. Try out different samplers to see how the style holds, and even different models to see if the style has something special model to model.
If you generate a composition you are enamored with, see about exploring the space around the image to see if you can generate a version you like even more. If you want more detail, you can put it through the img2img resize process. You can also switch the sampler if you want an alteration to the image, like a smoother image via “Euler A.”
There are so many variables you can experiment with or even things like looped generation in img2img. Play. Break it. Fix it. Play some more. Kick off big jobs before you go to bed or work or out for a while. Just don’t forget to triage and delete what you don’t want. Otherwise, you’ll find yourself deleting thousands and thousands of images one thankless Sunday afternoon. Cheers.