The X/Y Plot to Eliminate Some Stable Diffusion AI Art Models
Summary: in this post, I discuss the X/Y Plot script for a practical task: I want to analyze the 50+ models I have in my Stable Diffusion install to find any treasures I’ve missed and to bury models that I don’t like. I also study the impact of steps and CFG with the same script. And then I admit to an oversight in my analysis, and discover an extension to address that oversight. What did I learn?
The other day, I was in Automatic1111 and decided to switch to another model to try out my current prompt. When I clicked on the model drop down, my first reaction was: whoa, that’s way too many models! I realized I had been downloading some models that maybe I tried once or twice but then forgot about. Did I not like them? Are there hidden gems in here?
I wanted to get organized. My goal:
- Move models I didn’t like for my prompts into a deprioritized subfolder.
- Move models that are way too big to load on my graphics card (but fine for mixing new models) into their own “Big” (aka, don’t try to load) subfolder.
- Move models that I created with a Google Colab Dreambooth process into their own subfolder.
- Move 2.1 models into their own subfolder since they are pretty useless to me.
The last three are easy to do. Done. I named the folders zBig, zDreambooth, and z2–1 (I prefixed them with a ‘z’ so that they’ll be pushed to the end of the model drop down list).
But what about finding the models that didn’t work as well for my style of prompting, which is illustrated-story images?
I stared into my coffee, thinking of the Python script I’d need to write to load specific models and do my best to adapt the A1111 tokenizing code for my very long prompts. I could write a file that had the model paths, ensure a safetensor could be loaded, blah blah blah… oh, Lord of the Lazy, isn’t there an easier way?
Yes.
And it’s been built into A1111 for a long time. People have been using it for experiments since the very early days (end of summer 2022) of playing around with Stable Diffusion via A1111. It’s the X/Y Plot script, and it allows you to:
- Experiment with two changing values.
- Confirm what you think is right.
- Challenge yourself to learn something new by trying new variations.
Here’s what I wanted to do:
- Find some example images to experiment with across the different models. Lock in that prompt and seed.
- Load up every model except for those I had already moved into a subfolder.
- Continue using the Euler a sampler, but try out the various samplers I see other people using to see if I learn something new.
- Later, try out different numbers of steps and CFG values to see if I need to move out of my comfort zone.
- …
- Profit!
Here are the images I decided to experiment with:
To run my first big experiment (trimming down the models), I chose the first two images and followed this process:
- Send the image to PNG Info and send that to txt2img.
- In txt2img do the following:
- Scroll down to Script and choose X/Y plot
- X type: select Sampler.
- Y type: select Checkpoint name.
- Next to the X values and Y values fields, an auto-fill button appears. Click it and the text field is filled in with everything your environment has.
- I edited the Sampler to be: Euler a, DPM++ 2S a Karras, DPM++ 2M Karras, DPM++ SDE Karras
- I edited the Checkpoint name to exclude all those models I moved into a subdirectory already (well, I tried, more on that later).
- Everything else looked good.
- Generate.
- Walk away. This is going to take hours: over 50 models to load, each generating a high-step-count image for four samplers.
If you watch your command line window, you’ll see A1111 load each model (checkpoint) in turn and then generate an image with each sampler. All generated images go into the standard txt2img-images output directory.
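By the way, if you ever do want to write that script yourself (the chore I was dreading over my coffee), here’s a minimal sketch of the same loop using the A1111 web API instead of the X/Y Plot script. This is an illustration, not the X/Y Plot internals: it assumes you launch the webui with the --api flag, and the model titles, prompt, and seed below are placeholders for your own (you can list valid titles via the /sdapi/v1/sd-models endpoint).

```python
# Minimal sketch of the model x sampler loop that X/Y Plot saves you from writing.
# Assumes the webui was started with --api on the default port; model titles,
# prompt, and seed are placeholders -- substitute your own.
import base64
import requests

BASE_URL = "http://127.0.0.1:7860"

models = [
    "deliberate_v1.safetensors [9f1bfee7a0]",
    "seekArtMega_v1.ckpt [3e777936f8]",
]
samplers = ["Euler a", "DPM++ 2S a Karras", "DPM++ 2M Karras", "DPM++ SDE Karras"]

base_payload = {
    "prompt": "your locked-in prompt here",
    "negative_prompt": "your negatives here",
    "seed": 12345,       # lock the seed so only the model and sampler vary
    "steps": 150,
    "cfg_scale": 7,
    "width": 512,
    "height": 768,
}

for model in models:
    for sampler in samplers:
        payload = dict(base_payload)
        payload["sampler_name"] = sampler
        # override_settings swaps the active checkpoint for this request only
        payload["override_settings"] = {"sd_model_checkpoint": model}
        resp = requests.post(f"{BASE_URL}/sdapi/v1/txt2img", json=payload, timeout=3600)
        resp.raise_for_status()
        image_b64 = resp.json()["images"][0]
        out_name = f"{model.split('.')[0]}_{sampler.replace(' ', '_')}.png"
        with open(out_name, "wb") as fh:
            fh.write(base64.b64decode(image_b64))
```

The X/Y Plot script does all of this for you and stitches the results into a labeled grid, which is exactly what I wanted.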
A special output — the final result for analysis — goes to the txt2img-grids directory. Here’s what the top of such an output looks like for my first image:
I’m only showing the top because the final image itself is 2736 pixels wide and 35,994 pixels tall. It’s 138MB. I have a lot of models.
I did this again for the second image. The top of the result for that:
What did I learn?
Note: I’m going to refer to a number of models here by name rather than providing direct links. If you’re interested, you can find the models by searching for them by name on the usual model-sharing sites.
In general, I prefer to download the .safetensors version for A1111; either way, ensure that any model you’re considering has been scanned for safety.
First:
- I accidentally kept the 2.1 models in these X/Y Plot runs. They are still awful, especially in comparison to what the community has been building on 1.5.
Looking through it:
- In general, Euler a and DPM++ 2S a Karras (the ancestral samplers) generate similar but not identical images.
- DPM++ 2M Karras generates distorted images here. Not later, but here.
- DPM++ SDE Karras generates the occasional very good image.
- Euler a, in general, does a lot better job for what I like. I’m sticking with Euler a by default.
- I ended up with lots of face schmutz. I’m going to have to find some prompts / negatives to get clear faces.
- Early on, I learned that, for my purposes, DDIM == DPM++ 2M Karras so I removed DDIM from my first run to save time.
My first impression of which models to move into the deprioritized subdirectory for what I like (and note this isn’t criticizing the models, just the results of my use of them):
- analogDream3D_10
- Anything-Vanything — not my jam.
- cinematicDiffusion — it does best widescreen. It has made some incredible realistic horror shots for me before, if you play nice with it.
- Dreamlike Photo Real models
- Elldreth Stolen Dreams
- FlexibleDiffusion — kind of went crazy
- hAS3Dkx10B — maybe if I could get all the colors off the face?
- Hmm, hasdx_hasedsdx and hasdx_realisticHASDX are almost the same. Choose one. I’ll go w/ realistic.
- I love InkPunk but wowza some odd things happened here.
- The original 1.4 model can’t compare.
- OpenJourney-V2 — the original OpenJourney (mdjrny-v4) model does better for me.
- Portrait model — different but not super interesting for me.
- PulpArtDiffusion — this is not your space. Wow. That’s just some crazy right there. If I’m looking for crazy, I have a go-to now.
- Samdoesartsultmerge — you’ve done great things, just not here.
- samDoesArtV3 — ditto.
- tofuMix — very washed out. Maybe I’m missing a trigger keyword.
- Unstableinkdream_v5 / unstablephotorealv.5 not great, though unstableinkdream_v5Photoreal is okay.
- V1.5 — thanks for being a base, your job is done.
- V2.1 models — whoops, how did I include you and how did you even work? Ah, in A1111 only embeddings that match the model are loaded, so even though I referred to invalid embeddings they were just ignored.
- vintedoisDiffusionV0_v01 — not great in this space.
- WLOP model — not your best space.
Other insights:
- NovelInkPunkF222 — you look so good but you seem really attached to being naughty. Good results, just… naughty. For why are my negative prompts with all of their parentheses not working?
The tops?
- I got some wow out of ComicsBlend.
- Deliberate is new and I’m still getting used to it. Not bad. I’m liking it more and more.
- Elldreth SOG and Elldreth Lucid are nice.
- JoMadDiffusion is a major influence on SynthwavePunk_v3, it seems.
- oosayamUnstableSamIn made some interesting images with great contrast.
- Protogen X34 and Protogen Nova made very similar images — I like X34 better.
- SeekArtMega — quality really shows here. It’s one of my top three go-to models.
- SynthwavePunk_v3Alpha — I love this model. I don’t always get the best out of it, but when I do it shines for illustration.
So for the next steps of experimentation, this is my trimmed list of checkpoint models:
comicsBlend_comicsBlendV1.ckpt [fec96ebd64], deliberate_v1.safetensors [9f1bfee7a0], elldrethSOg4060Mix_v10.ckpt [707ee16b5b], novelInkpunkF222_v1.ckpt [da864e82d4], oosayamUnstableSamIn_1.safetensors [e05992510c], protogenX34OfficialR_1.safetensors [44f90a0972], seekArtMega_v1.ckpt [3e777936f8], synthwavePunk_v3Alpha.safetensors [ee2ab6d872]
Three Images and X/Y Plot
Next I’m going to share three full images. They are the result of going through the long steps above with the reduced set of models. If you’re curious, you can look at each image to see how each model did and what difference each sampler makes. These correspond to the first, third, and fourth of my selected images above.
You should see the reference image generated within the X/Y Plot.
(Yes, those modesty bars were added by me. That darn F222.)
So Many Experiments — What Next?
For the X/Y Plot you can choose a huge variety of things to experiment with. My next goal was to question my decisions around my number of steps and my CFG selection. I usually go with a low CFG, like 7 or lower, to let the system riff and improvise. I also go with a huge number of steps.
I’ve read that some samplers max out after a certain number of steps and you’re just wasting your time going for a higher value. Well, it’s something I read on the internet. How about I confirm it for myself?
First up, CFG. For the fourth selected image, let’s have the models go through CFG values of 3, 7, 15, and 30. To do this, I leave the Y axis alone but change the X type to CFG Scale and type in the comma-separated values 3,7,15,30.
Result:
My insights:
- 3 usually produces mushy muck, but a couple of the images were interesting.
- 7 is what I usually go with and it’s okay.
- 15 made some interesting images.
- 30 looks over-done and burned. Not going with 30 anytime soon for my prompts.
So honestly, I feel good about sticking with 7 for my prompts. Heck, you could do the full range 1–30 if you wanted to; I’m planning to go back and try the neighborhood from 7 to 15 in the future.
Next, how about steps vs. samplers? For this, I took my sampler list and pasted it into the Y axis, choosing Sampler as the type. For X, I chose Steps and entered 20,40,80,120,150. Results:
Here I did learn that some samplers do indeed max out early. Not Euler a. But all the Karras samplers maxed out to some degree at 40 or 80 steps; there are only subtle differences at higher values (very subtle for DPM++ 2M Karras). If you’re using those samplers, know that you don’t have to go to crazy high step counts the way I do with Euler a.
More Information:
- A bit more information on X/Y Plot is in the A1111 wiki: Features · AUTOMATIC1111/stable-diffusion-webui Wiki · GitHub — the most interesting part is how, for numeric values, you can specify ranges with increments or decrements instead of listing every value. Also, there’s a discussion of search/replace if, say, you wanted the X axis to be an artist’s name.
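For example (if I’m reading that wiki page right), numeric X or Y values accept range shorthand like this, where each entry expands into a list of values:

```
1-5         = 1, 2, 3, 4, 5
1-10 (+2)   = 1, 3, 5, 7, 9
10-5 (-3)   = 10, 7
```

That would have been a tidier way to write my CFG and steps runs than typing each value out.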
Addendum — Trigger Warning
Well, you know, this was all about learning about the X/Y Plot script and how it could be used for things like model comparison. But I felt bad about one thing.
Model keyword triggers.
One thing that is perhaps unfair in my comparison is that some of the models require trigger keywords. Sometimes they are optional (as a result of merges) and sometimes they are strongly suggested. There’s no built-in way to create the X/Y plot I wanted and to have the appropriate model keyword be automagically applied.
Lord I’m lazy, is there an easy way to do this?
Again: Yes.
- As I wrap this post up, I see there’s an extension to do this for you. I found it via an A1111 feature request asking for model keyword triggers.
- If you’re interested, you can view it here: GitHub — mix1009/model-keyword: Automatic1111 WEBUI extension to autofill keyword for custom stable diffusion models.
- Note that to install that, I had to give the actual GitHub .git clone path to A1111, not what was provided in the README: https://github.com/mix1009/model-keyword.git — YMMV.
- Plus, in that extension’s directory I had to create a blank custom-mappings.txt file before it would run.
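For what it’s worth, if you later want to add your own trigger mappings, my understanding from skimming the extension’s README (treat this as an assumption and verify there) is that each line of custom-mappings.txt is a comma-separated entry tying a model hash to its keyword. Something roughly like this, with a made-up keyword for illustration:

```
# model_hash, keyword[, filename]   <- my reading of the format; verify against the README
fec96ebd64, your-trigger-word-here
```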
And so as an act of atonement I re-ran all the models again with the first image. Will my mind be changed for the models that had model-specific keywords added?
What got better?
- Analog diffusion — looks more natural, less noise.
- FantasyStyleV1 — better, but not enough to be a keeper.
- InkPunkDiffusion — better but not impressive like its usual self.
- samDoesArtV3_v3 — I’ll say better. First time it was half-landscapes. Now all landscapes but they look way better. Shrug.
What got worse?
- Arcane diffusion — well, looks more extreme comic-like, which I guess should be expected. Not exciting.
- ComicsBlendV1 — way worse. Horrible. All those trigger words help make some ugly crap for my prompt.
- JoMadDiffusion — way worse with the keyword trigger for my prompt.
- Mdjrny-V4 — way worse. Noisy and not related to the prompt at all.
- oosayamUnstableSamIn_1 — noisy and chaotic scene, way worse than without the triggers.
- synthwavePunk_v2 — looks like pink-Tron. Perhaps true to the intention of the model, but the subject matter and the rendering are way not to my liking compared to the un-triggered model.
- openjourney-v2 — pretty chaotic results, much like Mdjrny-V4.
- portrait+1.0 — different and a wee bit worse.
- vintedoisDiffusionV0_v01 — at first just more saturated, but then some came out just bad in comparison to the first run.
Unchanged or Undecided:
- Deliberate — just different.
- Dreamlike Diffusion — very similar but I think I like the non-trigger one better.
- hAS3Dkx10B_3Dkx10B — less face schmutz but not better.
- pulpArtDiffusion_v1 — still bonkers for my prompt.
- Samdoesartsultmerge — different but no better.
Summary: at least for the models I have that had their keywords added to the prompt, I didn’t find results good enough to change any of my earlier calls. Whew. That guilt I was carrying is assuaged.
Going Forward
Whenever I download some new promising models, I’ll probably do this X/Y Plot exercise on a small set of the new models just to see how they do. It’s a good tool for my toolbox. I suggest you give it a try, if only to experiment with a favorite image and see if changes in the steps and/or CFG might get even better results. Cheers.