Stable Diffusion — A Few Thousand Images Later

Eric Richards
12 min read · Jul 30, 2023


I’m going to discuss things that I learned and relearned doing thousands and thousands of AI art image generations with Stable Diffusion over the course of almost half a year. Wow, it’s kind of hard to read the words I just typed.

To re-emphasize what I discussed in part one of this post:

(That has some points and resources that I’ll be glossing over here.)

I love images that look like they come from a story — maybe straight from a movie scene or a book cover illustration. When I was a kid, I spent hours poring over Frazetta, Michael Whelan, and The Brothers Hildebrandt. I studied those paintings, many of which were illustrations for books I’d never read, and I got to imagine what the story might be based on what the art depicted.

I’ve been going through zoned-out joy triaging all of these images. Many look like pivotal scenes from a story, and I just enjoy thinking about what might have happened up to that point and what happens next. Never mind that this is a noise pattern being regurgitated into an image by a bunch of mathematical models and learned weights. Some of these images I absolutely adore. It’s been relaxing and rewarding and sometimes exciting to go through all of these images.

I hope that if I share enough here, and if you have the same interests as I do, you can generate a big batch of images and feel the same joy and excitement.

A reminder: I currently put images I create on Instagram here: https://www.instagram.com/rufustheruse.art/

The Bits

Okay, with all that expressed, here’s the stuff I learned and re-learned:

I don’t think I’ll ever generate images of people without having a facial expression specified. I found almost all the early images I generated had vacant, dull, flat faces. Once I started using my expression.txt file I got great variety. Now sometimes the scene doesn’t look like someone should be laughing, true. But a slight smile or pout or frown can make all the micro-expression difference in the world. And a maniacal laughing villain? Priceless.

Camera angle is awesome. The wildcard reference to the camera angle is one of the most important spices in creating unique images that aren’t just head on, level composition. Oblique angles and angles looking up, or down, add creative personality to the total image.

Exploring all of those new artists in the artist-csv.txt wildcard file resulted in a new curated list to use later for more refined image generation. I already had curated lists for fantasy artists and classic artists, based pretty much on all the experiments I’ve written about before. But what other artists are there? So I did many runs that pulled a random primary artist from the artist-csv.txt wildcard file to find new, powerful artists.

And when I find a new artist, I have to remember that what I’m seeing is how Stable Diffusion renders that artist’s training through the currently loaded model and all the other context in the prompt / negative prompt. For instance: I use Ingrid Baars to generate amazing faces in my current runs, but the results look nothing like her actual art.

In addition to your model, the choice of artist can greatly affect your composition, along with any keywords you’re using. Due to one choice of artists, I started generating images with gradients or murky colors as the background. I didn’t want that. I wanted real backgrounds! Castles, forests, haunted libraries. Well, the artists I was using didn’t really create those kinds of images. Want interesting compositions of a particular sort? Bring in those artists as part of your prompt.

Experimenting with artist mediums, materials, and techniques can give surprising results. In particular, I discovered something… ironic? Here I love generating all of these photorealistic images, but then when some got generated with block printing and halftone I… I was in love. Something so incredibly reduced and bold was being generated, and I just went off on a tangent generating image after image that looked to be made of simple lines. Knowing myself, I would not have discovered this on my own.

style-nagel.pt embedding with block printing and (some) halftone

Sometimes I explicitly asked the image to be of __medium__ out of medium.txt and that might be watercolor, ink, charcoal, or photography. Then I experimented with mixed mediums: __medium__ and __medium__. Then I experimented with transitioned medium: [__medium__:__medium__:0.25] — remember that means to start with the first and, in this case, 25% of the way through generating, switch to the second. So it might generate randomly something like [color ink marker:watercolor:0.25].
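If you’re curious what that wildcard expansion could look like under the hood, here’s a minimal sketch. The `wildcards` mapping and `expand` helper are hypothetical stand-ins for files like medium.txt — in practice the Dynamic Prompts extension does this for you:

```python
# A hypothetical sketch of expanding __medium__ placeholders. The wildcard
# lists here stand in for text files like medium.txt; this is not the
# extension's actual implementation.
import random
import re

wildcards = {
    "medium": ["watercolor", "ink", "charcoal", "photography"],
}

def expand(prompt: str, rng: random.Random) -> str:
    """Replace each __name__ with a random line from that wildcard list."""
    def pick(match: re.Match) -> str:
        return rng.choice(wildcards[match.group(1)])
    return re.sub(r"__(\w+)__", pick, prompt)

rng = random.Random(0)
# A transitioned medium: start with the first pick, switch 25% of the
# way through generation.
print(expand("[__medium__:__medium__:0.25]", rng))
```

Each run of the batch re-rolls the placeholders, which is what makes a big overnight run produce such varied results.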

And as you can guess, charcoal drawing is almost always going to get you a black-and-white image. Some models are going to ignore your art-medium request entirely, especially if they are photorealism-based.

Throwing in a __verb__ wildcard can mix things up. I mean, I entered a whole new generation world when “kissing” started showing up. I admit, 80% of the time the faces aren’t quite right with lips melded together or such. Not romantic to me. But sometimes you get something that just earned a right to be on the cover of the latest bodice ripper (even one about elves and vampires).

Don’t generate more than you can triage. Here’s my story, and it’s a cautionary tale. So you discover you can generate hundreds (or, yes, thousands) of images. Start a run before bedtime and come back in the morning to flip through all of these amazing images. Of which maybe 10% or less are actually amazing, right? That’s a good run. 400 images and 40 are keepers. The rest: gobbledygook hands or twisted bodies or strange crops or just plain no-no-no no-no-no.

Well, I screwed up. I said: I’ll delete that other 90% later. Let’s generate another 400. And another 400. And so on. Days. Weeks. Months. Then I spent one thankless day deleting 20,000 images. Now, I had been deleting some along the way, but I had accumulated debt. And I wanted to get organized, but before I could do that: 20,000 images had to go. One by friggin’ one. Learn from me. Don’t accumulate that debt. Whether it’s pulling out the good or deleting the bad, do it right after the set has been generated. Your future self will thank you.
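To make that post-run cleanup painless, something like this sketch could run right after each batch. The folder layout and the keeper list are assumptions about your own workflow — nothing here is an Automatic1111 feature:

```python
# A minimal triage sketch: right after a run, split the output folder
# into keepers and deletions instead of letting the debt pile up.
from pathlib import Path
import shutil

def triage(run_dir: Path, keepers: set[str]) -> int:
    """Move named keepers into run_dir/keep, delete everything else.

    Returns how many files were deleted."""
    keep_dir = run_dir / "keep"
    keep_dir.mkdir(exist_ok=True)
    deleted = 0
    for img in run_dir.glob("*.png"):
        if img.name in keepers:
            shutil.move(str(img), keep_dir / img.name)
        else:
            img.unlink()
            deleted += 1
    return deleted
```

Call it with the run folder and the filenames you flagged while flipping through the batch, before you queue up the next 400.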

You get to know what a model generates by default. This is important: once you recognize a model’s default look, you can tell when an artist isn’t affecting the image output at all. Skip that artist.

Embeddings — textual inversions — are awesome. There are some that represent a style (like style-nagel.pt) and some that represent a subject (like one I trained on images of myself). Embeddings are good for shaking out your models, too. I discovered some models were resistant to making images of myself or other embeddings, which means they are strictly trained to produce a particular set of faces. That’s boring to me. I don’t want to be limited by such models. Delete. If you have a collection of embeddings, put them into a wildcard file and then set off a big batch pulling in one or more embeddings and see what you get.
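Building that embedding wildcard file can be automated. This hypothetical helper just lists every .pt file in a folder — the folder path and wildcard filename are assumptions, so point them wherever your embeddings and wildcards actually live:

```python
# Sketch: collect every embedding filename into a wildcard file so a
# batch run can pull embeddings at random. Paths are assumptions about
# your own setup.
from pathlib import Path

def write_embedding_wildcard(embeddings_dir: Path, wildcard_file: Path) -> int:
    """Write one embedding name (without .pt) per line; return the count."""
    names = sorted(p.stem for p in embeddings_dir.glob("*.pt"))
    wildcard_file.write_text("\n".join(names) + "\n")
    return len(names)
```

Re-run it whenever you download a new embedding and your wildcard file stays current without hand-editing.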

And while style-nagel.pt can be great in creating block print almost vector-like art, it can also be used in more photorealistic images:

style-nagel.pt can create well crafted realistic images, too.

My images’ aspect ratio can change things radically. So, for better or for worse, I generate aspect ratios that are compatible with Instagram. I feel like I’m wasting my time posting there sometimes, but post there I do. For me, that means three resolutions: portrait (616x768), square (768x768), or landscape (976x512). I got tired of the portrait-ratio images I was generating looking like trading cards from Magic: The Gathering… and not the good ones. It just looked like the main character standing there. It didn’t seem dramatic or story-like. Square ratio was about the same. My perception is that when I went to landscape ratio it really opened up, and I felt more of a story, more of a sense of place, in the images being generated. You should experiment for yourself and see what you think.

Some examples using the same wildcard prompt run, just changing the resolution / ratio:

Portrait Ratio
Square Ratio
Landscape Ratio

…and yes, after you get past a certain resolution you suffer from twinning: multiple instances of the same face. You can go with the hires fix to upscale a smaller image or do other upscaling magic later. I just suffer through deleting a bunch of twins in order to get the small percentage that are fantastic and higher resolution.

If you’re using a tool, be sure to re-read its documentation from time to time. I recently re-read the wiki / documentation for Automatic1111 and found I understood a lot more than when I started, plus I learned about new features that everyone else seemed to be using but I had missed. For instance, the BREAK keyword to pad out your current concept’s token set and make a clean break for the next token set. If you use long prompts like I do, give it a try and see if you get something different.

Something else I missed out on in Automatic1111? The UniPC sampler — I’ve come to love the UniPC sampler! I know it’s supposed to be like others but for me it generates interesting looking images rather quickly with a low number of steps. I compare this to when I use Euler A which I bump up to 150 steps. Now, I love UniPC but then, after a week of using it straight, I went back to Euler A at 150 steps and, whoo, it does look a whole lot better. Maybe not as rich in content, but it looks great. So, there’s a tradeoff here and there.

I’ve learned to be willing to kill my darlings. By that: my favorite prompt structure, keywords, artists, the whole sh-bang. The whole reason I started with wildcards was to shake things up. Maybe hang out on CivitAI and look at all the prompts people are providing in their sample images. Or read through the abundant documentation people are providing (I just read through the RPG model’s cookbook recently and I recommend it).

  • CivitAI: Civitai images.
  • Mage Space: Mage 🧙 | Explore
  • Lexica: Lexica — it doesn’t look as amazing as when Stable Diffusion first hit the scene, but you can still get some prompt ideas.

One More Prompt

So I bounced around reading and examining pictures and bits and pieces of their prompts and I put together one more run before getting this published. I liked the results — the people in the images looked very detailed and while not overly photorealistic still nicely done. Well, of course there are a few horrors here and there of melded bodies.

We’re all used to that. Here is the full prompt (all parameters) and then we’ll review:

Masterpiece dynamic (cinematic shot:1.6) (__male__) with (__female__ ) action pose in [(__locAbstract2__)|(__situation__)] (face looking __expression__:1.4) in [__medium__:photograph:0.5] art by ([__knownForFaces__| __fantasyArtist__]:1.4) and ( __knownArtists__ :0.75) and ( __classicArtists__:0.5), __verb__ , detailed eyes, detailed pupils, clean face BREAK <lora:epi_noiseoffset2:0.50>, __atmosphere__, __keyword__, __keyword__, __keyword__, __timeOfDay__, mythic, fable, legend, digital art illustration, [ : <lora:add_detail:0.7>:0.10] , symmetry, hyper realistic, detailed, intricate, best quality, hyper detailed, ultra realistic, sharp focus, bokeh, HQ, 8K, __photo_angle__ RAW photo shot on __camera__, film grain, masterpiece photographic art, (photo realistic:1.4), depth of field, rule of thirds

Negative prompt: statue, card, crossed eyes, standing, posing, bindi, Maang tikka, cleavage, (close-up:1.6), boring, hair on face, face marking, face jewelry, face paint, portrait, mutated, front-facing, blue eyes, Asian, anime, UnrealisticDream.pt, BadDream.pt, (easynegative.pt:1.0),(bad-hands-5.pt:1.0),((nude)),((naked)),((sexy)),((nsfw)), face paint, cartoon, animated, toy, figurine, frame, framed, perfect skin, malformed sword, (low quality, worst quality:1.3), FastNegativeEmbedding.pt, glowing breasts

Steps: 150, Sampler: Euler a, CFG scale: 4, Seed: 1009467500, Size: 976x512, Model hash: ad1a10552b, Model: aaaBest_rundiffusionFX_v10, ENSD: 31337, Version: v1.2.1

You should be able to paste that into Automatic1111’s main prompt text box, click the “Read generation parameters” button under “Generate” and have it fill in the fields. You’re not ready to generate, though.

I’d clear the seed field back to random (-1), and then be sure you read through my previous posting about setting up a wildcard run. In this case you should choose your favorite model first and then do just a batch run of say four images, so that you can tweak to your liking. The important aspects of this prompt to know about so that you can tweak and improve:

  • The male.txt and female.txt wildcards refer to lines in the files for various kinds of interesting characters. Their aspects get melded together sometimes in this prompt.
  • locAbstract2.txt refers to a series of abstract environments that I generated from my SD-educated ChatGPT thread.
  • Situation.txt refers to dynamic confrontational situations.
  • The [ (__locAbstract2__) | (__situation__) ] directive tells SD to alternate, every other step, between the text from __locAbstract2__ and the text from __situation__. I did this because I like the results of both, but appending them together would be too much, even for me. So I alternate between the two and meld them together that way.
  • Expression.txt again refers to face expressions — I think this is one of the most important aspects you can add to your images. No more dull lifeless faces please!
  • Medium.txt refers to an art medium, like oil paint. The directive is to start in a medium and transition to photography 50% of the way through.
  • knownForFaces.txt is my list of artists who make well done faces.
  • fantasyArtist.txt is my list of fantasy / science-fiction artists who produce both good quality results and interesting compositions. I have lots of previous posts on this.
  • knownArtists.txt is my list of artists pulled out of artist-csv.txt from wildcard runs that have produced quality results, for me.
  • classicArtists.txt is my list of artists appreciated by me for producing well detailed, high quality work (not necessarily fantasy or such).
  • <lora:epi_noiseoffset2:0.50> is a LoRA downloaded off of CivitAI that I find essential for toning down the bright nature of SD images. It’s at 50% power for darkening the images. I see great improvement with this.
  • Atmosphere.txt is something new I’m trying. It has lines like “dystopian atmosphere” to see if it influences the overall environment.
  • Keyword.txt is a list of keyword modifiers (like witchcore or southern gothic).
  • timeOfDay.txt is my list of various descriptions for the time of day.
  • [ : <lora:add_detail:0.7>:0.10] — well that’s a lot to parse out and mull over! Anyway, first it’s a LoRA downloaded from CivitAI that adds detail to your image. I have it at 70% strength. Then, I have it only kick in after 10% of the steps for the image. I did this because I didn’t want anything the LoRA did to be affecting initial composition of the image.
  • photo_angle.txt is my second most important list, right behind __expression__. It sets the angle for the composition and can make all the difference in producing interesting views.
  • Camera.txt refers to a variety of camera models for taking the image.
  • As for the negative prompt, it contains the list of typical complaints about images kicked out by SD, along with a batch of embeddings (they end in the .pt extension) that you can find off of CivitAI.

Once you’ve populated all your wildcard files and tweaked the prompt to your liking, you can try it out with a few images generated from your favorite model. Once you’re happy, you can switch to the X/Y/Z Prompt script like I discussed in Part One and let ’er rip. Oh, one more tip: if you want to generate a lot of pictures in a limited CFG range, note that you can bump up your batch number. For instance, if you have a CFG range like 3.25–5.0(+0.25), you can bump up your batch from one to four, meaning that when it starts at 3.25 it will generate four images at 3.25 before moving on to 3.5 to do the same, and so on.
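As a sanity check on that arithmetic: 3.25 to 5.0 in steps of 0.25 is 8 CFG values, so a batch of four yields 32 images per run. A quick sketch (the helper name is mine, not anything in Automatic1111):

```python
# Count how many images an X/Y/Z run over a CFG range will produce,
# given the per-value batch count.
def images_per_run(start: float, stop: float, step: float, batch: int) -> int:
    values = []
    cfg = start
    while cfg <= stop + 1e-9:  # tolerate float drift at the endpoint
        values.append(round(cfg, 2))
        cfg += step
    return len(values) * batch

print(images_per_run(3.25, 5.0, 0.25, 4))  # 8 CFG values * 4 = 32
```

Handy for estimating how long an overnight run will take — and how much triage debt you’re about to sign up for.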

SDXL

Since I started the draft for this, SDXL 1.0 has been released. There are high expectations for this model and time will tell. I think my system might just be enough to slowly grunt out the occasional image, given a well-tuned Comfy UI setup.

I do hope it works out. Stability AI has burned some serious bridges, mainly with their weird-out over SD 1.5 being released. MidJourney is a far better imaging AI with serious craftsmen refining it. I hope that the community can come together, like it did with models built on SD 1.5, to create the next step with SDXL 1.0 as a base.


Eric Richards

Technorati of Leisure. Ex-software leadership Microsoft (Office, Windows, HoloLens), Intel Supercomputers, and Axon. https://www.instagram.com/rufustheruse.art