Embedding New Styles Into Your Stable Diffusion
(Not much playing around with Stable Diffusion this past week for me due to a sorrowful family gathering. Note: Richmond, Virginia is a wonderful city and it’s a big gift to yourself to stop by the Virginia Museum of Fine Arts. It’s amazing.)
The next new thing I’ve learned using the AUTOMATIC1111 WebUI local Stable Diffusion is embedded textual inversion prompts. That’s a bunch of odd words put together. It basically means a file you can add that lets you quickly apply a specific style to your prompt using just one keyword. It’s kind of like visual extensibility for your Stable Diffusion install.
Sometimes it’s for producing a very specific subject styled in a certain way. So it can be the subject of your prompt, or a way to style your prompt that Stable Diffusion doesn’t currently handle… perhaps a particular artist it was never trained on.
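If you’re curious what one of these files actually is: under the hood it’s just one or more learned embedding vectors keyed to a token. Here’s a minimal sketch of peeking inside with PyTorch, assuming the token-to-tensor dictionary layout the sd-concepts-library .bin files use (the file name is only an example):

```python
import torch

# Assumed layout for sd-concepts-library .bin files: {"<token>": tensor}.
# "midjourney-style.bin" is only an example file name.
embedding = torch.load("midjourney-style.bin", map_location="cpu")

for token, vectors in embedding.items():
    # The token is the word you type into a prompt; the tensor holds the
    # learned vectors the text encoder swaps in for that word.
    print(token, tuple(vectors.shape))
```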
Here’s a before and after, with the after adding just the text dark-forest to match an embedding style file added to the system:
If you have a collection of images you want to train on, you can create your own textual inversion style! There are also existing textual inversion files created by others that you can download, though realize you’ll have to ignore a bunch of experimental ones (mainly, dudes training their own face for Stable Diffusion).
Get You Some Textual Inversions Locally
Here’s one example specific to the WebUI setup. In it, I download a textual inversion called “Midjourney-style” to see if it can Midjourney-up my Stable Diffusion art. (If you’d rather script these steps, there’s a rough sketch after the list.)
- First, go to sd-concepts-library (Stable Diffusion concepts library) (huggingface.co)
- Find a concept you like, or search for something like “MidJourney”
- Click on the concept to see what kind of output it creates (like sd-concepts-library/midjourney-style · Hugging Face )
- If you like it, click on the “Files” tab (like sd-concepts-library/midjourney-style at main (huggingface.co) )
- Download the .bin file (e.g., learned_embeds.bin )
- Rename that to the style you want to type into a prompt (e.g., rename learned_embeds.bin to midjourney-style.bin)
- In the home of your WebUI installation, ensure there’s an embeddings directory.
- Copy that renamed .bin file into the embeddings directory.
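As promised, here’s a rough sketch of scripting those download steps with the huggingface_hub package. It’s a sketch under assumptions: the repo and file names match the midjourney-style example above, and WEBUI_DIR points at wherever your WebUI install lives, so adjust both:

```python
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Assumptions: this repo/file match the example above, and WEBUI_DIR points
# at your local stable-diffusion-webui checkout. Adjust both to taste.
REPO_ID = "sd-concepts-library/midjourney-style"
WEBUI_DIR = Path("~/stable-diffusion-webui").expanduser()

# Download the learned embedding from the Hugging Face repo.
bin_path = hf_hub_download(repo_id=REPO_ID, filename="learned_embeds.bin")

# The file name (minus extension) becomes the keyword you type into a prompt,
# so rename it as it gets copied into the WebUI's embeddings directory.
embeddings_dir = WEBUI_DIR / "embeddings"
embeddings_dir.mkdir(exist_ok=True)
shutil.copy(bin_path, embeddings_dir / "midjourney-style.bin")
print("Copied embedding to", embeddings_dir / "midjourney-style.bin")
```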
What I like to do at this point is find an existing txt2img image that I’ve made and try the new style on it. Currently, it’s kind of easy:
- Either find your txt2img output directory, or click the folder icon in the txt2img tab of the current WebUI to open that folder.
- In Stable Diffusion WebUI, switch to the PNG Info tab.
- Find a txt2img file you like and drag it into the PNG Info tab. This will decode the prompt and the settings used to make the image.
- Select “Send to txt2img”
- Switch to the txt2img tab and generate to ensure you’re set up to regenerate that image as it was.
- Now add the name of any .bin file you’ve placed into your embeddings folder to see if it changes the image, e.g., if you copied midjourney-style.bin into your embeddings folder, add midjourney-style to your prompt.
- Generate.
- Did it make a noticeable difference? Eh, you might need to grab some different images and see if the embedding changes things for you or not. Or move the new style around inside of the prompt.
Note that when I generate with a new file in the embeddings directory, my command prompt confirms the number of textual inversion embeddings it has read in. It’s worth peeking in there to see if that’s happening for you, too, if things aren’t working quite right.
You’ll need to try the new textual inversion style in different places in your prompt to see where it has the most impact.
Now then, I had to ask myself: is this really doing anything, or is changing the prompt just producing a different result? So I took an embedding called hewlett.bin and got a very interesting, modified Snow White from it (below). I moved that .bin file out of the embeddings directory and tried again, and got a new result, close to the original. So yes, it’s doing something. In this case, something major.
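If you want a less eyeball-y version of that test, the WebUI also has a small HTTP API (assuming you launched it with the --api flag). Here’s a sketch that renders the same prompt twice with a fixed seed, with and without the embedding keyword, so you can compare the results side by side. The port, the example prompt, the seed, and the step count are all placeholder assumptions; the hewlett keyword is just the example from above:

```python
import base64

import requests  # pip install requests

# Assumes the WebUI was started with the --api flag and is listening locally.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

base_prompt = "portrait of snow white, highly detailed"  # example prompt
runs = {
    "without.png": base_prompt,
    "with.png": base_prompt + ", hewlett",  # embedding keyword from my example
}

for filename, prompt in runs.items():
    # Fixed seed so the only difference between the two runs is the keyword.
    response = requests.post(URL, json={"prompt": prompt, "seed": 42, "steps": 20})
    response.raise_for_status()
    image_b64 = response.json()["images"][0]
    # Some versions prefix the base64 payload with a data URI; strip it if so.
    image_bytes = base64.b64decode(image_b64.split(",", 1)[-1])
    with open(filename, "wb") as f:
        f.write(image_bytes)
    print("wrote", filename)
```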
The Textual Inversion page of the AUTOMATIC1111/stable-diffusion-webui wiki (github.com) has instructions for creating your own textual inversion embedding file. If you’re an artist, you might go through this to create your own style for Stable Diffusion to use. I’m looking at you, Sudi! There are other places to do this training, too.
Perhaps more important: that wiki page has some additional textual inversion resources at the end… perhaps a real boon if you’re an, ah, anime fan. I mean… wow. Alright. Not my jam, but rock on if it’s yours.
The main call-out is another view into the HuggingFace textual inversion library: Stable Diffusion Textual Inversion Embeddings (cyberes.github.io).
I ran into some trouble with one — I saw it was named glow-forest but the actual source files said dark-forest. I went to the repository and downloaded the .bin file, renamed it to be dark-forest.bin, put it in my embeddings directory, and tried this before and after, retaining the seed between the two:
- A pacific northwest forest with sword ferns and fir trees and a path going off into the dark of the forest, midnight, full moon, realistic, trending on artstation, highly detailed, by artgerm
- A dark-forest pacific northwest forest with sword ferns and fir trees and a path going off into the dark of the forest, midnight, full moon, realistic, trending on artstation, highly detailed, by artgerm
The before & after is the picture above at the start of this post.
My advice here: if you see something interesting on the HuggingFace repository, go into the list of files, download the .bin, and make sure the correct style (what you’ll rename the file to) is confirmed in token_identifier.txt. Looks like .pt files are available too, and I regret that I haven’t tried those yet.
Well.
What the heck. Let’s try one.
Okay, going back to Stable Diffusion Textual Inversion Embeddings (cyberes.github.io) I downloaded the chen-1 style as a .pt file and then copied that file into my embeddings directory. It’s a style and I added it to my Snow White image. Okay, it works. Well, you know what? If you can find a .pt file it might be quicker & easier to get into your embeddings directory than those .bin files. Your mileage and your variance.
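Out of curiosity: the .pt and .bin files aren’t laid out the same way inside. Here’s a rough sketch for peeking at a .pt; the key names are what I’d expect from a WebUI-trained embedding (vectors under 'string_to_param'), but layouts vary between tools, so treat that as an assumption, and the file name is just the chen-1 example:

```python
import torch

# "chen-1.pt" is just an example file name; layouts vary between tools.
data = torch.load("chen-1.pt", map_location="cpu")

if isinstance(data, dict):
    print("top-level keys:", list(data.keys()))
    # WebUI-trained embeddings often keep their vectors under 'string_to_param'.
    params = data.get("string_to_param", {})
    for key, tensor in params.items():
        print(key, tuple(tensor.shape))  # vectors-per-token x embedding width
else:
    print("unexpected layout:", type(data))
```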
Here is a more extensive set of Snow White generations — starting with the original and then adding textual inversion styles downloaded into the embeddings folder:
- Original
- Midjourney-style
- Anime-AI-being
- Concept-art
- Agm-style
- Chen-1
- Hewlett
- Nixeu
- Pixel-toy
Now, one thought I have on this: if you find an embedded textual inversion prompt (oy) that you like and use often, others you share your generation settings with won’t be able to reproduce it without installing the same embedding in their embeddings directory.
Stable Diffusion & Such News of Interest
Things I’ve seen pop up during the past week that got my interest:
- Wired: The Joy and Dread of AI Image Generators Without Limits | WIRED — yes, you can create some pretty out-there stuff with Stable Diffusion. There’s an Unstable Diffusion Discord. Of course there is. I haven’t taken the leap into that yet. Feel free.
- Vice: AI Is Probably Using Your Images and It’s Not Easy to Opt Out (vice.com) — as discussed previously, the LAION dataset snarfs up references to images on the internet. Models have already been trained and released. How impossible is it to be removed from that dataset? Way.
- KQED: ‘The Cat Is Out of the Bag’: As DALL-E Becomes Public, the Possibilities — and Pitfalls — of AI Imagery | KQED — wring those hands and clutch those pearls.
- Fstoppers: The Coming Menace of Artificial Intelligence And How We Can Respond As Artists | Fstoppers — a cold splash of reality for artists and the path forward. “… did you become an artist because you had something to say?”
- The Atlantic: Don’t Fear the Artwork of the Future — The Atlantic — a comparison of AI art to the emergence of photography.
- Tatler: Is AI-Generated Art Here to Stay? Here’s How It Can be a Gamechanger for Designers | Tatler Asia — looking forward to the integration of AI into design and what designers still bring forth to creation.
- Washington Post: AI-generated images, like DALL-E, spark rival brands and controversy — Washington Post
- Video — we knew it was coming. Meta unveils an AI that generates video based on text prompts | MIT Technology Review / Meta’s AI video generator tool is already giving me nightmares | PC Gamer
Techie:
- Ars Technica: Better than JPEG? Researcher discovers that Stable Diffusion can compress images | Ars Technica
- DreamFusion: Text-to-3D using 2D Diffusion (dreamfusion3d.github.io) (nice!) I wish I was still on the HoloLens team to hook this up and examine the models. I guess I need to dust off the Oculus…
- I’d like to hook that up: Meta’s open source AITemplate provides a 2.4x increase of performance in Stable Diffusion. : StableDiffusion (reddit.com)
- Under the hood: The Illustrated Stable Diffusion — Jay Alammar — Visualizing machine learning one concept at a time. (jalammar.github.io) (very nice)
All the best.