Award winning photoshoot of Taylor Swift, (A sorceress that looks like Taylor Swift), Angry, screaming, fiery hair, flaming eyes, fire dress, arms on fire, room on fire, flames, sparks, embers, dark smoke, ruins, photo by Patrice Murciano and Agnes Cecile and Anna Dittmann and Bella Kotak and Carne Griffiths and Jovana Rikalo
Taylor Swift Will Burn Your AI to Zeros, by Stable Diffusion txt2img

Stable Diffusion + MidJourney + DALL-E Have Rung the AI-Art Bell

Eric Richards

--

I have been experimenting with text-to-image in Stable Diffusion for the past week, running it on my home PC. I’ve just recently started experimenting with image-to-image as well. I have thoughts.
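If you're curious what "running it on my home PC" actually involves, here's a minimal sketch using the Hugging Face diffusers library and the openly released Stable Diffusion v1.4 weights. The prompt, sampler settings, and GPU assumptions here are illustrative, not my exact setup.

```python
# Minimal text-to-image sketch: Stable Diffusion via the Hugging Face diffusers library.
# Assumes `pip install diffusers transformers accelerate` and a CUDA GPU with ~8 GB of VRAM.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # the openly released SD v1.4 weights
    torch_dtype=torch.float16,        # half precision so it fits on a consumer card
).to("cuda")

prompt = (
    "Looking down at glowing colored worms inside a tin can, worms!, "
    "worms everywhere, hallucination, sparks, by Giuseppe Arcimboldo"
)
result = pipe(
    prompt,
    negative_prompt="happy, smile",   # the same negative-prompt trick used in the captions below
    num_inference_steps=50,
    guidance_scale=7.5,
)
result.images[0].save("can_of_worms.png")
```

Image-to-image works the same way through diffusers' StableDiffusionImg2ImgPipeline: you hand it a starting image plus a strength value and it riffs from there.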

You can’t unring that bell.

Zen Garden Bell on rocks, mountain range, dark cloudy skies, professional photo
Mystic Zen Bell, by Stable Diffusion img2img

Ugh! You just opened a can of worms.

Looking down at glowing colored worms inside a tin can, worms!, worms everywhere, hallucination, sparks, by Giuseppe Arcimboldo
 Negative prompt: happy, smile
Can of Worms, by Stable Diffusion txt2img

No matter how hard the monkey tries, it can’t put that cork back in the pig.

(award winning professional painting of surprised monkey), covered in brown poo, angry, mad, scared, a large cork, globs of brown poo emojis raining down, by Dr. Seuss, high detailed, 8k, octane render, trending on artstation
 Negative prompt: happy, smile
That Poor Monkey, by Stable Diffusion txt2img

You are past a point of no return. The above sayings help us reflect on what’s ahead instead of obsessing over what’s behind and beyond our control. Beyond our power to put things back where they were. What’s done is done.

The easy availability and quick proliferation of high-quality AI art has rung a bell, opened an explosive can of worms, and pulled out that wedged-in cork with a big splooshy “pop!”

Some folks are ecstatic that they can create amazing art with well-crafted prompts, or convert doodles and existing images into something far more impressive. People are experimenting and learning like crazy, and far more insightful and intelligent people than me are writing articles raising concerns about this on many fronts. The insights and attention will keep expanding as we collectively work to understand where we are.

And where we are going.

The most obvious concern is that artists and illustrators are going to lose opportunities and work to AI doing quick jobs. Or be devalued (“Phfft. I can type some crap into MidJourney and make something way better than what you painted”). Artists with unique styles are seeing those styles absorbed and — perhaps from their point of view — stolen by the AI models now pushing out multitudes of images.

It doesn’t seem fair.

You know, before I found programming and lucked into a very, very good life in technology, I wanted to be an illustrator. It was my passion, and I spent hours practicing and poring over my collected art books from Frank Frazetta, Michael Whelan, and the Brothers Hildebrandt. Even as an adult I still buy the occasional Spectrum fantasy art compendium, and I certainly buy anything that Gerald Brom puts out, because Brom and his wife Laurie are awesome artists (I’m unabashedly biased there).

Now an AI model like Stable Diffusion can create images based on any artist’s style absorbed from the public internet into the training for the model.

As an illustrator wannabe, my first reaction was very Samuel L. Jackson. But instead of snakes on a plane it was copyrighted art in an AI model. Poetically it doesn’t flow as well, but the meaning is there.

Blue Robin Williams as Aladdin’s Genie, (Robin Williams), coming out of lamp, smokey, explosions, painting by Agnes Cecile and Patrice Murciano
The Genie Has Left the Lamp, by Stable Diffusion img2img

Then after a bit, I heard the resonating ring of that bell fading. I see that this genie is out of the bottle. And it’s one hell of a busy genie. A very accessible genie. There is no going back from this point. While there are gates and payments around using AI generators like DALL-E and MidJourney, Stable Diffusion is open source and its model is freely available. It’s tootin’ away on my old desktop, and I’ve made over 3,000 generated images just this week — probably many more, counting the occasional horrific output I discarded.

As you may know, the Stable Diffusion model was trained on the LAION collection of captioned images crawled from the internet. LAION pulls from multiple sources, which are then used to build out Stable Diffusion’s model. If an image was publicly crawlable / discoverable on the internet and had reasonable metadata (either expressed or derivable), it’s probably in there.

A lot of copyrighted material is shared by fans. Some artists even share their own artwork. And LAION has seen it. You might remember Clearview AI got in trouble for vacuuming up lots of social media faces in violation of terms of use. It’s unclear if LAION has done anything similar in vacuuming up references to Flickr, Pinterest, Reddit, and the like. What’s in there? You can query LAION.
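And you can query it from Python. The LAION folks publish a clip-retrieval package with a small client that hits their public index; a rough sketch follows. The endpoint URL and index name below are whatever LAION happened to be hosting at the time and may change, so treat those specifics as assumptions.

```python
# Rough sketch: querying LAION's public index with the clip-retrieval client.
# Assumes `pip install clip-retrieval`; the endpoint URL and index name may have changed.
from clip_retrieval.clip_client import ClipClient, Modality

client = ClipClient(
    url="https://knn5.laion.ai/knn-service",  # public kNN endpoint at the time of writing
    indice_name="laion5B",
    modality=Modality.IMAGE,
    num_images=20,
)

# Ask what the index associates with an artist's name.
for hit in client.query(text="painting by Greg Rutkowski"):
    # Each hit is a reference (a URL plus its caption), not the image itself.
    print(round(hit["similarity"], 3), hit["url"], hit["caption"])
```

Note that what comes back is URLs and captions pointing elsewhere on the web, which is exactly the shape of the deniability described next.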

Off-hand, it feels like there’s a firewall of deniability here. LAION just holds references to web-crawled images; it doesn’t collect the images themselves. Stable Diffusion takes whatever is inside of LAION and trains away. You cannot remove training from models that have already been released. But can you affect, in the future, what they are trained on?

One additional cherry on top for Stable Diffusion: you can add new training to a released model. I didn’t know what a waifu was. Now I know a bit more, given that there’s a Waifu-Diffusion model with some additional training. Meaning anyone with a quality tagged image set and the ability to train can fold their specific images into a custom model.
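Mechanically there’s nothing exotic about using one of these further-trained models. Assuming the diffusers library again and the publicly posted Waifu-Diffusion checkpoint on the Hugging Face hub, it’s just a different name in the same pipeline:

```python
# Same pipeline, different weights: loading a community fine-tune instead of the base model.
# Assumes the "hakurei/waifu-diffusion" checkpoint on the Hugging Face hub; any further-trained
# Stable Diffusion derivative loads the same way.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion", torch_dtype=torch.float16
).to("cuda")

image = pipe("a portrait, fiery hair, flames, highly detailed").images[0]
image.save("fine_tuned_sample.png")
```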

Now, while I’d say today’s bell has been rung, we’re all looking forward and seeing that there are many more bells out there with the potential to be rung next. What are they, and can the ringing reasonably be stopped? Should it be stopped?

Some random things I can think of as first reactions to the negative response to AI art generation…

  • Do not consume standard — metadata embedded in / associated with content that specifically forbids AI model consumption. LAION and the like would be expected to proactively filter it out. For copies, CLIP nearest-neighbor matching would be needed to exclude a scrubbed copy if any originating version had expressed exclusion rights.
  • Compensate to consume standard — let’s make some pennies! Metadata expecting compensation if consumed in training an AI model.
  • Confuse your consumption — sources meant to poison AI models. While something like CLIP doesn’t have to depend on metadata, there could still be the use of confusing / confounding metadata, embedded and associated, to throw a sabot into the workings of the diffusion model.
  • Commissioned use standard — I don’t think this would work but I’m sure someone will try to get commissions for artists that have their names used in prompts.
  • Right to be excluded — like the right to be forgotten, will there be a push to be excluded, including forcing the retraining of AI models after being removed? That’s an expensive prospect.
  • Did an AI generate or help with this? A new checkbox. If an AI cleaned up something, that seems okay. If an AI created something outright, that should be revealed.
  • Search engine awareness — most AI-generated art has (or should have) a watermark that a search engine could detect (see the decoding sketch after this list). It should be a search term to differentiate / include / exclude / downvote such content.
  • “Proudly human” — publishers including art (like articles and books) could go out of their way to ensure you know they employ human artists and that is seen as an ethical priority. It’s the new organic.
  • Learning bias — this is an explosive one to bring up. All I can say from my thousands of images is: there sure are a lot of Caucasian folk showing up by default in Stable Diffusion for anything I ask for, unless I’m very specific about race. “Most beautiful woman in the world, professional photo” isn’t too inclusive. Neither is “handsome man.” Complaints about bias in something like Stable Diffusion are a pile of tasty red anger meat waiting to feed Twitter.
Most beautiful woman in the world, professional photo
Montage of “Most beautiful woman in the world,” by Stable Diffusion txt2img
Most handsome man in the world, professional photo
Montage of “Most handsome man in the world,” by Stable Diffusion txt2img
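On the watermark bullet above: the reference Stable Diffusion scripts stamp every output with an invisible watermark via the invisible-watermark package, and that’s the sort of signal a search engine could look for. Here’s a rough sketch of the decode side, assuming the default “dwtDct” method and the stock “StableDiffusionV1” payload; other tools use different payloads or skip the watermark entirely.

```python
# Rough sketch: checking an image for the invisible watermark the reference
# Stable Diffusion scripts embed. Assumes `pip install invisible-watermark opencv-python`
# and the stock 17-byte payload "StableDiffusionV1" (136 bits).
import cv2
from imwatermark import WatermarkDecoder

image_bgr = cv2.imread("suspect_image.png")    # OpenCV reads images as BGR arrays

decoder = WatermarkDecoder("bytes", 136)       # expect a 136-bit payload
payload = decoder.decode(image_bgr, "dwtDct")  # same method the reference scripts use

try:
    text = payload.decode("utf-8")
except UnicodeDecodeError:
    text = ""

print("Stock Stable Diffusion watermark found" if text == "StableDiffusionV1"
      else "No recognizable watermark")
```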

And while we’re talking about images today, of course there’s more on the horizon:

  • Music — I’ve already seen some demos for AI music creation. Now then, while visual artists seem to be out of luck legally with AI copying their style, I respect that the music industry is waaaay more militant on their legal protections. I pity the AI that consumes, let alone produces near copies of, Taylor Swift’s music. She will burn your AI down to zeros. Not a single one will be left.
  • Voice — this one seems like it should be here already but perhaps there isn’t someone willing to put out the major bucks to train the model with the huge breadth of scope that Stable Diffusion has. I totally want Christopher Walken to be reading me my news clips.
  • Video — yep, there are demos for this too. I don’t mean deepfakes over an existing video. I mean creating the video from scratch, say in the style of a particular movie. You’d need voice and music to go along with this to make it complete.
  • Story writing — after Stephen King has shuffled off this mortal coil he can still be writing us horror stories. We all know that death is not going to stop Stephen King from writing anyway. Well, for a public AI model that’s compliant with publishers’ copyrights, I think it will be done with public-domain books, so at least Ambrose Bierce can get to updating his dictionary. GPT-3 is getting there, so the “bong” of the bell is imaginable.
  • 3D AR/VR world creation — this is something that brings it all together. To me this is hot and if I wanted a tech-fundraising roadshow extravaganza I’d be out with the megaphone and a bag of holding for all the incoming cash. Being able to describe the visual + auditory + story in a character driven 3D experience you want to have (and it working well) I think is intoxicating. I want to attend a murder mystery dinner in a dark old mansion with the Scooby-Doo gang (I’ve had a crush on Daphne since I was five so I get to investigate with her — besides, Shaggy and Scooby spend way too much time running between doors in long hallways). That’s the metaverse, baby.
  • Recreate the moment — as we learn what AI models need, would we produce gadgets to monitor our lives and use that as shared data to recreate past moments? If we all end up with little AR assistants, do they constantly consume our surroundings and regurgitate them back up into my historical life-model AI? And if I can recreate a moment, I can certainly alter it and experience it in a new way. Oh wait. Black Mirror did this already. Kind of.
  • Hunter Killer AI — “Hello. My name is Greg Rutkowski. You stole my style. Prepare to die.” There are a lot of stories out there about AIs that go to war. Why not craft an AI to obliterate other AIs and what they’ve produced based on consuming my work? Perhaps a son-of-a-B lawyer-AI with its DMCA takedowns. Maybe it’s more a Case-AI from Neuromancer, doing what a dark-web hacker is going to do to pay the bills and give AIs chills.
  • Black Mirror — speaking of Black Mirror, I can only expect there will be even more cool dystopian AI-run-amok ideas coming from Black Mirror, inspired by the anger and fear around AIs taking over creative work.

Also, laws. I’d expect the EU to be further ahead of other countries in protecting what an AI can and cannot be trained on, based on fallout from AI art generators. E.g., a citizen can say they don’t want their images or intellectual property consumed by a model, whether for mundane or nefarious purposes. So, you know, there’s probably a future market in rebel oil rigs laden with discarded crypto-mining GPU cards grinding away at lawless AI model training.

So as of today, we are where we are. I don’t especially like the cause and implications, but I realize there’s nothing I can do to change it. Nor do I think there’s anything anyone can do to unring the bell.

Going forward, I would expect folks to use AI art to do better-than-clip-art for illustrating their stories, whether professional or passion projects. Covers should improve for magazines and books and other materials (though again, have some good thoughts for Greg Rutkowski for that future day of stepping into a bookstore and seeing book covers illustrated in his style). I’d also expect little touches like artwork hanging in the rooms of video games to improve.

With respect to enriching my life, I can tell you that in my attempts to have better prompts I have become acquainted and re-acquainted with so many artists and photographers, and their styles. I’m thankful for that.

We just traveled through an inflection point and are experiencing shockwaves of disruption. Exciting, unbalanced times are ahead in art and AI, I’d expect. And I hope. I don’t think it will be boring.

--

Eric Richards

Technorati of Leisure. Ex-software leadership Microsoft (Office, Windows, HoloLens), Intel Supercomputers, and Axon. https://www.instagram.com/rufustheruse.art