Understanding the Difference Between Stable Diffusion Models

In just over a year, eight Stable Diffusion models have been released. With each new version came notable advancements and changes.

We’ll examine a few of the popular models (specifically 1.4, 1.5, 2.0, 2.1, and SD XL) and what sets them apart from one another, so you can determine which makes sense for your workflow.

Visit our GitHub repository (articles/sd_model_comparison) to see the raw images and settings.

Additionally, a full video covering many of the same topics in this post is available if you are a more visual learner.

Release History

Here’s a chart of the release history of Stable Diffusion models:

Version number    Release date
1.1               June 2022
1.2               June 2022
1.3               June 2022
1.4               August 2022
1.5               October 2022
2.0               November 2022
2.1               December 2022
XL 1.0            July 2023

SD 1.4 & 1.5

SD 1.4 and 1.5 are very beginner-friendly, with the latter being the more popular of the two. Here are a few key points to know:

Note: EMA stands for Exponential Moving Average, which gives more weight to the most recent values. It is a common technique in machine learning to reduce noise and improve accuracy, and here it is applied to the model weights during training.
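The exact schedule and decay rate used to train Stable Diffusion aren't restated here, but as a rough illustration of the idea, here is a minimal sketch of an EMA update over a dictionary of weights (the decay value is a placeholder):

```python
# Minimal EMA sketch: ema <- decay * ema + (1 - decay) * current
# The decay value below is a placeholder, not the one used for Stable Diffusion.
def ema_update(ema_weights, current_weights, decay=0.999):
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * current_weights[name]
        for name in ema_weights
    }

ema = {"w": 1.0}
ema = ema_update(ema, {"w": 0.0})  # {"w": 0.999}: each new value nudges the average slowly
```

This is also why the checkpoint names below mention EMA: sd-v1-4-full-ema ships both the raw and averaged weights (useful for further training), while v1-5-pruned-emaonly keeps only the averaged weights, which is all you need for image generation.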

Example Images

In all of our examples, no inpainting, upscaling, or post-processing was performed on the images.

1.4 Result:

Person (1.4):

photo of a young woman, standing, grassy field, yellow dress, beautiful face, (emma watson:0.4), crystal clear, highly detailed eyes, facing viewer, natural lighting, sunset, Canon EOS Mark II, 35mm, 8k, 4k, (best quality)
Negative prompt: disfigured, ugly, bad, unity, blender, nsfw, cartoon, painting, illustration, nude, anime, 3d, painting, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2515940178, Size: 512x512, Model hash: 14749efc0a, Model: sd-v1-4-full-ema, Version: v1.6.0

1.5 Result:

Person (1.5):

photo of a young woman, standing, grassy field, yellow dress, beautiful face, (emma watson:0.4), crystal clear, highly detailed eyes, facing viewer, natural lighting, sunset, Canon EOS Mark II, 35mm, 8k, 4k, (best quality)
Negative prompt: disfigured, ugly, bad, unity, blender, nsfw, cartoon, painting, illustration, nude, anime, 3d, painting, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 219195919, Size: 512x512, Model hash: cc6cb27103, Model: _v1-5-pruned-emaonly, Version: v1.6.0

Futuristic City (1.5):

futuristic city, realistic illustration, style of blade runner 2049, extremely detailed, 8k, 4k, (high quality),greg rutkowski, canon eos mark II, street view, 
Negative prompt: painting, 3d, disfigured, watermark, signature
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 951937457, Size: 512x512, Model hash: cc6cb27103, Model: _v1-5-pruned-emaonly, Version: v1.6.0

Landscape (1.5):

mountain landscape, photo, misty fog, atmospheric haze, extremely detailed, 8k, 4k, (high quality),
Negative prompt: painting, 3d, disfigured, watermark, signature, illustration
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3330699356, Size: 512x512, Model hash: cc6cb27103, Model: _v1-5-pruned-emaonly, Version: v1.6.0
  • Strengths of the 1.x models: These models run well on consumer hardware, are easy to set up, and generate an image fairly quickly (a minimal sketch of running 1.5 locally follows this list). They are also less censored than the later models, so you can often use artists’ names, celebrities, etc., in your prompts.
  • Weaknesses of the 1.x models: These models are not as good at interpreting prompts as the later models, so both positive and negative prompts tend to be significantly longer. Disfigured limbs, hands, etc., are common, though these can be fixed with inpainting or specialized negative embeddings (e.g., badhandv4, EasyNegative). The native 512×512 resolution is also lower, but that can be offset by upscaling the image.
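For reference, here is a minimal sketch of generating a 512×512 image with 1.5 locally using the Hugging Face diffusers library. The runwayml/stable-diffusion-v1-5 repo ID, the simplified prompt, and the default scheduler are assumptions on our part, so outputs won't match the webui results above exactly:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load SD 1.5 in half precision; a consumer GPU with a few GB of VRAM is usually enough.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="photo of a young woman, standing, grassy field, yellow dress, "
           "facing viewer, natural lighting, sunset",
    negative_prompt="disfigured, ugly, cartoon, painting, illustration, 3d, b&w",
    num_inference_steps=20,
    guidance_scale=7.0,  # CFG scale
    height=512,
    width=512,
).images[0]
image.save("sd15_person.png")
```

Note that the parenthesized attention weights from the webui prompts above (e.g., "(best quality)") are a webui convention and are not interpreted the same way by this pipeline, so they were dropped here.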

Note: We didn’t evaluate 1.1 – 1.3.

SD 2.0 & 2.1

Released in late 2022, Stable Diffusion 2.0 and 2.1 brought some notable advancements in quality, resolution, and prompt interpretation.

Note: The 2.x models are the only ones to date that require a separate config file to be included alongside the model.

Example Images

2.0 Result:

Person:

young woman in a yellow dress standing in a grassy field, facing viewer, sunset
Negative prompt: disfigured, unity, blender, cartoon, painting, illustration, anime, 3d, painting, b&w, 
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2665078546, Size: 768x768, Model hash: bfcaf07557, Model: sd_2_0_768-v-ema, Version: v1.6.0

2.1 Result:

Person:

young woman in a yellow dress standing in a grassy field, facing viewer, sunset
Negative prompt: disfigured, unity, blender, cartoon, painting, illustration, anime, 3d, painting, b&w, 
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 989936387, Size: 768x768, Model hash: ad2a33c361, Model: sd_v2-1_768-ema-pruned, Version: v1.6.0

Futuristic City:

futuristic city, realistic illustration, style of blade runner 2049, extremely detailed
Negative prompt: painting, 3d
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2654400452, Size: 768x768, Model hash: ad2a33c361, Model: v2-1_768-ema-pruned, Version: v1.6.0

Landscape:

mountain landscape, photo, misty fog, atmospheric haze,
Negative prompt: painting, 3d, watermark, signature, illustration, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3340096348, Size: 768x768, Model hash: ad2a33c361, Model: v2-1_768-ema-pruned, Version: v1.6.0
  • Strengths of the 2.x models: These models don’t need as long a prompt and can produce beautiful results. When generating a portrait, the description ‘young woman in a yellow dress standing in a grassy field’ provided stunningly consistent results. Depth and colors appear richer than in the 1.x models, and you are more likely to get your desired image in fewer attempts (a minimal sketch of running 2.1 locally follows this list).
  • Weaknesses of the 2.x models: 2.0 and 2.1 are drastically different from one another, as Stability AI itself has noted. 2.0 filtered NSFW content from its training data fairly aggressively, which also excluded many images that weren’t actually problematic. As a result, outputs that include people tend to be poor, and some of our testing appears to support this. If you want to generate architectural or landscape images, however, 2.0 may be better suited. 2.1 remains a more “censored” model than the 1.x line, in that celebrities and the styles of current artists aren’t as easy to generate.
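To try 2.1 yourself at its native 768×768 resolution, here is a minimal diffusers sketch. The stabilityai/stable-diffusion-2-1 repo ID and the simplified settings are assumptions; also note that diffusers reads the model configuration from the repo itself, so the separate .yaml file mentioned earlier only applies to webui-style checkpoints:

```python
import torch
from diffusers import StableDiffusionPipeline

# SD 2.1 (768x768 model); the scheduler/config is pulled from the model repo,
# so no separate .yaml file is needed with this loader.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="young woman in a yellow dress standing in a grassy field, facing viewer, sunset",
    negative_prompt="disfigured, cartoon, painting, illustration, anime, 3d, b&w",
    num_inference_steps=20,
    guidance_scale=7.0,
    height=768,
    width=768,
).images[0]
image.save("sd21_person.png")
```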

SD XL 1.0

SD XL is the most recent release from Stability AI. It offers many advantages over previous models while also requiring significant resources to run locally.

  • Resolution requirements: 1024×1024
  • Parameters: 3.5 billion, as Joe Penna from Stability AI shared in this TechCrunch interview.
  • Prompts: Per the repo, SD XL uses OpenCLIP-ViT/G and CLIP-ViT/L, an improvement over the previous models. This allows for better interpretation of prompts while making it easier for people accustomed to the way 1.5 handled prompts to transition to the newer model.

Stability AI had a good example of how prompts are interpreted:

Furthermore, SDXL can understand the differences between concepts like “The Red Square” (a famous place) vs a “red square” (a shape).

  • XL 1.0 Model Card: The model card can be found on HuggingFace. Unlike 2.x, SD XL does not require a separate .yaml file. The weights are notably larger at nearly 7 GB (previous models were between 4 and 5 GB).

To run SD XL, you’ll need a dedicated GPU. In our own test, attempting to run SD XL on an Apple M1 Pro chip with 16 GB of RAM resulted in a RuntimeError due to insufficient memory.

Options for running SD XL include:

  • Clipdrop (by Stability AI): This requires a monthly subscription at $9/mo.
  • Cloud-Based Services: RunDiffusion, ThinkDiffusion
  • Rental GPU Services: Vast.ai, RunPod, TensorDock
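If you do have a dedicated GPU locally, here is a minimal sketch of running the SD XL base model with diffusers. The repo ID and settings are assumptions on our part, and the optional refiner model is omitted for brevity:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# SD XL base at its native 1024x1024 resolution; fp16 keeps memory usage down,
# but a dedicated GPU with plenty of VRAM is still needed.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

image = pipe(
    prompt="a young woman standing in a grassy field, yellow dress, facing viewer, sunset",
    negative_prompt="long limbs, disfigured, bad anatomy, painting, illustration",
    num_inference_steps=20,
    guidance_scale=7.0,
    height=1024,
    width=1024,
    # prompt_2 / negative_prompt_2 can target the second (OpenCLIP) text encoder;
    # when omitted, the same prompt is fed to both encoders.
).images[0]
image.save("sdxl_person.png")
```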

Example Images

Person:

a young woman standing grassy field, yellow dress, facing viewer, natural lighting, sunset
Negative prompt: long limbs, long arms, disfigured, bad anatomy, painting, illustration
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1204672155, Size: 1024x1024, Model hash: e6bb9ea85b, Model: sdXL_v10VAEFix, Version: v1.6.0

Futuristic City:

futuristic city, realistic illustration, style of blade runner 2049
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1427982414, Size: 1024x1024, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0

Landscape:

mountain landscape, photo, misty fog, atmospheric haze,
Negative prompt: painting, 3d, watermark, signature, illustration, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3990202654, Size: 1024x1024, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
  • Strengths of SD XL: The results speak for themselves with this model. Prompts are significantly shorter, and the images have a wider range of colors, depth, and composition. Doubling the resolution of the 1.x models (1024×1024 vs. 512×512) results in a massive improvement in quality. People have been able to generate images of celebrities, art styles, and more. The use of two CLIP text encoders also allows for some backwards compatibility with 1.5-style prompts.
  • Weaknesses of SD XL: This model is resource-intensive and requires a dedicated GPU and plenty of memory.

Other Models Beyond the Stable Diffusion Family

It’s worth noting that these are all base models we have been evaluating. However, many other models have been fine-tuned from them. For example, ReV Animated, epiCPhotoGasm, and DreamShaper are popular models on Civitai that use 1.5 as the base model, and the results are stunning.

In the past few months, fine-tuned XL models have also started to appear quickly. AlbedoBase XL and ZavyChromaXL are great examples of this.
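Because these fine-tunes ship as single checkpoint files, they can be loaded much like the base models they were built on. Here is a minimal sketch using diffusers' from_single_file loader; the filename is a hypothetical placeholder for whatever checkpoint you download from Civitai:

```python
import torch
from diffusers import StableDiffusionPipeline

# "dreamshaper_8.safetensors" is a hypothetical local filename standing in for any
# SD 1.5-based fine-tune downloaded from Civitai.
pipe = StableDiffusionPipeline.from_single_file(
    "dreamshaper_8.safetensors", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of a young woman, yellow dress, grassy field, sunset",
    num_inference_steps=20,
    guidance_scale=7.0,
).images[0]
image.save("finetune_test.png")
```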