Over the past year or so, eight Stable Diffusion models have been released, and each new version has brought notable advancements and changes.
We’ll examine a few of the popular models (specifically 1.4, 1.5, 2.0, 2.1, and SD XL) and what sets them apart from one another, so you can determine which makes sense for your workflow.
Visit our GitHub repository (articles/sd_model_comparison) to see the raw images and settings.
Release History
Here’s a chart of the release history of Stable Diffusion models:
| Version number | Release date |
| --- | --- |
| 1.1 | June 2022 |
| 1.2 | June 2022 |
| 1.3 | June 2022 |
| 1.4 | August 2022 |
| 1.5 | October 2022 |
| 2.0 | November 2022 |
| 2.1 | December 2022 |
| XL 1.0 | July 2023 |
SD 1.4 & 1.5
SD 1.4 and 1.5 are very beginner-friendly, with the latter being the more popular of the two. Here are a few key points to know:
- Resolution requirements: 512×512
- Parameters: 860 million
- Prompts: These models depend on OpenAI’s CLIP ViT-L/14, which was released in January 2021. Here is the original blog post by OpenAI.
- 1.4 Model Card: You can find it here on Hugging Face to download the checkpoint. According to Tanishq Mathew Abraham, PhD, a Research Director at Stability AI, it is recommended to use the EMA checkpoint (sd-v1-4-full-ema.ckpt) for image generation.
- 1.5 Model Card: This version was released by Runway, a partner of Stability AI. As with 1.4, the checkpoint is available on Hugging Face. For generation, go with v1-5-pruned-emaonly.ckpt, which is suited to simple generation and uses less RAM; v1-5-pruned.ckpt is intended for fine-tuning the model further.
- The models were trained on the LAION-5B dataset, which gives them broad enough coverage to generate all sorts of images.
Note: EMA stands for Exponential Moving Average, an average that places more weight on the most recent values. Keeping an EMA of the model weights during training is a common technique in machine learning to reduce noise and improve accuracy.
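To make the idea concrete, here is a minimal sketch of an EMA update as it is typically applied to model weights during training (the decay value is illustrative, not something specific to Stable Diffusion):

```python
def ema_update(ema_weights, current_weights, decay=0.999):
    """Blend the running average toward the latest weights.

    Each new value receives weight (1 - decay), and older contributions
    decay exponentially, so the most recent values carry the most weight.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, current_weights)]
```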
Example Images
In all of our examples, no inpainting, upscaling, or post-processing was applied to the images.
1.4 Result:
Person (1.4):

photo of a young woman, standing, grassy field, yellow dress, beautiful face, (emma watson:0.4), crystal clear, highly detailed eyes, facing viewer, natural lighting, sunset, Canon EOS Mark II, 35mm, 8k, 4k, (best quality)
Negative prompt: disfigured, ugly, bad, unity, blender, nsfw, cartoon, painting, illustration, nude, anime, 3d, painting, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2515940178, Size: 512x512, Model hash: 14749efc0a, Model: sd-v1-4-full-ema, Version: v1.6.0
1.5 Result:
Person (1.5):

photo of a young woman, standing, grassy field, yellow dress, beautiful face, (emma watson:0.4), crystal clear, highly detailed eyes, facing viewer, natural lighting, sunset, Canon EOS Mark II, 35mm, 8k, 4k, (best quality)
Negative prompt: disfigured, ugly, bad, unity, blender, nsfw, cartoon, painting, illustration, nude, anime, 3d, painting, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 219195919, Size: 512x512, Model hash: cc6cb27103, Model: _v1-5-pruned-emaonly, Version: v1.6.0
Futuristic City (1.5):

futuristic city, realistic illustration, style of blade runner 2049, extremely detailed, 8k, 4k, (high quality),greg rutkowski, canon eos mark II, street view,
Negative prompt: painting, 3d, disfigured, watermark, signature
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 951937457, Size: 512x512, Model hash: cc6cb27103, Model: _v1-5-pruned-emaonly, Version: v1.6.0
Landscape (1.5):

mountain landscape, photo, misty fog, atmospheric haze, extremely detailed, 8k, 4k, (high quality),
Negative prompt: painting, 3d, disfigured, watermark, signature, illustration
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3330699356, Size: 512x512, Model hash: cc6cb27103, Model: _v1-5-pruned-emaonly, Version: v1.6.0
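The settings lines under each image map fairly directly onto diffusers parameters if you prefer working in code over a web UI. Here is a minimal sketch of reproducing the 1.5 landscape settings above; the checkpoint path is an assumption, DPM++ 2M Karras corresponds to DPMSolverMultistepScheduler with Karras sigmas, and because web UIs and diffusers handle seeds and parenthesis-style prompt weighting differently, the output will not match pixel-for-pixel.

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Load the inference-only EMA checkpoint (path is an assumption).
pipe = StableDiffusionPipeline.from_single_file(
    "./models/v1-5-pruned-emaonly.ckpt", torch_dtype=torch.float16
).to("cuda")

# "Sampler: DPM++ 2M Karras" -> multistep DPM-Solver with Karras sigmas
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

generator = torch.Generator("cuda").manual_seed(3330699356)  # "Seed" from the settings

image = pipe(
    prompt="mountain landscape, photo, misty fog, atmospheric haze, extremely detailed, 8k, 4k, (high quality),",
    negative_prompt="painting, 3d, disfigured, watermark, signature, illustration",
    num_inference_steps=20,  # "Steps"
    guidance_scale=7,        # "CFG scale"
    width=512, height=512,   # "Size" -- the native resolution for 1.x models
    generator=generator,
).images[0]
image.save("landscape_1_5.png")
```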
- Strengths of the 1.x models: These models run well on consumer hardware and generate images fairly quickly. They are less censored than the later models, so you may be able to use artists’ names, celebrities, etc., to generate images.
- Weaknesses of the 1.x models: These models are not as good at interpreting prompts as the later ones, so both positive and negative prompts may need to be significantly longer. Disfigured limbs, hands, etc., are common, though this can be fixed with inpainting or specialized embeddings (e.g., badhandv4, EasyNegative). The lower resolution can be offset by upscaling the image.
Note: We didn’t evaluate 1.1 – 1.3.
SD 2.0 & 2.1
Released in late 2022, Stable Diffusion 2.0 and 2.1 brought some notable advancements in quality, resolution, and prompt interpretation.
- Resolution requirements: 768×768
- Parameters: Stability AI states that 2.0 had the same number of parameters in the U-Net as 1.5.
- Prompts: These models now use LAION’s OpenCLIP-ViT/H for prompt interpretation, so prompts can be shorter and more expressive than in 1.x. Expect to put more effort into the negative prompt than into the positive prompt.
- 2.0 Model Card: The original release of 2.0 can be found on Hugging Face (768-v-ema).
- 2.1 Model Card: 2.1 is also available on Hugging Face. However, Stability AI provides multiple checkpoint files, including v2-1_768-ema-pruned and v2-1_768-nonema-pruned (each in both .ckpt and .safetensors format).
- YAML Config Required: When using the 2.x models, you must include a config file with the same name as the checkpoint file. For example, if you are using v2-1_768-ema-pruned.ckpt, you must also place v2-1_768-ema-pruned.yaml in the same folder (stable-diffusion-webui/models/Stable-diffusion). The .yaml file can be found in the Stability AI GitHub repo (a short placement sketch follows the note below).
Note: The 2.x models are the only ones to date that require a separate config file to be included alongside the model.
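As a concrete illustration, here is a minimal sketch of putting the config in place with a few lines of Python. The config filename (v2-inference-v.yaml) is an assumption based on the inference configs in the Stability AI repo, so check it against the checkpoint you downloaded:

```python
import shutil
from pathlib import Path

# Folder where the web UI looks for checkpoints (from the bullet above).
models_dir = Path("stable-diffusion-webui/models/Stable-diffusion")

# Config downloaded from the Stability AI GitHub repo (filename is an assumption;
# the 768 "v" checkpoints use the v-prediction inference config).
downloaded_config = Path("v2-inference-v.yaml")

# The YAML must sit next to the checkpoint and share its base name.
checkpoint = models_dir / "v2-1_768-ema-pruned.ckpt"
shutil.copy(downloaded_config, checkpoint.with_suffix(".yaml"))
```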
Example Images
2.0 Result:
Person:

young woman in a yellow dress standing in a grassy field, facing viewer, sunset
Negative prompt: disfigured, unity, blender, cartoon, painting, illustration, anime, 3d, painting, b&w,
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2665078546, Size: 768x768, Model hash: bfcaf07557, Model: sd_2_0_768-v-ema, Version: v1.6.0
2.1 Result:
Person:

young woman in a yellow dress standing in a grassy field, facing viewer, sunset
Negative prompt: disfigured, unity, blender, cartoon, painting, illustration, anime, 3d, painting, b&w,
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 989936387, Size: 768x768, Model hash: ad2a33c361, Model: sd_v2-1_768-ema-pruned, Version: v1.6.0
Futuristic City:

futuristic city, realistic illustration, style of blade runner 2049, extremely detailed
Negative prompt: painting, 3d
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2654400452, Size: 768x768, Model hash: ad2a33c361, Model: v2-1_768-ema-pruned, Version: v1.6.0
Landscape:

mountain landscape, photo, misty fog, atmospheric haze,
Negative prompt: painting, 3d, watermark, signature, illustration, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3340096348, Size: 768x768, Model hash: ad2a33c361, Model: v2-1_768-ema-pruned, Version: v1.6.0
- Strengths of the 2.x models: These models don’t need prompts as long and can produce beautiful results. I noticed when generating a portrait that the description ‘young woman in a yellow dress standing in a grassy field’ produced strikingly consistent results. The depth and colors appear richer than in the 1.x models, and you are likely to get your desired image in fewer attempts.
- Weaknesses of the 2.x models: 2.0 and 2.1 are drastically different from one another – even Stability AI says so. The training data for 2.0 was filtered aggressively for NSFW content, which removed many images that should not have been flagged. This hurts outputs that include people – some of our testing appears to support this. However, if you want to generate architectural or landscape outputs, 2.0 may be better suited. Compared to the 1.x models, 2.1 remains more “censored” in that celebrities and the styles of current artists aren’t as easy to generate.
SD XL 1.0
SD XL is the most recent release from Stability AI. This model offers many advantages over previous models while also requiring significant resources to run locally.
- Resolution requirements: 1024×1024
- Parameters: 3.5 billion, as Joe Penna from Stability AI shared in this TechCrunch interview.
- Prompts: Per the repo, SD XL uses two text encoders, OpenCLIP-ViT/G and CLIP-ViT/L – an improvement over the previous models. This allows for better prompt interpretation while making it easier for people accustomed to how 1.5 handled prompts to transition to the newer model.
Stability AI had a good example of how prompts are interpreted:
Furthermore, SDXL can understand the differences between concepts like “The Red Square” (a famous place) vs a “red square” (a shape).
- XL 1.0 Model Card: The model card can be found on Hugging Face. Unlike 2.x, SD XL does not require a separate .yaml file. The weights are notably larger at nearly 7 GB (previous models were between 4 and 5 GB).
To run SD XL locally, you’ll need a dedicated GPU; a minimal loading sketch follows the list of options below. In our own test, attempting to run SD XL on an Apple M1 Pro with 16 GB of RAM failed with a RuntimeError due to insufficient memory.
Options for running SD XL include:
- Clipdrop (by Stability AI): This requires a monthly subscription at $9/mo.
- Cloud-Based Services: RunDiffusion, ThinkDiffusion
- Rental GPU Services: Vast.ai, RunPod, TensorDock
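If you do have suitable hardware, a minimal sketch of loading the SD XL base weights with the diffusers library looks like the following (the fp16 variant, CUDA device, and prompt are assumptions for illustration):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# SD XL base 1.0 from Hugging Face; roughly 7 GB of weights as noted above.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")  # assumes an NVIDIA GPU with enough VRAM
# pipe.enable_model_cpu_offload()  # optional: trades speed for lower VRAM use

# SD XL is trained at 1024x1024, so generate at its native resolution.
image = pipe(
    "futuristic city, realistic illustration, style of blade runner 2049",
    num_inference_steps=20,
    guidance_scale=7,
).images[0]
image.save("city_xl.png")
```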
Example Images
Person:

a young woman standing grassy field, yellow dress, facing viewer, natural lighting, sunset
Negative prompt: long limbs, long arms, disfigured, bad anatomy, painting, illustration
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1204672155, Size: 1024x1024, Model hash: e6bb9ea85b, Model: sdXL_v10VAEFix, Version: v1.6.0
Futuristic City:

futuristic city, realistic illustration, style of blade runner 2049
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1427982414, Size: 1024x1024, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0
Landscape:

mountain landscape, photo, misty fog, atmospheric haze,
Negative prompt: painting, 3d, watermark, signature, illustration, b&w
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3990202654, Size: 1024x1024, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
- Strengths of SD XL: The results speak for themselves. Prompts are significantly shorter, and the images have a wider range of color, depth, and composition. Double the resolution of the 1.x models (1024×1024 vs. 512×512) yields a massive improvement in quality. People have been able to generate images of celebrities, art styles, and more. The use of two CLIP text encoders allows some backwards compatibility with 1.5-style prompts.
- Weaknesses of SD XL: This model is resource-intensive and requires a GPU and plenty of RAM.
Other Models Beyond the Stable Diffusion Family
It’s worth noting that these are all base models. Many other models have been fine-tuned from them: for example, ReV Animated, epiCPhotoGasm, and DreamShaper are popular models on Civitai that use 1.5 as their base model – and the results are stunning.
In the past few months, fine-tuned XL models have also begun to appear; AlbedoBase XL and ZavyChromaXL are great examples.