CS180: Project 5A - Fun with Diffusion Models!

Adnan Aman

Project Overview

In this project, we explore the capabilities of diffusion models through various implementations including sampling loops, inpainting, visual anagrams, and hybrid image generation. Using the DeepFloyd IF model, we demonstrate advanced image manipulation techniques and creative applications.

Part 0: Setup and Initial Generation

Using the DeepFloyd IF model with different prompts and numbers of inference steps.

Random seed used: 180. Comparing generation quality between 20 and 30 inference steps.

Man Wearing Hat - Inference Step Comparison

Prompt: "a man wearing a hat" (20 inference steps)
Same prompt with 30 inference steps - Notice improved detail quality

Snowy Mountain Village - Inference Step Comparison

Prompt: "an oil painting of a snowy mountain village" (20 inference steps)
Same prompt with 30 inference steps - Notice improved detail quality

Rocket Ship - Inference Step Comparison

Prompt: "a rocket ship" (20 inference steps)
Same prompt with 30 inference steps - Notice improved detail quality

Part 1.1: Forward Process

Implementation of the forward process adding noise to images at different timesteps.

Original Campanile
Noise Level t=250
Noise Level t=500
Noise Level t=750
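The forward process computes x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I), where abar_t is the cumulative product of the alphas. A minimal NumPy sketch (the schedule below is a toy stand-in, not DeepFloyd's actual schedule):

```python
import numpy as np

def forward(x0, t, alphas_cumprod, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,
    with eps ~ N(0, I) and abar_t = alphas_cumprod[t]."""
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

# toy schedule: abar decays from ~1 (nearly clean) to ~0 (pure noise)
alphas_cumprod = np.linspace(0.9999, 0.0001, 1000)
```

As t grows, abar_t shrinks and the sample is dominated by the noise term, matching the t=250/500/750 images above.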

Part 1.2: Classical Denoising

Using Gaussian blur filtering for denoising at different noise levels.

Original Noisy (t=250)
Gaussian Denoised (t=250)
Original Noisy (t=500)
Gaussian Denoised (t=500)
Original Noisy (t=750)
Gaussian Denoised (t=750)
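The classical baseline simply low-pass filters the noisy image. A self-contained NumPy sketch of a separable Gaussian blur (in practice a library routine such as a Gaussian filter would be used; this is illustrative):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalized."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_denoise(img, sigma=2.0):
    """Separable Gaussian blur: convolve each row, then each column."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="same")
    return out
```

Blurring suppresses the high-frequency noise but also destroys image detail, which is why the results above remain unsatisfying at high noise levels.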

Part 1.3: One-Step Denoising

Single-step denoising using UNet predictions at different noise levels.

Original Image
Noisy Image (t=250)
One-Step Denoised (t=250)
Original Image
Noisy Image (t=500)
One-Step Denoised (t=500)
Original Image
Noisy Image (t=750)
One-Step Denoised (t=750)
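Given the UNet's noise estimate eps_hat, the forward equation can be inverted in a single step. A sketch of that algebra (eps_hat would come from the pretrained UNet; here it is just an argument):

```python
import numpy as np

def one_step_denoise(xt, eps_hat, abar_t):
    """Invert the forward process in a single step using the predicted
    noise: x0_hat = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t)."""
    return (xt - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
```

With a perfect noise estimate this recovers x_0 exactly; with the network's imperfect estimate at large t, the one-step result is blurry, as seen above.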

Part 1.4: Iterative Denoising

Progressive denoising showing all steps and comparison with other methods.

Denoising Steps

Step 10
Step 15
Step 20
Step 25
Step 30

256x256 Resolution Comparison

Original (256x256)
Noisy (256x256)
Iterative Denoised (256x256)
One-Step Denoised (256x256)
Gaussian Denoised (256x256)
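Iterative denoising walks a strided list of timesteps, at each stride estimating x_0 and blending it with the current noisy image. A sketch of the deterministic part of one update (the added variance term is omitted for clarity):

```python
import numpy as np

def iterative_denoise_step(xt, eps_hat, abar_t, abar_tp):
    """One strided update t -> t' (t' < t): estimate x0 from the
    predicted noise, then interpolate toward it."""
    x0_hat = (xt - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
    alpha = abar_t / abar_tp           # effective per-stride alpha
    beta = 1.0 - alpha
    return (np.sqrt(abar_tp) * beta / (1.0 - abar_t)) * x0_hat \
         + (np.sqrt(alpha) * (1.0 - abar_tp) / (1.0 - abar_t)) * xt
```

Repeating this from high t down to t = 0 gives the progressively cleaner steps shown above, outperforming both one-step and Gaussian denoising.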

Part 1.5: Diffusion Model Sampling

Generating images from scratch using the DeepFloyd model.

Generated Sample 1
Generated Sample 2
Generated Sample 3
Generated Sample 4
Generated Sample 5

Part 1.6: Classifier-Free Guidance

Image generation with a CFG scale of γ = 7.

CFG Generated Image 1
CFG Generated Image 2
CFG Generated Image 3
CFG Generated Image 4
CFG Generated Image 5
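Classifier-free guidance combines an unconditional and a conditional noise estimate, extrapolating past the conditional one. The update is a one-liner:

```python
import numpy as np

def cfg_noise_estimate(eps_uncond, eps_cond, gamma=7.0):
    """Classifier-free guidance:
    eps = eps_uncond + gamma * (eps_cond - eps_uncond).
    gamma = 0 is unconditional, gamma = 1 is plain conditional,
    gamma > 1 pushes further in the direction of the condition."""
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```

With γ = 7 the samples adhere much more strongly to the prompt, at the cost of some diversity.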

Part 1.7: Image-to-image Translation

We use the SDEdit algorithm to project images back onto the natural image manifold at different noise levels, with the prompt "a high quality photo" for the base projections.
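The "noise level" labels below are starting indices into the strided timestep list: SDEdit noises the input up to that timestep and then resumes the ordinary iterative denoising loop. A sketch of the starting point:

```python
import numpy as np

def sdedit_start(x_orig, i_start, timesteps, alphas_cumprod, rng):
    """SDEdit: noise the input to timesteps[i_start] with the forward
    process, then resume iterative denoising from that index. A larger
    i_start (less noise) keeps the result closer to the input; a smaller
    one lets the model stray further toward the prompt."""
    t = timesteps[i_start]
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x_orig.shape)
    return np.sqrt(abar) * x_orig + np.sqrt(1.0 - abar) * eps
```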

Test Image (Campanile)

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Campanile

Web Image (Dog)

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Image (Web - Dog)

Web Image (Man)

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Image (Web - Man)

Part 1.7.1: Editing Hand-Drawn and Web Images

Projecting non-realistic images onto the natural image manifold using different noise levels.

Web Image (Optimus Prime)

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Web Image (Optimus Prime)

Hand-Drawn Image 1 (Smiley Face)

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Hand Drawing (Smiley)

Hand-Drawn Image 2 (Optimus Prime)

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Hand Drawing (Optimus)

Part 1.7.2: Inpainting

Using the RePaint algorithm to fill in masked regions of images while preserving the surrounding context.
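The core of the inpainting loop: after every denoising step, pixels outside the mask are forced back to the original image, re-noised to the current timestep so they stay consistent with the rest. A sketch:

```python
import numpy as np

def repaint_step(xt, x_orig, mask, t, alphas_cumprod, rng):
    """Constrain a denoising step to the mask (mask = 1 inside the
    region being filled): x_t <- m * x_t + (1 - m) * forward(x_orig, t)."""
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x_orig.shape)
    x_orig_t = np.sqrt(abar) * x_orig + np.sqrt(1.0 - abar) * eps
    return mask * xt + (1.0 - mask) * x_orig_t
```

Only the masked region is synthesized; everything else is pinned to the source image at every step.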

Test Image Inpainting (Campanile)

Original Campanile
Inpainting Mask
Inpainted Result

Golden Gate Bridge Inpainting

Original Golden Gate Bridge
Inpainting Mask
Inpainted Result

Cow Inpainting

Original Cow Image
Inpainting Mask
Inpainted Result

Part 1.7.3: Text-Conditional Image-to-image Translation

Using text prompts to guide the image projection process.

Campanile to Rocket Ship

Prompt: "a rocket ship"

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Campanile

Woman to Man

Prompt: "a photo of a man"

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Woman Image

Forest to Snowy Mountain Village

Prompt: "an oil painting of a snowy mountain village"

Noise Level 1
Noise Level 3
Noise Level 5
Noise Level 7
Noise Level 10
Noise Level 20
Original Forest Image

Part 1.8: Visual Anagrams

Creating optical illusions that change appearance when flipped upside down using paired prompts.
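Each denoising step averages two noise estimates: one for the first prompt on the image as-is, and one for the second prompt on the vertically flipped image (flipped back before averaging). A sketch, with the prompt-conditioned denoisers passed in as plain functions:

```python
import numpy as np

def anagram_noise_estimate(xt, eps_fn_a, eps_fn_b):
    """Visual anagram step:
    eps = (eps_a(x) + flip(eps_b(flip(x)))) / 2,
    so the image denoises toward prompt A upright and prompt B flipped."""
    eps_a = eps_fn_a(xt)
    eps_b = np.flipud(eps_fn_b(np.flipud(xt)))
    return 0.5 * (eps_a + eps_b)
```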

Oil Painting Old Man / Campfire

Normal Orientation ("an oil painting of an old man")
Flipped ("an oil painting of people around a campfire")

Lithograph Waterfall / Skull

Normal Orientation ("a lithograph of waterfalls")
Flipped ("a lithograph of a skull")

Mountain Village / Amalfi Coast

Normal Orientation ("an oil painting of a snowy mountain village")
Flipped ("a photo of the amalfi coast")

Part 1.10: Hybrid Images

Creating images that appear different at varying distances using frequency separation.
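Here the two noise estimates are combined in frequency space rather than by flipping: the low frequencies come from one prompt's estimate and the high frequencies from the other's. A sketch, with the low-pass filter passed in as a function:

```python
import numpy as np

def hybrid_noise_estimate(xt, eps_fn_low, eps_fn_high, lowpass):
    """Hybrid-image step:
    eps = lowpass(eps_1) + (eps_2 - lowpass(eps_2)),
    so prompt 1 dominates at a distance and prompt 2 up close."""
    eps1 = eps_fn_low(xt)
    eps2 = eps_fn_high(xt)
    return lowpass(eps1) + (eps2 - lowpass(eps2))
```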

Skull and Waterfall

Low Frequency: "a lithograph of a skull" (visible from far)
High Frequency: "a lithograph of waterfalls" (visible up close)

Mountain Village and Amalfi Coast

Low Frequency: "an oil painting of a snowy mountain village"
High Frequency: "a photo of the amalfi coast"

Rocket Ship and Amalfi Coast

Low Frequency: "a rocket ship"
High Frequency: "a photo of the amalfi coast"

Conclusion

Through this project, we've explored various applications of diffusion models, including sampling loops, image-to-image translation with SDEdit, inpainting, text-conditioned editing, visual anagrams, and hybrid image generation.

The results demonstrate the versatility and power of diffusion models in various creative applications and image manipulation tasks.

Part B: Training Your Own Diffusion Model

Following our exploration of DeepFloyd IF, we now implement our own diffusion models through three progressive stages: unconditioned, time-conditioned, and class-conditioned UNet architectures.

Part 1: Single-Step Denoising U-Net

We begin with a simple UNet architecture that performs one-step denoising.
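The denoiser is trained to undo Gaussian corruption in one shot: corrupt z = x + σ·ε and regress the clean image under an L2 loss. A sketch of the objective (the real training loop optimizes a UNet over MNIST batches; the denoiser here is just a function argument):

```python
import numpy as np

def denoising_loss(denoiser, x_clean, sigma, rng):
    """Single-step denoiser objective: corrupt z = x + sigma * eps with
    eps ~ N(0, I) and regress the clean image, L = ||D(z) - x||^2."""
    eps = rng.standard_normal(x_clean.shape)
    z = x_clean + sigma * eps
    return np.mean((denoiser(z) - x_clean) ** 2)
```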

Visualization of different noise levels on MNIST digits
Training loss curve for unconditioned UNet

Training Results

Results after first epoch
Results after fifth epoch

Out-of-Distribution Testing

Performance on different noise levels (σ = 0.0 to 1.0)

Part 2: Time-Conditioned UNet

We enhance our model by adding time conditioning, allowing the network to handle different noise levels more effectively. This is achieved by injecting timestep information through FCBlocks at key points in the UNet architecture.
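A minimal sketch of the conditioning mechanism, with hypothetical weight shapes: an FCBlock (Linear → GELU → Linear) maps the normalized timestep t/T to a per-channel vector that scales a feature map inside the UNet.

```python
import numpy as np

def fc_block(t_norm, w1, b1, w2, b2):
    """Tiny FCBlock sketch: Linear -> GELU -> Linear, mapping the scalar
    t/T to a per-channel conditioning vector. w1, b1: (D,); w2: (D, C)."""
    h = t_norm * w1 + b1                                      # (D,)
    h = 0.5 * h * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                 * (h + 0.044715 * h ** 3)))  # tanh-GELU
    return h @ w2 + b2                                        # (C,)

def apply_time_conditioning(features, t_embed):
    """Scale each channel of a [C, H, W] feature map by the embedding."""
    return features * t_embed[:, None, None]
```

In the actual model this modulation is applied at the unflatten and first up-sampling blocks, letting one network denoise at every noise level.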

Training algorithm for time-conditioned UNet
Sampling algorithm for time-conditioned UNet
Training loss curve for time-conditioned UNet

Sampling Results

Generated samples after 5 epochs
Generated samples after 20 epochs

Part 2.5: Class-Conditioned UNet with Classifier-Free Guidance

The final UNet adds class conditioning with classifier-free guidance (γ = 5.0), allowing us to generate specific digits with improved quality. For each digit (0-9), we generate four different instances to demonstrate the model's ability to produce clear digits resembling the MNIST dataset.
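During training, the class label enters as a one-hot vector that is dropped (zeroed) with some probability, so the same network also learns the unconditional model that classifier-free guidance needs at sampling time (eps = eps_uncond + γ·(eps_cond - eps_uncond)). A sketch of the conditioning vector:

```python
import numpy as np

def class_vector(label, num_classes, p_uncond, rng):
    """One-hot class conditioning with dropout: with probability p_uncond
    the vector is zeroed, training the unconditional branch used by CFG."""
    c = np.zeros(num_classes)
    c[label] = 1.0
    if rng.random() < p_uncond:
        c[:] = 0.0
    return c
```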

Training loss curve for class-conditioned UNet

Class-Conditioned Sampling Results

Each row shows four different generations of the same digit (0-9), demonstrating both consistency in digit identity and variation in style.

Epoch 5: Four samples of each digit (0-9).
Each row represents a digit, each column is a different sample.
Epoch 20: Four samples of each digit (0-9).
Note the improved clarity and consistency compared to epoch 5.

Conclusion

Through this project, we've explored the evolution of diffusion models from basic denoising to conditional generation: a single-step denoising UNet, a time-conditioned UNet trained and sampled with the DDPM procedure, and finally a class-conditioned UNet with classifier-free guidance.