Adnan Aman
In this project, we explore the capabilities of diffusion models through various implementations including sampling loops, inpainting, visual anagrams, and hybrid image generation. Using the DeepFloyd IF model, we demonstrate advanced image manipulation techniques and creative applications.
Using the DeepFloyd IF model with different prompts and inference steps.
Random seed used: 180. We compare generation quality between 20 and 30 inference steps. We then implement the forward process, which adds noise to images at different timesteps.
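The forward process described above can be sketched as a single closed-form sample from q(x_t | x_0). This is a minimal NumPy illustration, not the project's actual code; the linear-beta schedule below is an assumption for demonstration (DeepFloyd IF defines its own schedule), and `forward_noise` is a hypothetical helper name.

```python
import numpy as np

def forward_noise(x0, t, alphas_cumprod, rng=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    rng = rng or np.random.default_rng(180)  # seed used in the report
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps, eps

# toy linear-beta schedule (an assumption, for illustration only)
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
```

Larger t gives smaller ᾱ_t, so the image term shrinks and the noise term dominates, which is exactly the progression shown at different timesteps.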
Using Gaussian blur filtering as a classical denoising baseline at different noise levels.
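The classical baseline amounts to low-pass filtering: a separable Gaussian blur removes some noise but also destroys detail. A minimal sketch (the report likely used `torchvision.transforms.functional.gaussian_blur`; this pure-NumPy version is an equivalent stand-in):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to one."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur_denoise(img, sigma=2.0):
    """Separable Gaussian blur: filter rows, then columns."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out
```

The blur reduces the variance of the noise, but at high noise levels no σ recovers the image, which motivates the learned denoisers below.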
Single-step denoising using UNet predictions at different noise levels.
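One-step denoising simply inverts the forward equation: given the UNet's noise estimate ε̂, solve x_t = √ᾱ_t x_0 + √(1−ᾱ_t) ε for x_0. A sketch (`estimate_x0` is a hypothetical helper name):

```python
import numpy as np

def estimate_x0(xt, eps_hat, abar_t):
    """One-step denoise: invert x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    return (xt - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
```

If ε̂ were the true noise this recovers x_0 exactly; in practice the estimate is imperfect, so the one-step result gets blurrier as the noise level grows.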
Iterative denoising, showing intermediate results at every few steps and a comparison against the Gaussian-blur and one-step baselines.
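The iterative loop can be sketched as strided DDPM sampling: at each step, estimate x̂_0 from the predicted noise, then interpolate toward the next (less noisy) timestep. This is a simplified NumPy illustration under an assumed linear-beta schedule; the added-noise term uses √β_t as the step variance, a common simplification, and `eps_model` stands in for the UNet.

```python
import numpy as np

def iterative_denoise(xt, timesteps, alphas_cumprod, eps_model, rng=None):
    """Strided DDPM sampling over a decreasing list of timesteps."""
    rng = rng or np.random.default_rng(180)
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        abar, abar_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        alpha = abar / abar_prev          # effective alpha for this stride
        beta = 1.0 - alpha
        eps = eps_model(xt, t)
        # one-step estimate of the clean image, clipped to valid range
        x0_hat = np.clip((xt - np.sqrt(1 - abar) * eps) / np.sqrt(abar), -1, 1)
        noise = rng.standard_normal(xt.shape) if t_prev > 0 else 0.0
        xt = (np.sqrt(abar_prev) * beta / (1 - abar) * x0_hat
              + np.sqrt(alpha) * (1 - abar_prev) / (1 - abar) * xt
              + np.sqrt(beta) * noise)
    return xt
```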
Generating images from scratch using the DeepFloyd IF model.
Image generation with classifier-free guidance (CFG) at scale γ=7.
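Classifier-free guidance combines an unconditional and a conditional noise estimate, extrapolating past the conditional one; γ=7 weights the conditional direction heavily, trading diversity for quality. A one-line sketch:

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, gamma=7.0):
    """CFG: eps = eps_uncond + gamma * (eps_cond - eps_uncond)."""
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```

γ=0 recovers unconditional generation and γ=1 recovers plain conditional generation; γ>1 pushes samples further toward the prompt.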
Using the SDEdit algorithm to project images back onto the natural image manifold at different noise levels. We use the prompt "a high quality photo" for the base projections.
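SDEdit's entry point is simple: noise the input image to an intermediate timestep, then run the usual denoising loop from there. A sketch (`sdedit_start` is a hypothetical helper; the schedule is an assumed toy one):

```python
import numpy as np

def sdedit_start(x_orig, i_start, timesteps, alphas_cumprod, rng=None):
    """Noise x_orig to timesteps[i_start]; the denoising loop then runs
    over the remaining timesteps. Larger i_start means less noise and a
    result closer to the original image."""
    rng = rng or np.random.default_rng(180)
    abar = alphas_cumprod[timesteps[i_start]]
    xt = (np.sqrt(abar) * x_orig
          + np.sqrt(1.0 - abar) * rng.standard_normal(x_orig.shape))
    return xt, timesteps[i_start:]
```

The noise level controls the trade-off: heavy noise gives the model freedom to hallucinate a plausible photo, light noise mostly preserves the input.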
Projecting non-realistic images onto the natural image manifold using different noise levels.
Using the RePaint algorithm to fill in masked regions of images while preserving the surrounding context.
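The core of RePaint-style inpainting is one compositing step: after every denoising update, the known pixels are overwritten with the original image forward-noised to the current level, so only the masked region is actually generated. A sketch (mask convention assumed: 1 where new content is created, 0 where pixels are known):

```python
import numpy as np

def repaint_step(xt, x_orig, mask, abar_t, rng=None):
    """Re-impose known pixels (mask == 0) at the current noise level."""
    rng = rng or np.random.default_rng(180)
    known = (np.sqrt(abar_t) * x_orig
             + np.sqrt(1.0 - abar_t) * rng.standard_normal(x_orig.shape))
    return mask * xt + (1.0 - mask) * known
```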
Using text prompts to guide the image projection process.
Prompt: "a rocket ship"
Prompt: "a photo of a man"
Prompt: "an oil painting of a snowy mountain village"
Creating optical illusions that change appearance when flipped upside down using paired prompts.
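A visual anagram is driven by one combined noise estimate: denoise the image under prompt A, denoise the flipped image under prompt B, flip that estimate back, and average. A sketch with a stand-in `eps_model(image, prompt)`:

```python
import numpy as np

def anagram_noise(xt, eps_model, prompt_a, prompt_b):
    """Average prompt A's estimate with the un-flipped estimate that
    prompt B produces on the upside-down image."""
    eps_a = eps_model(xt, prompt_a)
    eps_b = np.flipud(eps_model(np.flipud(xt), prompt_b))
    return 0.5 * (eps_a + eps_b)
```

Because both orientations are denoised at once, the final image reads as prompt A right-side up and prompt B upside down.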
Creating images that appear different at varying distances using frequency separation.
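The frequency separation works at the level of noise estimates: keep the low frequencies of prompt A's estimate and the high frequencies of prompt B's. The sketch below uses a crude box blur as the low-pass filter (the report likely used a Gaussian; the choice of filter is an assumption here):

```python
import numpy as np

def box_blur(img, r=2):
    """Crude low-pass: mean over a (2r+1) x (2r+1) window, edge-padded."""
    k = 2 * r + 1
    pad = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def hybrid_noise(eps_a, eps_b, r=2):
    """Low frequencies from prompt A, high frequencies from prompt B."""
    return box_blur(eps_a, r) + (eps_b - box_blur(eps_b, r))
```

Up close the high-frequency content (prompt B) dominates perception; from far away only the low frequencies (prompt A) survive.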
Through this project, we've explored various applications of diffusion models, including sampling loops, classical and learned denoising, inpainting, text-conditioned image-to-image translation, visual anagrams, and hybrid image generation.
The results demonstrate the versatility and power of diffusion models in various creative applications and image manipulation tasks.
Following our exploration of DeepFloyd IF, we now implement our own diffusion models through three progressive stages: unconditioned, time-conditioned, and class-conditioned UNet architectures.
We begin with a simple UNet architecture that performs one-step denoising.
We enhance our model by adding time conditioning, allowing the network to handle different noise levels more effectively. This is achieved by injecting timestep information through FCBlocks at key points in the UNet architecture.
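An FCBlock is just a small MLP on the normalized timestep t/T whose output vector is broadcast over a feature map at the injection point. This NumPy sketch shows the shape bookkeeping only; the layer widths, GELU nonlinearity, and exact injection points are assumptions for illustration:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def fc_block(t_norm, w1, b1, w2, b2):
    """FCBlock sketch: Linear -> GELU -> Linear on the scalar t / T.
    Returns a per-channel vector to broadcast over an HxW feature map."""
    h = gelu(w1 @ np.array([t_norm]) + b1)
    return w2 @ h + b2
```

At a conditioning site, the returned D-vector is combined with a D×H×W feature map via broadcasting, e.g. `feat + v[:, None, None]`, so every noise level steers the decoder differently.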
The final UNet adds class conditioning with classifier-free guidance (γ=5.0). This allows us to generate specific digits with improved quality. For each digit (0-9), we generate four different instances to demonstrate the model's ability to produce clear digits similar to the MNIST dataset.
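Class conditioning and its guidance can be sketched in two pieces: a one-hot class vector that is randomly zeroed during training (so the model also learns the unconditional distribution), and the CFG combination at sampling time. The dropout probability below is an assumption; function names are hypothetical:

```python
import numpy as np

def class_vector(label, num_classes=10, p_uncond=0.1, rng=None):
    """One-hot conditioning vector; zeroed with probability p_uncond
    (assumed value) so the null class is also trained."""
    rng = rng or np.random.default_rng(180)
    c = np.zeros(num_classes)
    c[label] = 1.0
    if rng.random() < p_uncond:
        c[:] = 0.0
    return c

def guided_eps(eps_model, xt, t, c, gamma=5.0):
    """CFG at sampling time: query with the class vector and with the
    null (all-zero) vector, then extrapolate with scale gamma."""
    eps_c = eps_model(xt, t, c)
    eps_u = eps_model(xt, t, np.zeros_like(c))
    return eps_u + gamma * (eps_c - eps_u)
```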
Each row shows four different generations of the same digit (0-9), demonstrating both consistency in digit identity and variation in style.
Through this project, we've explored the evolution of diffusion models from basic one-step denoising to time-conditioned and class-conditioned generation.