Diffusion Model for Image Generation

This project implements a Denoising Diffusion Probabilistic Model (DDPM) for image generation using PyTorch. The model is trained on the Fashion MNIST dataset and can generate new fashion item images.

This DDPM implementation is based on the one originally published by Hugging Face.

Overview
Requirements
Project Structure
Installation
Usage
Model Architecture
Training Process
Sampling Process
Results
Contributing
License

Overview

Diffusion models are a class of generative models that learn to reverse a gradual noising process: starting from an image of pure noise, the model removes noise step by step until a clean image remains. This project implements such a model using a U-Net architecture with attention mechanisms, trained on the Fashion MNIST dataset to generate new fashion item images.

Requirements

The project requires the following main libraries:

Python 3.7+
PyTorch 1.7+
torchvision
einops
tqdm
matplotlib
datasets

For a complete list of requirements, see the requirements.txt file.

Project Structure

diffusion_model/
│
├── main.py
├── requirements.txt
│
├── src/
│   ├── __init__.py
│   ├── model.py
│   ├── dataset.py
│   ├── diffusion.py
│   ├── train.py
│   ├── sample.py
│   └── utils.py
│
└── results/

main.py: The entry point of the program.
src/model.py: Contains the U-Net model architecture.
src/dataset.py: Handles dataset loading and preprocessing.
src/diffusion.py: Implements the diffusion process.
src/train.py: Contains the training loop.
src/sample.py: Implements the sampling process.
src/utils.py: Contains utility functions.
results/: Directory where generated images are saved.

Installation

Clone this repository:

git clone https://github.com/yourusername/diffusion-model.git
cd diffusion-model

Install the required packages:

pip install -r requirements.txt

Usage

To train the model and generate images:

python main.py

This will start the training process and periodically save generated images in the results/ directory.

Model Architecture

The model uses a U-Net architecture with the following key components:

Residual blocks
Group normalization
Self-attention mechanisms
Sinusoidal position embeddings for time steps

The U-Net consists of a series of downsampling layers followed by upsampling layers, with skip connections between corresponding layers.
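
As an illustration of the sinusoidal time-step embeddings listed above, here is a minimal sketch. The function name `sinusoidal_embedding`, the base of 10000, and the embedding size are assumptions for this example and may differ from what src/model.py actually does. It assumes `dim` is even and at least 4.

```python
import math

import torch


def sinusoidal_embedding(timesteps: torch.Tensor, dim: int) -> torch.Tensor:
    """Map integer timesteps of shape (B,) to embeddings of shape (B, dim)."""
    half = dim // 2
    # Frequencies form a geometric progression from 1 down to 1/10000.
    exponent = torch.arange(half, dtype=torch.float32) / (half - 1)
    freqs = torch.exp(-math.log(10000.0) * exponent)
    # Outer product: one row per timestep, one column per frequency.
    args = timesteps.float()[:, None] * freqs[None, :]
    # First half holds sines, second half holds cosines.
    return torch.cat([args.sin(), args.cos()], dim=-1)


# Hypothetical usage: embed three timesteps into 32 dimensions.
emb = sinusoidal_embedding(torch.tensor([0, 10, 999]), dim=32)
```

Each timestep gets a distinct vector, so the network can condition its denoising on how noisy the input is expected to be.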

Training Process

The training process follows these steps:

Load and preprocess the Fashion MNIST dataset.
For each epoch and batch:
- Sample a random timestep t.
- Add noise to the input images according to t.
- Predict the noise using the model.
- Calculate the loss between predicted and actual noise.
- Update the model parameters.
Periodically save generated samples.
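
The per-batch steps above can be sketched as follows. This is not the code from src/train.py: the linear noise schedule, `T = 200`, the MSE loss, and the `TinyDenoiser` stand-in (used here in place of the U-Net) are all assumptions chosen to keep the example short and runnable.

```python
import torch
import torch.nn.functional as F

T = 200  # number of diffusion steps (assumed value)
betas = torch.linspace(1e-4, 0.02, T)  # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise


class TinyDenoiser(torch.nn.Module):
    """Hypothetical stand-in for the U-Net: one conv plus a learned per-timestep offset."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.time_offset = torch.nn.Embedding(T, 1)

    def forward(self, x, t):
        return self.conv(x) + self.time_offset(t).view(-1, 1, 1, 1)


def train_step(model, optimizer, x0):
    t = torch.randint(0, T, (x0.shape[0],))  # one random timestep per image
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)             # add noise according to t
    loss = F.mse_loss(model(x_t, t), noise)  # predicted vs. actual noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(8, 1, 28, 28) * 2 - 1  # fake 28x28 batch scaled to [-1, 1]
loss = train_step(model, optimizer, batch)
```

Because the noisy image at any timestep can be computed directly from the clean image, each training step only needs one randomly chosen timestep per image rather than a full pass through the chain.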

Sampling Process

The sampling process to generate new images involves:

Start with pure noise.
Iteratively denoise the image using the trained model.
For each timestep from T to 1:
- Predict the noise in the current noisy image.
- Remove a portion of the predicted noise.
- Add a small amount of random noise (except at the final step).
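
The loop above can be sketched as follows. This is a minimal illustration rather than the code in src/sample.py: the linear schedule, `T = 200`, and the choice of noise scale (standard deviation `sqrt(beta_t)`, one of the options described in the DDPM paper) are assumptions, and timesteps are 0-indexed here, so the loop runs from T-1 down to 0.

```python
import torch

T = 200  # number of diffusion steps (assumed value)
betas = torch.linspace(1e-4, 0.02, T)  # linear noise schedule (assumed)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)


@torch.no_grad()
def sample(model, shape):
    """Generate images by running the reverse process from pure noise."""
    x = torch.randn(shape)  # start with pure noise
    for i in reversed(range(T)):
        t = torch.full((shape[0],), i, dtype=torch.long)
        eps = model(x, t)  # predict the noise in the current image
        # Remove a portion of the predicted noise (mean of the reverse step).
        coef = betas[i] / (1.0 - alphas_cumprod[i]).sqrt()
        mean = (x - coef * eps) / alphas[i].sqrt()
        if i > 0:
            # Add a small amount of fresh noise, except at the final step.
            x = mean + betas[i].sqrt() * torch.randn_like(x)
        else:
            x = mean
    return x


# Hypothetical usage with a placeholder model that always predicts zero noise;
# a trained network taking (x, t) would be passed in its place.
torch.manual_seed(0)
images = sample(lambda x, t: torch.zeros_like(x), shape=(4, 1, 28, 28))
```

With a trained model, the returned tensor would hold generated images in the same value range as the training data.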

Results

After training, the model can generate new fashion item images. Examples of generated images can be found in the results/ directory.

Contributing

Contributions to this project are welcome. Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
