๐Ÿš RiceGen

🌾 Diffusion model applied to rice grain generation. 🌾

This is a simple academic project that aims to generate images of rice grains using an unconditional diffusion model implemented from scratch. Although the model is unconditional, it is trained exclusively on similar rice images, so in practice it is constrained to generate rice grains.

Warning

Most of the image labels are incorrect because of a mistake made when generating them. The only image with correct labels is BS-64_E-120_IMGSIZE-16x16. For the others, assume the first image's label applies to all of them. This will be fixed as soon as possible.

Dataset and pre-processing

The chosen dataset was the Rice Image Dataset, which provides images of five rice varieties (Arborio, Basmati, Ipsala, Jasmine and Karacadag), 15k images each. Because this project ran in a Google Colab Jupyter Notebook, the dataset was reduced to 1k images per variety using the following command:

ls | head -n 1000 | xargs -I {} mv {} ../<variety>-1k/
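For readers who prefer to do the same step from Python (for example inside a notebook cell), here is a minimal sketch of an equivalent. The function name move_first_n and the example paths are hypothetical and not part of the project. It assumes that "first" means sorted by file name, which may differ slightly from the order ls produces under some locales.

```python
import shutil
from pathlib import Path


def move_first_n(src_dir, dst_dir, n=1000):
    """Move the first n files of src_dir (sorted by name) into dst_dir.

    Hypothetical helper: it mirrors `ls | head -n N | xargs mv`,
    but only considers regular files. Returns the number of files moved.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    files = sorted(p for p in src.iterdir() if p.is_file())[:n]
    for path in files:
        shutil.move(str(path), str(dst / path.name))
    return len(files)


# Example call (paths are assumptions, adjust to your own layout):
# move_first_n("Arborio", "Arborio-1k", n=1000)
```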

Once the data was reduced, it became feasible to upload it to Drive. Inside Colab, after extracting the zip file containing all 5k images, the labeled PIL images were loaded into a PyTorch Dataset through the CustomDataset class and shuffled. 4k images (80%) were assigned to training and the remaining 1k (20%) to validation.
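The shuffle and 80/20 split described above can be sketched in plain Python. This is only an illustration of the split logic, not the project's CustomDataset. The function shuffle_and_split, the seed, and the file names are all assumptions made for the example.

```python
import random


def shuffle_and_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of items and split it into (train, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample list: (file name, variety label) pairs, 1k per variety.
varieties = ["Arborio", "Basmati", "Ipsala", "Jasmine", "Karacadag"]
samples = [(f"{v}_{i}.jpg", v) for v in varieties for i in range(1000)]

train_samples, val_samples = shuffle_and_split(samples)
print(len(train_samples), len(val_samples))  # 4000 1000
```

Shuffling before the cut matters here: the sample list is grouped by variety, so an unshuffled split would leave the validation set with a single variety.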

The architecture

This project is heavily inspired by and based on Diffusion Models - Live Coding Tutorial and Diffusion Models | Paper Explanation | Math Explained, and uses their code as a foundation, since a from-scratch implementation requires a certain level of expertise in math, mainly statistics.

The main idea is to gradually add noise to the images through a Markov chain; this is the forward process. Once trained, the model can perform the reverse process, turning noise into a coherent image. The architecture uses a U-Net, whose encoder-decoder structure is similar to that of a Variational Autoencoder (VAE).
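To make the forward process more concrete, here is a minimal sketch in plain Python of the closed-form noising step from the DDPM paper (Ho et al., 2020): x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps drawn from a standard normal distribution. The linear schedule from 1e-4 to 0.02 over 1000 steps follows that paper and is an assumption here, as are all function names. The project's actual settings may differ.

```python
import math
import random


def linear_beta_schedule(timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in one step; returns (x_t, noise)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + sigma * e for x, e in zip(x0, noise)]
    return x_t, noise


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

pixels = [0.5, -0.25, 1.0, -1.0]  # toy "image", values scaled to [-1, 1]
slightly_noisy, _ = add_noise(pixels, t=0, alpha_bars=alpha_bars, rng=rng)
almost_pure_noise, eps = add_noise(pixels, t=999, alpha_bars=alpha_bars, rng=rng)
```

At t = 0 the sample is almost identical to the input, while at the last step almost none of the original signal remains. In the DDPM formulation, the network is trained to predict the noise that was added, which is what lets it run the process in reverse.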

NOTE: I will not cover the underlying math in depth; I am only applying the concepts studied. If you want to go further, feel free to read and watch the references.

Parameters

Several tests were run with different parameter values (batch size, image size, learning rate and number of epochs). You can see the results inside 5k-samples. Among these options, the image generated with BS-64_E-120_IMGSIZE-16x16 was the most similar to a real grain. BS-64_E-120_IMGSIZE-32x32_LR-00003 (learning rate = 0.0003) was a good trial as well.

Results

Despite being limited by the Colab runtime resources, the model produced good images. The image below was generated with the following parameters: batch size = 64, epochs = 120, resolution = 16x16 and learning rate = 0.0003.

When the resolution is increased, the model clearly becomes unstable and tries to generate multiple grains, which sometimes leads to an undesired image.

Here's how the loss function behaved in the first image (16x16 resolution).

References

  1. dtransposed. (2023, February 6). Diffusion Models - Live Coding Tutorial [Video]. YouTube. Retrieved March 8, 2024, from https://www.youtube.com/watch?v=S_il77Ttrmg.

  2. Outlier. (2022, June 6). Diffusion Models | Paper Explanation | Math Explained [Video]. YouTube. Retrieved March 8, 2024, from https://www.youtube.com/watch?v=HoKDTa5jHvg.

  3. Jia-Bin Huang. (2024, January 8). How I Understand Diffusion Models [Video]. YouTube. Retrieved March 8, 2024, from https://www.youtube.com/watch?v=i2qSxMVeVLI.

  4. Luo, C. (2022). Understanding Diffusion Models: A Unified Perspective. Arxiv. https://arxiv.org/pdf/2208.11970.pdf

  5. Ho, J., Jain, A., Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. Arxiv. https://arxiv.org/pdf/2006.11239.pdf
