styletransfer

This project was developed for the Vision And Perception exam, academic year 2021/2022, University La Sapienza of Rome.

Getting Started

The aim of this project is to perform style transfer with two different approaches.
The first is based on Neural Style Transfer, while the second is based on Generative Adversarial Networks (GANs).
In both cases the idea is to restyle images of faces and panoramas in the manner of Van Gogh paintings.

GAN Style Transfer

There are two datasets for this task. The first is a web-scraped set of human faces of different ages, races and creeds, downloaded from Kaggle. The second is a set of Van Gogh paintings collected from Berkeley.
This task uses CycleGAN [1]. We adopt the CycleGAN architecture for our generative networks because it has shown impressive results. There are two generators and two discriminators because there are two different domains, and in this setting a single generator-discriminator pair is not enough.
Remarkably, the same network also gave good results for style transfer on panoramas. In this case too, the set of panoramas was downloaded from Berkeley.
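
The two-generator, two-discriminator setup described above can be illustrated with a minimal NumPy sketch of the generator objective from [1]. This is not the code from the notebooks: the toy functions `G`, `F`, `D_X`, `D_Y`, the least-squares form of the adversarial term and the weight `lambda_cyc` are assumptions chosen for illustration, and in practice all four components are convolutional networks.

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y):
    """L1 cycle loss: F(G(x)) should recover x, and G(F(y)) should recover y."""
    forward = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y
    return forward + backward

def generator_adversarial_loss(D, fake):
    """Least-squares adversarial term: the generator wants D(fake) close to 1."""
    return np.mean((D(fake) - 1.0) ** 2)

# Toy stand-ins (hypothetical). In practice all four are neural networks.
G = lambda x: 2.0 * x + 1.0                    # domain X (photos) -> domain Y (paintings)
F = lambda y: (y - 1.0) / 2.0                  # domain Y -> domain X, inverse of G
D_X = lambda img: np.full(img.shape[0], 0.5)   # scores images in domain X
D_Y = lambda img: np.full(img.shape[0], 0.5)   # scores images in domain Y

rng = np.random.default_rng(0)
x = rng.random((4, 8, 8, 3))   # batch of "photos"
y = rng.random((4, 8, 8, 3))   # batch of "paintings"

lambda_cyc = 10.0  # cycle weight (assumed; the notebooks may use another value)
cyc = cycle_consistency_loss(G, F, x, y)
total = (generator_adversarial_loss(D_Y, G(x))
         + generator_adversarial_loss(D_X, F(y))
         + lambda_cyc * cyc)
print(f"cycle loss: {cyc:.6f}, total generator loss: {total:.6f}")
```

Because the toy `F` inverts `G`, the cycle term is essentially zero here. With unpaired data this term is what ties each translated image back to its source, which is why both translation directions are trained together.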

Neural Style Transfer

This approach was inspired by the works [2] and [3]. The model was trained on the human faces dataset, which is also used in the other task. Only one style image was used. To learn the content of the image being transformed, the algorithm minimizes the difference between features of that image and features of the produced image, extracted at various depths of a pretrained VGG16 model. To learn the style of the reference image, the model instead minimizes the norm of the difference between Gram matrices built from features extracted at various depths of the same VGG16 model. Since the model had trouble learning to color the produced images correctly, a second model was developed that works with grayscale images. Color is then restored from the original image by converting the images from RGB to the LAB color space.
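
The content and style terms described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the code from the notebooks: the random arrays stand in for VGG16 feature maps, and the function names, the Gram normalisation and the use of the squared Frobenius norm are choices made here for the example.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map with shape (channels, height, width)."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)  # normalised channel-to-channel correlations

def content_loss(gen_feats, content_feats):
    """Mean squared feature difference, summed over the chosen layers."""
    return sum(np.mean((g - t) ** 2) for g, t in zip(gen_feats, content_feats))

def style_loss(gen_feats, style_feats):
    """Squared Frobenius norm of the Gram matrix difference, summed over layers."""
    return sum(np.sum((gram_matrix(g) - gram_matrix(s)) ** 2)
               for g, s in zip(gen_feats, style_feats))

rng = np.random.default_rng(0)
# Stand-ins for feature maps taken at two depths of a pretrained network.
feats = [rng.random((8, 16, 16)), rng.random((16, 8, 8))]

def shuffle_positions(f):
    """Move every spatial position to a random new location, keeping channels aligned."""
    c, h, w = f.shape
    perm = rng.permutation(h * w)
    return f.reshape(c, h * w)[:, perm].reshape(c, h, w)

shuffled = [shuffle_positions(f) for f in feats]

print("content loss vs shuffled:", content_loss(shuffled, feats))
print("style loss vs shuffled:  ", style_loss(shuffled, feats))
```

Shuffling the spatial positions changes the content loss but leaves the Gram matrices, and therefore the style loss, essentially unchanged. This shows that the Gram matrix captures which features occur together, not where they occur.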

Installing

This repository contains 5 Colab notebooks. All the code can be run from Google Drive in any browser.
Simply copy all the notebooks into your Drive.
The main notebook is MainNotebook. It contains a setup cell that must be run only the first time and then commented out. The setup cell downloads all the required datasets automatically, so there is no need to download them by hand. It also downloads some pretrained models, which can be used directly for testing and for viewing the results.
The other 4 notebooks can be used for training, but this is discouraged: each notebook needs about 4 hours of GPU time to reach results close to those of our pretrained models.
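
As an alternative to commenting out the setup cell by hand, a setup step can check whether its output already exists before downloading anything. The sketch below is hypothetical and does not reflect how MainNotebook is written: `ensure_downloaded`, `fake_download` and the `datasets` folder name are placeholders, and a real cell would fetch the datasets and pretrained models where the placeholder writes a file.

```python
from pathlib import Path
import tempfile

def ensure_downloaded(target_dir, download_fn):
    """Call download_fn(target_dir) only if target_dir is missing or empty.

    Returns True when a download was performed, False when it was skipped.
    """
    target = Path(target_dir)
    if target.is_dir() and any(target.iterdir()):
        return False  # data already present, nothing to do
    target.mkdir(parents=True, exist_ok=True)
    download_fn(target)
    return True

def fake_download(target):
    # Placeholder: a real setup cell would fetch datasets and models here.
    (target / "placeholder.txt").write_text("dataset files would go here")

# Demonstration in a temporary folder instead of a real Drive path.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "datasets"
    first_run = ensure_downloaded(data_dir, fake_download)
    second_run = ensure_downloaded(data_dir, fake_download)

print("downloaded on first run:", first_run)
print("downloaded on second run:", second_run)
```

With a guard like this the cell can stay in place and be re-run safely, since repeated runs skip the download.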

Results

GAN Style Transfer

The results can be viewed in two ways. The first is to run TrainingNotebookFaces or TrainingNotebookPanoramas (depending on the task) and wait for the network to train, which is useful if, for instance, we want to change something. The second is to open MainNotebook and use the pretrained models we provide. It is possible to see how the loss decreases over time and to view the restyled pictures for both the panoramas and the human faces.

Neural Style Transfer

The model can be trained as shown in TrainingNotebookNeural and TrainingNotebookNeuralGray. Pretrained models are also available, and MainNotebook shows how to use them. The advantage of the approach from [3] over that of [2] is that, at inference time, the model can be called like any other model and does not need to be trained every time a new image is produced.

Authors

References

[1] Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

@inproceedings{CycleGAN2017,
  title = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle = {Computer Vision (ICCV), 2017 IEEE International Conference on},
  year = {2017}
}

[2] A Neural Algorithm of Artistic Style

@misc{https://doi.org/10.48550/arxiv.1508.06576,
  doi = {10.48550/ARXIV.1508.06576},
  url = {https://arxiv.org/abs/1508.06576},
  author = {Gatys, Leon A. and Ecker, Alexander S. and Bethge, Matthias},
  title = {A Neural Algorithm of Artistic Style},
  publisher = {arXiv},
  year = {2015},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

[3] Perceptual Losses for Real-Time Style Transfer and Super-Resolution

@misc{https://doi.org/10.48550/arxiv.1603.08155,
  doi = {10.48550/ARXIV.1603.08155},
  url = {https://arxiv.org/abs/1603.08155},
  author = {Johnson, Justin and Alahi, Alexandre and Fei-Fei, Li},
  title = {Perceptual Losses for Real-Time Style Transfer and Super-Resolution},
  publisher = {arXiv},
  year = {2016},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Contributors

antoniopurificato
