Unsupervised Style Learning (2020)

This repository contains the source code, models and data files for the work titled: Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks (accepted at WACV 2020).

Please visit our project page for more details: https://sidgairo18.github.io/style

Dependencies

* Python3
* PyTorch (and other dependencies for PyTorch)
* Numpy
* OpenCV 3.3.0
* Visdom Line Plotter
* tqdm
* cuDNN and CUDA (for training on a GPU)

These are all easily installable via, e.g., pip install numpy. Any reasonably recent version of these packages should work. It is recommended to use a Python virtual environment to set up the dependencies and the code.

Running the code

Training Dataset Construction

Feature Extraction

For feature extraction using a pre-trained VGG network and PCA reduction, use the following repo.

Clustering

Feature extraction is followed by KMeans clustering. The optimal number of clusters for each dataset is determined using the elbow method.
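Below is a minimal, hedged sketch of what this step could look like (it is not the exact script used by this repo); the VGG layer, preprocessing, PCA dimensionality, and range of k are illustrative assumptions.

    # Hypothetical sketch of feature extraction + PCA + KMeans (not the repo's code).
    import numpy as np
    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    preprocess = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    # "path/to/images" is a placeholder for your image folder.
    loader = DataLoader(datasets.ImageFolder("path/to/images", preprocess),
                        batch_size=32, shuffle=False)

    vgg = models.vgg16(pretrained=True).features.eval()

    def extract_features(batch):
        # Flatten the VGG conv feature maps into one vector per image.
        with torch.no_grad():
            return vgg(batch).flatten(1).numpy()

    features = np.concatenate([extract_features(images) for images, _ in loader])
    features = PCA(n_components=256).fit_transform(features)

    # Elbow method: inspect KMeans inertia over a range of k and pick the knee.
    inertias = {k: KMeans(n_clusters=k, n_init=10).fit(features).inertia_
                for k in range(5, 51, 5)}
    cluster_ids = KMeans(n_clusters=10, n_init=10).fit_predict(features)  # k = 10 is an assumption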

Training with Classification Loss (Stage 1)

  • We train a CNN augmented with a 256-dimensional bottleneck layer.
  • Training proceeds for 30 epochs, minimizing the cross-entropy loss for multi-class classification.
  • Training is stopped after 30 epochs and the weights are saved for later use in Stage 2.
  • During this stage, we simply use the cluster ID for each image as its class label.
  • Hyperparameters: lr = 0.001, Adam optimizer, Categorical Cross-Entropy (these are empirically chosen).
  • The Python script for this part is classification_net.py (a minimal training-loop sketch follows this list).
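As a hedged illustration of Stage 1 (the authoritative version is the repo's script), a minimal training loop might look like the sketch below; the ResNet-18 backbone, the number of clusters, and the dummy data are assumptions made only so the example is self-contained.

    # Illustrative Stage 1 sketch: a CNN with a 256-d bottleneck trained with
    # cross-entropy on cluster IDs. Backbone and dummy data are assumptions.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset
    from torchvision import models

    num_clusters = 10                                  # assumed; use the elbow-method result
    model = models.resnet18(pretrained=True)
    model.fc = nn.Sequential(
        nn.Linear(model.fc.in_features, 256),          # 256-dimensional bottleneck layer
        nn.ReLU(inplace=True),
        nn.Linear(256, num_clusters),                  # cluster-ID classifier head
    )

    # Dummy stand-in data so the sketch runs; replace with the real style
    # dataset and the cluster IDs produced by the clustering step.
    images = torch.randn(64, 3, 224, 224)
    labels = torch.randint(0, num_clusters, (64,))
    train_loader = DataLoader(TensorDataset(images, labels), batch_size=16)

    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(30):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

    torch.save(model.state_dict(), "stage1_weights.pth")  # reused in Stage 2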

Training with Triplet Loss (Stage 2)

  • Stage 2 of the pipeline requires training a triplet ConvNet with a triplet loss (MarginRankingLoss).
  • For this we require an anchor image, a positive sample, and a negative sample (how these images are sampled is explained in Section 3.1.2 of the paper).
  • We train this triplet network for 50 epochs (Hyperparameters: lr = 0.001, SGD optimizer, MarginRanking Loss).
  • The model weights from Stage 1 are loaded before training for Stage 2 is started.
  • The Python script for this part is train.py.
  • For more information on the triplet network and embedding networks, take a look at the networks.py and triplet_network.py files (a minimal sketch of a Stage 2 training step follows this list).
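For intuition only (the actual implementation lives in train.py, networks.py, and triplet_network.py), a Stage 2 setup with MarginRankingLoss might look roughly like the following; the embedding network, margin value, and dummy triplets are assumptions made so the example is self-contained.

    # Illustrative Stage 2 sketch: an embedding net initialised from Stage 1
    # weights, trained with nn.MarginRankingLoss so the anchor ends up closer
    # to the positive sample than to the negative one.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import DataLoader, TensorDataset
    from torchvision import models

    embedding_net = models.resnet18(pretrained=False)
    embedding_net.fc = nn.Linear(embedding_net.fc.in_features, 256)   # 256-d style embedding
    # Initialise from the Stage 1 checkpoint (strict=False skips the classifier head).
    embedding_net.load_state_dict(torch.load("stage1_weights.pth"), strict=False)

    # Dummy triplets so the sketch runs; replace with the sampled triplets.
    a, p, n = (torch.randn(32, 3, 224, 224) for _ in range(3))
    triplet_loader = DataLoader(TensorDataset(a, p, n), batch_size=8)

    optimizer = torch.optim.SGD(embedding_net.parameters(), lr=0.001)
    criterion = nn.MarginRankingLoss(margin=0.2)       # margin value is an assumption

    for epoch in range(50):
        for anchor, positive, negative in triplet_loader:
            ea, ep, en = embedding_net(anchor), embedding_net(positive), embedding_net(negative)
            d_pos = F.pairwise_distance(ea, ep)        # anchor-positive distance
            d_neg = F.pairwise_distance(ea, en)        # anchor-negative distance
            # target = 1 asks the loss to make d_neg exceed d_pos by the margin
            loss = criterion(d_neg, d_pos, torch.ones_like(d_pos))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()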

Note 1: The chosen bottleneck layer has 256 dimensions (experiments showed that using 256 dimensions instead of 128 makes little difference in performance).

Note 2: The code may differ slightly from the parameters mentioned in the paper, but it is sufficient to reproduce the results reported there (this is based on my unofficial implementation of the work and its public code).

Running the Training Procedure

  • For Stage 1, run python classification.py
  • For Stage 2, run python train.py
  • For details on the data-loader and data text files see next section.

Data files in /data folder

  • The classification_dataloader expects 2 files: filenames_filename and labels_filename.
  • filenames_filename => A text file with each line containing a path to an image, e.g., images/class1/sample.jpg
  • labels_filename => A text file with each line containing 1 integer, the label index of the image.
  • Similarly, the triplet_dataloader expects 2 files: filenames_filename and triplets_filename.
  • filenames_filename => A text file with each line containing a path to an image, e.g., images/class1/sample.jpg
  • triplets_filename => A text file with each line containing 3 integers, where integer i refers to the i-th image in filenames_filename. For a line with integers "a b c", a triplet is defined such that image a is more similar to image c than it is to image b (a sketch of a dataset class that reads these files follows this list).
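To make the file formats above concrete, here is a hedged sketch of a dataset class that consumes filenames_filename and triplets_filename; it is not the repo's triplet_dataloader, and the root path and transform are placeholders.

    # Minimal sketch of a triplet dataset for the file formats described above.
    # Not the repo's triplet_dataloader; root and transform are placeholders.
    import os
    from PIL import Image
    from torch.utils.data import Dataset

    class TripletDataset(Dataset):
        def __init__(self, root, filenames_filename, triplets_filename, transform=None):
            self.root = root
            self.transform = transform
            with open(filenames_filename) as f:
                self.filenames = [line.strip() for line in f]
            with open(triplets_filename) as f:
                # Each line "a b c": image a is more similar to image c than to image b.
                self.triplets = [tuple(int(i) for i in line.split()) for line in f]

        def __len__(self):
            return len(self.triplets)

        def _load(self, index):
            img = Image.open(os.path.join(self.root, self.filenames[index])).convert("RGB")
            return self.transform(img) if self.transform else img

        def __getitem__(self, idx):
            a, b, c = self.triplets[idx]
            # Return (anchor, positive, negative) following the "a b c" convention.
            return self._load(a), self._load(c), self._load(b)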

Dataset

For information on the datasets and splits used, please see Section 4 of the paper and the supplementary material.

The datasets used are:

  1. BAM: Behance Artistic Media dataset. We use a subset of BAM with 121K images (sampled similarly to the Behance-Net-TT 110K set used in earlier work), balanced across media and emotional styles, with a Train:Val:Test split of 80:5:15.
  2. AVA Style Dataset: Train:Val:Test split of 85:5:10.
  3. Flickr: Train:Val:Test split of 60:20:20.
  4. Wikipaintings: Train:Val:Test split of 85:5:10.
  5. DeviantArt: Train:Val:Test split of 85:5:10.
  6. WallArt

Accessing the Datasets

  • 1: To request access to the BAM dataset, please visit the BAM website here.
  • 2, 3, 4: These can be downloaded from here.
  • 5, 6: These have not been released publicly yet due to licensing issues, but they can easily be recreated as described in the paper and accessed at the respective websites.

Feature Visualization

t-SNE is used for feature visualizations.
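As a hedged example (not the repo's visualization script), a 2-D t-SNE projection of the learned embeddings could be produced as follows; the random stand-in data is only there so the snippet runs on its own.

    # Illustrative t-SNE plot of the learned style embeddings.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    # Stand-in data; replace with the learned 256-d embeddings and cluster IDs.
    embeddings = np.random.randn(1000, 256).astype(np.float32)
    cluster_ids = np.random.randint(0, 10, size=1000)

    points = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)
    plt.scatter(points[:, 0], points[:, 1], c=cluster_ids, s=4, cmap="tab20")
    plt.title("t-SNE of style embeddings")
    plt.savefig("tsne_embeddings.png", dpi=200)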

End Notes and Disclaimer:

  • The different dataset images have not been included.
  • The text files in the data folder are just for reference. They may vary according to your own data files.
  • To request access to the Dataset please visit the BAM website and refer to the notes under Dataset section.
  • Feel free to use this code for your own work, but please cite the paper if you use this work officially.
  • In case of any bugs or errors, please be gracious enough to report an issue on this repo.

To cite this work:

@InProceedings{Gairola_2020_WACV,
author = {Gairola, Siddhartha and Shah, Rajvi and Narayanan, P. J.},
title = {Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}

License

We distribute the source code under the MIT License.
