
ArtyLyze

Introduction

  • Since the advent of digitization, millions of artworks have been digitized, opening up the world of art to countless new people. However, navigating this space as a layman is difficult due to the lack of metadata and contextual information needed to describe and understand an artwork.
  • Unless one knows the title and artist of an artwork, finding it is almost impossible. In this research project, we use both supervised and unsupervised models to generate metadata given an image of an artwork.
  • Specifically, given an artwork, we try to predict its genre and style.
  • We have collected a dataset of ~12k images, a subset of MultitaskPainting100k.

Problem Definition

  1. Throughout history it has been observed that artistic collaborations fuel creativity and give rise to art movements. Our study aims to find correlations between artistic styles spanning geographies and periods, which would help track the journey of art and how it evolved. Soft clustering approaches can help deduce the influence of different factors (genre, time period, geography) on a particular work of art. Studying similarity and influence across time and geography between different art styles is a relevant research area.
  2. We wish to build a model that can classify an artwork according to its genre and style. If possible, we would like to study the features the model learns that help it differentiate between different kinds of artworks.

Dataset

Dataset Collection and Creation

Our dataset of ~12k images is a subset of MultitaskPainting100k. The original dataset has 100k images from 1508 different artists, but it only provides the artist name, style and genre, so we used Best Artworks of All Time to enrich it. Best Artworks of All Time contains 8671 paintings by 50 artists with artist name, genre, nationality and years, but there is only a one-to-one mapping between each painting, artist and genre. Hence, we created a subset with artist name, genre, style, nationality and years by sampling images from MultitaskPainting100k corresponding to 46 artists in Best Artworks of All Time. The final filtered subset has a total of 12,619 images, split into train and test sets of 8837 and 3782 images respectively.
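The filtering described above can be sketched roughly as follows; the file names and column names here are assumptions, not the actual pipeline.

```python
import pandas as pd

# Hypothetical file and column names; the real metadata files may differ.
multitask = pd.read_csv("multitask_painting_100k.csv")   # image_path, artist, style, genre
best = pd.read_csv("best_artworks_of_all_time.csv")      # artist, genre, nationality, years

# Keep only paintings whose artist also appears in Best Artworks of All Time,
# then attach the artist-level nationality and years to each painting.
subset = multitask[multitask["artist"].isin(best["artist"])]
subset = subset.merge(best[["artist", "nationality", "years"]], on="artist", how="left")
subset.to_csv("filtered_subset.csv", index=False)
```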

Dataset Cleaning

Genre Classification

Our filtered dataset exhibited a class imbalance, so for genre classification we kept only the 12 classes with the highest representation and undersampled them to balance the dataset.
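A minimal sketch of this balancing step, assuming the metadata lives in a pandas dataframe with a `genre` column (both are assumptions):

```python
import pandas as pd

df = pd.read_csv("filtered_subset.csv")                  # hypothetical file from the previous step

# Keep the 12 best-represented genres, then undersample each to the size of the smallest one.
top12 = df["genre"].value_counts().nlargest(12).index
df = df[df["genre"].isin(top12)]
n_min = df["genre"].value_counts().min()
balanced = (
    df.groupby("genre", group_keys=False)
      .apply(lambda g: g.sample(n=n_min, random_state=42))
)
```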

Imbalance

Plot displaying the count of images in each class before balancing

Balance

Plot displaying the count of images in each class after balancing

Dataset

Dataset with images and corresponding labels

This balanced dataset contains a total of 2556 samples, split into training and validation sets of 1908 and 648 images respectively.
Similarly, we balanced our dataset for style classification by selecting the top 13 classes with the highest representation:

  1. Art Nouveau (Modern)
  2. Baroque
  3. Cubism
  4. Expressionism
  5. High Renaissance
  6. Impressionism
  7. Mannerism (Late Renaissance)
  8. Naïve Art (Primitivism)
  9. Northern Renaissance
  10. Post-Impressionism
  11. Realism
  12. Romanticism
  13. Surrealism

Dataset Preprocessing

  • Unsupervised Classification

For our unsupervised classification we use the following methods for feature extraction:

  1. PCA
  2. Kernel PCA
  3. VGG16
  4. MobileNet

In each case we obtain a reduced-dimension feature vector. For the PCA techniques we retain 95% of the variance. Feature extraction is a very important step, since it determines which features the model focuses on when measuring similarity between images and forming meaningful clusters. We run clustering algorithms on all of these representations and discuss their performance in later sections.
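The four extractors could be set up roughly as below; `images` is assumed to be an (N, 180, 180, 3) float array with pixel values in [0, 255], and the Kernel PCA component count is an assumption since, unlike PCA, it cannot be chosen via a variance threshold.

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA, KernelPCA

flat = images.reshape(len(images), -1)                   # flatten each image to a vector

# PCA retaining 95% of the variance; Kernel PCA needs an explicit component count.
pca_feats = PCA(n_components=0.95).fit_transform(flat)
kpca_feats = KernelPCA(n_components=256, kernel="rbf").fit_transform(flat)

# CNN features: global-average-pooled outputs of ImageNet-pretrained backbones.
vgg = tf.keras.applications.VGG16(include_top=False, pooling="avg", input_shape=(180, 180, 3))
vgg_feats = vgg.predict(tf.keras.applications.vgg16.preprocess_input(images.copy()))

mob = tf.keras.applications.MobileNet(include_top=False, pooling="avg", input_shape=(180, 180, 3))
mob_feats = mob.predict(tf.keras.applications.mobilenet.preprocess_input(images.copy()))
```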

  • Supervised Classification

Since we want our model to generalize well and focus on the important underlying features for style prediction, we apply a few simple data augmentations such as flipping about the vertical axis and rotation.

Data-Aug Data Augmentation

Each image in our dataset is an RGB image of size 180x180x3, where 180 is the height and width of the image and 3 is the number of channels. The RGB channel values are in the [0, 255] range, which is not ideal for a neural network; in general, input values should be kept small. Hence, we rescale the values to the [0, 1] range.

The final input to our model is a tensor of shape (32, 180, 180, 3), where 32 is the batch size.
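A minimal Keras input-pipeline sketch consistent with the description above; the directory layout, the choice of horizontal flips, and the rotation factor are assumptions.

```python
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(180, 180), batch_size=32)   # hypothetical directory

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),              # flip about the vertical axis
    tf.keras.layers.RandomRotation(0.1),                   # small random rotations
])
rescale = tf.keras.layers.Rescaling(1.0 / 255)             # map [0, 255] -> [0, 1]

# Augmentation only on the training set; rescaling on every split.
train_ds = train_ds.map(lambda x, y: (rescale(augment(x, training=True)), y))
```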

Methods

We employ both supervised and unsupervised methods for this project.

Supervised classification:

Genre Classification

For supervised classification we use Convolutional Neural Networks (CNNs). To check whether CNNs are able to capture the relevant information at all, we first experimented with a basic network of 3 convolutional layers and dropout. As expected, this model was unable to perform well and produced very noisy results.
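A sketch of such a baseline in Keras, with three convolutional blocks and dropout; the filter counts and dropout rate are assumptions.

```python
import tensorflow as tf

num_classes = 12  # top-12 genres

baseline = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(180, 180, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes),                    # logits
])
baseline.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```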

Conv

Accuracy and Loss curves for 3 layer CNN

Since our aim is also to achieve good results with good computational efficiency, we decided to use the MobileNet architecture for our prediction pipeline. Its efficiency also enables faster experimentation, which helps us better engineer the other parts of the pipeline, such as the loss, optimizer and hyperparameters.

MobileNet

MobileNet Architecture

We further performed hyperparameter tuning and arrived at a learning rate of 1e-4 as a good approximation of the optimum. In our first few experiments with MobileNet we froze the backbone, which gave us consistent results across both the training and validation sets.
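The frozen-backbone setup could look roughly like this; the single-Dense classification head is an assumption, while the 1e-4 learning rate follows the text above.

```python
import tensorflow as tf

backbone = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(180, 180, 3))
backbone.trainable = False                                  # freeze the pretrained backbone

inputs = tf.keras.Input(shape=(180, 180, 3))
x = backbone(inputs, training=False)
outputs = tf.keras.layers.Dense(12)(x)                      # 12 genre classes, logits
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy",
             tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5, name="top5")],
)
```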

Mobilefreeze

Results on MobileNet with frozen backbone

To improve performance, we fine-tuned the MobileNet backbone on our dataset, which boosted accuracy on the test set.
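Continuing the sketch above, fine-tuning amounts to unfreezing the backbone and recompiling; the lower learning rate here is an assumption.

```python
backbone.trainable = True                                   # unfreeze the MobileNet backbone
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), # assumed lower LR for fine-tuning
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, epochs=...)
```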

Mobilefine

Results on MobileNet with fine-tuned backbone

To improve the metrics further, we experimented with a DenseNet121 backbone. We chose this backbone because it combines strong performance with efficiency: DenseNet121 lets networks go deeper while remaining efficient to train, by using shorter (dense) connections between layers.

DenseNet

DenseNet121 Architecture

For our DenseNet architecture we experimented with two learning rates, 1e-3 and 1e-4, and found the model to perform best with a learning rate of 1e-4 and a pretrained backbone.
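In the sketch above, swapping in DenseNet121 only changes the backbone constructor; everything else stays the same.

```python
backbone = tf.keras.applications.DenseNet121(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(180, 180, 3))
```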

DenseNet

DenseNet with lr = 1e-3

DenseNet

DenseNet with lr = 1e-4

DenseNet

DenseNet with pretrained backbone

Style Classification

For style classification we reused the best model from genre classification and retrained it for the style classification task. This is reasonable because both tasks are very similar and involve the same data, so we expect it to outperform the weaker models on style classification as well. The model used for style classification is therefore DenseNet121.
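Retargeting the model for style classification can be sketched as attaching a new 13-way head to the fine-tuned DenseNet121 backbone; the head design and learning rate are assumptions.

```python
import tensorflow as tf

num_styles = 13
inputs = tf.keras.Input(shape=(180, 180, 3))
x = backbone(inputs)                                        # fine-tuned DenseNet121 backbone
outputs = tf.keras.layers.Dense(num_styles)(x)              # logits over the 13 styles
style_model = tf.keras.Model(inputs, outputs)
style_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```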

Style

Unsupervised classification:

For unsupervised classification we use two methods, KMeans and GMM, to form clusters across the paintings. For KMeans we choose the number of clusters using the elbow plot.
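The elbow plot can be produced roughly as follows; `features` is one of the extracted representations above, and the candidate range of k is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

inertias = []
ks = range(2, 41)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias.append(km.inertia_)                            # within-cluster sum of squares

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.show()
```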

ElbowPlot

Elbow Plot to find optimal number of clusters

After running these algorithms we obtain 25 clusters. As mentioned in Dataset Preprocessing, we ran KMeans on the features produced by each of the extraction methods. The clustering results for each method are displayed below.

KM_kernel

Clusters with Kernel PCA as feature extractor

KM_pca

Clusters with PCA as feature extractor

KM_vgg

Clusters with VGG16 as feature extractor

KM_mobile

Clusters with MobileNet as feature extractor

A few observations from these results:

  1. The CNN-based feature extractors do a better job of grouping similar subjects together. For example, VGG16 groups structures (buildings, bridges, ports, huts, etc.) in one cluster, faces and portraits in another, landscapes in another, abstract art in another, and so on. In short, deep models cluster images better by visual content; they also group different genres together.
  2. Since the features extracted by the different methods differ, the resulting clusterings also vary.

Having established that VGG16 yields the most meaningful clustering, we analyze the resulting clusters in more depth.

KM_21

KM_21_plot

Cluster 21 and its corresponding distribution

KM_24

KM_24_plot

Cluster 24 and its corresponding distribution

If we look closely at these clusters, we can see that the paintings within each cluster share similarities in painting style; the clustering captures the underlying pattern. For each cluster we have plotted its distribution across nationality, time period, artist, genre and style. As the plots show, each cluster contains a rich combination of artworks spanning different groups.

KMeans performs hard clustering, so it is difficult to tell whether a particular painting reflects multiple influences. Hence we use a GMM to study the influence of different styles on a single painting. For the painting shown below, the highest responsibility value is 0.99, corresponding to cluster 8, and the styles are clearly very similar. The painting also has non-zero responsibilities for clusters 24 and 6, whose influence can be observed in its style. The distribution across the different groups can also be inspected for each of these clusters.
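A sketch of how the soft assignments can be obtained with scikit-learn; the 25 components mirror the KMeans setting, and the painting index is a placeholder.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=25, random_state=0).fit(features)
resp = gmm.predict_proba(features)                          # responsibilities, shape (N, 25)

i = 0                                                       # placeholder index of one painting
top = np.argsort(resp[i])[::-1][:3]                         # dominant cluster plus minor influences
print({int(c): float(resp[i, c]) for c in top})
```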

GMM Influence of different styles on one painting

GMM_8

GMM_8_plot

Main Cluster to which the image belongs and its distribution

GMM_6

GMM_6_plot

Cluster 6 and its corresponding distribution

GMM_24

GMM_24_plot

Cluster 24 and its corresponding distribution

Results and Discussion

Supervised

We compared the performance of our supervised models using accuracy, F1, precision, recall and top-5 accuracy.
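These metrics can be computed with scikit-learn as sketched below; `y_true` are the integer test labels, `y_score` the model's per-class scores, and the weighted averaging is an assumption.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, top_k_accuracy_score)

y_pred = y_score.argmax(axis=1)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted"))
print("recall   :", recall_score(y_true, y_pred, average="weighted"))
print("f1       :", f1_score(y_true, y_pred, average="weighted"))
print("top-5    :", top_k_accuracy_score(y_true, y_score, k=5))
```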

It can be clearly observed that the 3-layer CNN is extremely noisy and unable to learn well; its learning capacity is too low. The MobileNet model performed significantly better than the plain convolutional model across all metrics. To improve its performance we experimented with different learning rates and obtained the results below, arriving at an optimum learning rate of 1e-4. Performance improved further with the DenseNet121 backbone, where learning-rate experiments again pointed to 1e-4 as the optimum. The best results were obtained by fine-tuning the pretrained DenseNet121 backbone. Top-5 accuracy was used to compare the best performing models, since we want to maximize confidence on a select few popular genres.

| Model | Accuracy | Precision | Recall | F1 | Top-5 Accuracy |
| --- | --- | --- | --- | --- | --- |
| 3 layer CNN | 0.42 | 0.47 | 0.42 | 0.40 | — |
| MobileNet backbone (frozen); LR = 1e-4 | 0.57 | 0.59 | 0.57 | 0.58 | 0.74 |
| MobileNet backbone (frozen); LR = 5e-4 | 0.54 | 0.54 | 0.54 | 0.54 | 0.747 |
| MobileNet backbone (frozen); LR = 1e-3 | 0.44 | 0.68 | 0.44 | 0.53 | 0.71 |
| MobileNet backbone (fine-tuning) | 0.64 | 0.64 | 0.64 | 0.64 | 0.79 |
| DenseNet121 backbone (frozen); LR = 1e-3 | 0.58 | 0.59 | 0.58 | 0.58 | 0.924 |
| DenseNet121 backbone (frozen); LR = 1e-4 | 0.61 | 0.62 | 0.61 | 0.61 | 0.924 |
| DenseNet121 backbone (fine-tuning) | 0.64 | 0.65 | 0.65 | 0.65 | 0.94 |

We select the best performing model, DenseNet121 backbone (fine-tuning), and perform style classification as the downstream task, obtaining the following results.

| Model | Accuracy | Precision | Recall | F1 | Top-5 Accuracy |
| --- | --- | --- | --- | --- | --- |
| DenseNet121 backbone (fine-tuning) | 0.51 | 0.54 | 0.51 | 0.51 | 0.898 |

Images

Classification results on images

For our best performing model we also report per-genre results, which helps us analyze the best and worst performing genres.

| Class | Precision | Recall | F1 |
| --- | --- | --- | --- |
| abstract | 0.75 | 0.83 | 0.79 |
| cityscape | 0.70 | 0.65 | 0.67 |
| flower painting | 0.88 | 0.81 | 0.85 |
| landscape | 0.61 | 0.70 | 0.66 |
| mythological painting | 0.74 | 0.46 | 0.57 |
| nude painting | 0.58 | 0.57 | 0.58 |
| portrait | 0.51 | 0.65 | 0.57 |
| religious painting | 0.44 | 0.50 | 0.47 |
| self-portrait | 0.58 | 0.61 | 0.59 |
| sketch and study | 0.67 | 0.72 | 0.70 |
| still life | 0.75 | 0.61 | 0.67 |
| symbolic painting | 0.58 | 0.54 | 0.56 |

If we look at the corresponding images for each genre, we can see that the genres that are easy to classify have a very characteristic style with clear features that the CNN can pick up easily. In the cases where the model performs poorly, for example portraits and religious paintings, there is significant overlap between styles, which confuses the model.

Best

Best performing classes for genre classification

Worst

Worst performing classes for genre classification

Unsupervised

For our unsupervised algorithms, feature selection and the distance function are very important. Our features should be representative of our data and of what we want the model to capture. Every clustering algorithm uses a different distance function, and it is important to understand whether that distance function is a good measure of similarity between data points in the given dataset.
In this project we used the Silhouette Coefficient and the Davies-Bouldin Index to measure the performance of our clustering algorithms.
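Both scores are available in scikit-learn; `features` are the extracted features and `labels` the cluster assignments from KMeans or the GMM.

```python
from sklearn.metrics import silhouette_score, davies_bouldin_score

print("silhouette    :", silhouette_score(features, labels))
print("davies-bouldin:", davies_bouldin_score(features, labels))
```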

KMeans

| Feature extractor | Silhouette | Davies-Bouldin |
| --- | --- | --- |
| Kernel PCA | 0.0689 | 2.7367 |
| PCA | 0.0289 | 3.3776 |
| VGG16 | 0.01057 | 3.9793 |
| MobileNet | 0.0307 | 3.4831 |

GMM

| Feature extractor | Silhouette | Davies-Bouldin |
| --- | --- | --- |
| Kernel PCA | -0.0700 | 3.188 |
| PCA | -0.043 | 4.3099 |
| VGG16 | 0.0035 | 3.9184 |
| MobileNet | 0.0293 | 3.444 |

It is interesting to note that even though Kernel PCA gives the best clustering scores, the clusters it produces are not very relevant to our goal. In general, all of our methods generate clusters that are quite dispersed and not well separated from one another. This suggests that the metrics we used for unsupervised clustering are not very representative of our actual clustering goal.

Conclusions

A few conclusions we arrived at:

  1. For supervised classification, the genres that are difficult to classify overlap with other genres, which confuses the model; overall, however, CNNs perform very well at learning the patterns and classifying images accordingly.
  2. For unsupervised classification, the feature extractor plays a very important role in defining the clusters.
  3. The cluster distributions clearly show that artworks are shaped by many influences: clusters span nationalities, time periods and artists. It is very interesting to note that artworks by famous painters such as Vincent van Gogh and Salvador Dalí can fall into the same bucket.
  4. Through this project we observe that although genre and style are a good way of classifying paintings, there may be new categories that could shed light on stronger correlations between artworks.

Codebase

  1. Dataset
  2. Supervised
  3. Unsupervised

Team

Anisha Pal, Avinash Prabhu, Meher Shashwat Nigam, Mukul Khanna, Shivika Singh

Work Distribution

Link to Gantt Chart
