books-classification-with-multimodal-data's Introduction

Classification of books with multimodal data over custom dataset

"Judge the book by its cover"

“The purpose of (scientific) computing is insight, not numbers.”

Richard Hamming

Summary

In this project we will apply different data science and machine learning techniques to a custom dataset of annotated books in order to predict each book's category. Such a classifier could be used for automated sorting: libraries and bookstores could use it to categorize new books automatically, making inventory management more efficient. It could also feed into recommendation engines that suggest books to customers based on their past preferences. The book annotations come from biblioman.chitanka.info, a subproject of chitanka.info dedicated to building an extensive annotated database of books. We will perform exploratory data analysis (EDA) on the dataset to identify potentially useful features, and we will engineer new features as well. We will then train a number of classical machine learning models (Logistic Regression, Decision Tree, Random Forest and SVC) and evaluate them with cross-validation to identify the most suitable model and hyperparameters. Once the best model is selected and trained, we will inspect its performance on seen and unseen data. Finally, we hope to be able to "Judge the book by its cover" using the OpenAI API for GPT-4.
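The model-selection step described above (cross-validating Logistic Regression, Decision Tree, Random Forest and SVC while tuning their hyperparameters, then checking the winner on seen and unseen data) could look roughly like the following minimal sketch. It assumes scikit-learn; the synthetic data from make_classification, the hyperparameter grids and the variable names are hypothetical placeholders, not the project's actual features or settings, which live in the notebook.

```python
# A minimal sketch, assuming scikit-learn is installed. The synthetic data below
# is a hypothetical stand-in for the engineered book features and category
# labels; it is NOT the biblioman dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature matrix X and labels y for 3 made-up book categories.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    n_classes=3, random_state=42,
)

# Hold out an "unseen" test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

# Candidate models with small, illustrative (hypothetical) hyperparameter grids.
# Scaling matters for LogisticRegression and SVC and is harmless for the trees.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "tree": (DecisionTreeClassifier(random_state=42), {"clf__max_depth": [3, 5, None]}),
    "forest": (RandomForestClassifier(random_state=42), {"clf__n_estimators": [50, 100]}),
    "svc": (SVC(), {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["linear", "rbf"]}),
}

results = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
    # 5-fold cross-validation over the grid, using the training split only.
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = search
    print(f"{name}: best CV accuracy {search.best_score_:.3f} with {search.best_params_}")

# Select the model with the highest mean cross-validated accuracy.
best_name = max(results, key=lambda n: results[n].best_score_)
# GridSearchCV refits the best configuration on the whole training split.
best_model = results[best_name].best_estimator_

# Compare performance on seen (training) and unseen (held-out test) data.
train_acc = best_model.score(X_train, y_train)
test_acc = best_model.score(X_test, y_test)
print(f"selected: {best_name}, train accuracy {train_acc:.3f}, test accuracy {test_acc:.3f}")
```

Keeping the test split out of GridSearchCV is what makes the final test accuracy a fair estimate of performance on unseen books; a large gap between training and test accuracy would point to overfitting.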

To get started, just open the Classification_of_books_with_multimodal_data.ipynb notebook!

Have fun!

preslaff / books-classification-with-multimodal-data
