“The purpose of (scientific) computing is insight, not numbers.”
Richard Hamming
In this project we apply data science and machine learning techniques to a custom dataset of annotated books in order to predict each book's category. Such a classifier has practical uses: libraries and bookstores could use it to categorize new books automatically, making inventory management more efficient, and its predictions could feed into recommendation engines that suggest books to customers based on their past preferences. The annotated book data comes from biblioman.chitanka.info, a subproject of chitanka.info dedicated to building an extensive annotated database of books. We will perform exploratory data analysis (EDA) on the dataset to identify potentially useful features, and we will engineer new features as well. We will then train a number of classical machine learning models (Logistic Regression, Decision Tree, Random Forest and SVC) and evaluate them with cross-validation to identify the most suitable model and hyperparameters. Once the best model is selected and trained, we will inspect its performance on both seen and unseen data. In the end, we hope to be able to "judge a book by its cover" using the OpenAI API for GPT-4.
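The model comparison step described above can be sketched roughly as follows. This is only a minimal illustration, not the notebook's actual code: it assumes scikit-learn is installed, and it uses a synthetic dataset from `make_classification` as a stand-in for the engineered book features. The variable names, the number of folds and the macro-averaged F1 metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the book features and category labels (assumption:
# the real notebook builds X and y from the biblioman data instead).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)

# The four classical models named in the text. Scaling is added for the
# models that are sensitive to feature magnitudes.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svc": make_pipeline(StandardScaler(), SVC()),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean macro-F1 across the folds for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
    for name, model in models.items()
}

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
print("best model:", best_name)
```

A hyperparameter search (for example with `GridSearchCV`) could then be run on the best-scoring model, using the same cross-validation splitter.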
To get started, just open the Classification_of_books_with_multimodal_data.ipynb notebook!
Have fun!