
books-classification-with-multimodal-data's Introduction

Classification of books with multimodal data over custom dataset

"Judge the book by its cover"

“The purpose of (scientific) computing is insight, not numbers.”

Richard Hamming

Summary

In this project we apply data science and machine learning techniques to a custom dataset of annotated books in order to predict each book's category. Such a classifier has practical uses in automated sorting: libraries and bookstores could use it to categorize new books automatically, making inventory management more efficient. It could also feed into recommendation engines that suggest books to customers based on their past preferences. The annotated book data comes from biblioman.chitanka.info, a subproject of chitanka.info dedicated to building an extensive annotated database of books. We perform exploratory data analysis (EDA) on the dataset to identify potentially useful features, and we engineer new features as well. We then train a number of classical machine learning models (Logistic Regression, Decision Tree, Random Forest and SVC) and evaluate them with cross-validation to identify the most suitable model and hyperparameters. Once the best model is selected and trained, we inspect its performance on seen and unseen data. In the end we hope to be able to "Judge the book by its cover" using the OpenAI API for GPT-4.
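The model comparison step described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes scikit-learn is installed, uses a synthetic feature matrix from `make_classification` as a stand-in for the book features, and the helper name `compare_models`, the fold count and the hyperparameters are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data standing in for the real book features and
# category labels, which the notebook derives from the biblioman dataset.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)

# The four classical models named in the summary. Scaling is added for the
# two models that are sensitive to feature magnitudes.
candidates = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVC": make_pipeline(StandardScaler(), SVC()),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def compare_models(models, X, y, cv):
    """Return the mean cross-validated accuracy for each candidate model."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = scores.mean()
    return results


results = compare_models(candidates, X, y, cv)

# Print the candidates from best to worst mean accuracy.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

On real data, the same loop could be wrapped in a hyperparameter search (for example `GridSearchCV`) before the final model is evaluated on a held-out test set.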

To get started, just open the Classification_of_books_with_multimodal_data.ipynb notebook!

Have fun!

Contributors

preslaff
