

Goodreads-Books-Analysis

A data science project to scrape Goodreads data and run a classifier to categorize books as fiction or nonfiction based on the description.

Data Collection:

I used Goodreads as the source of my dataset because its book pages are comprehensive and follow a mostly standard format. I decided to scrape the following pieces of information:

Title

Description

Authors

Edition

Format

ISBN

No. of pages

Rating

No. of ratings

No. of reviews

Genres

Book cover image

BeautifulSoup, one of the most popular and easy-to-use Python packages for collecting static data from web pages, is used for the scraping in this project. The training data was collected from Goodreads' Best Books Ever list: https://www.goodreads.com/list/show/1.Best_Books_Ever. The test data used to validate the model was collected from the Best Books of 2018 list: https://www.goodreads.com/list/best_of_year/2018?id=119307.Best_books_of_2018

The following scripts scrape the data from Goodreads. They should be run in this order:

  1. URL collector

  2. Data collector

  3. Image collector
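
The first step in that order, collecting book URLs from a list page, can be sketched as follows. This is only an illustration, not the project's actual script: it uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup, and the sample markup, the `BookLinkCollector` class, and the `collect_book_urls` helper are all hypothetical. It assumes book pages are linked with paths containing `/book/show/`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.goodreads.com"


class BookLinkCollector(HTMLParser):
    """Collects links that look like book pages (assumed to contain '/book/show/')."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "/book/show/" in href:
            full_url = urljoin(BASE_URL, href)
            # Keep the first occurrence only, preserving list order.
            if full_url not in self.urls:
                self.urls.append(full_url)


def collect_book_urls(list_page_html):
    """Return the de-duplicated book URLs found in one list page."""
    parser = BookLinkCollector()
    parser.feed(list_page_html)
    return parser.urls


# Hypothetical markup standing in for a downloaded list page.
SAMPLE_HTML = """
<table>
  <tr><td><a class="bookTitle" href="/book/show/1.Example_Book">Example Book</a></td></tr>
  <tr><td><a class="authorName" href="/author/show/9.Some_Author">Some Author</a></td></tr>
  <tr><td><a class="bookTitle" href="/book/show/2.Another_Book">Another Book</a></td></tr>
  <tr><td><a href="/book/show/1.Example_Book">duplicate link</a></td></tr>
</table>
"""

book_urls = collect_book_urls(SAMPLE_HTML)
for url in book_urls:
    print(url)
```

The data collector would then visit each collected URL to read the fields listed earlier, and the image collector would download the cover images.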

If you would rather skip the web scraping and use the data directly for your model, you can download it from Kaggle:

Best-books-ever: https://www.kaggle.com/datasets/meetnaren/goodreads-best-books?select=book_data.csv

Best-books-of-2018: https://www.kaggle.com/datasets/meetnaren/goodreads-best-books-of-2018?select=book_data.csv

Data Exploration:

Book genres, covers, authors, and other fields are explored to better understand the data and to identify trends that can help with classification.

Models:

Two recurrent neural network (RNN) classifiers are used: a Keras RNN classifier with an accuracy of around 94% and a PyTorch RNN classifier with an accuracy of around 95%. Both models perform well, with the PyTorch classifier slightly ahead.

Book Recommender System:

A simple content-based book recommender system that uses author names and genres.
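
One way such a recommender could work is sketched below. The README does not say which similarity measure the project uses, so Jaccard similarity over combined author and genre sets is an assumption here, and the `tokens`, `jaccard`, and `recommend` helpers and the sample books are hypothetical.

```python
def tokens(book):
    """Feature set for one book: lower-cased author names and genres."""
    authors = {f"author:{a.strip().lower()}" for a in book["authors"]}
    genres = {f"genre:{g.strip().lower()}" for g in book["genres"]}
    return authors | genres


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(title, books, top_n=2):
    """Return up to top_n titles most similar to `title`, skipping zero-overlap books."""
    by_title = {b["title"]: b for b in books}
    target = tokens(by_title[title])
    scored = [
        (jaccard(target, tokens(b)), b["title"])
        for b in books
        if b["title"] != title
    ]
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for score, t in scored[:top_n] if score > 0]


# Made-up records for illustration only.
BOOKS = [
    {"title": "Book A", "authors": ["Author One"], "genres": ["Fantasy", "Young Adult"]},
    {"title": "Book B", "authors": ["Author One"], "genres": ["Fantasy", "Adventure"]},
    {"title": "Book C", "authors": ["Author Two"], "genres": ["Fantasy"]},
    {"title": "Book D", "authors": ["Author Three"], "genres": ["History", "Nonfiction"]},
]

print(recommend("Book A", BOOKS))
```

In this sketch a shared author and a shared genre count equally; weighting authors more heavily would be a reasonable variation.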
