
ds-naive-bayes-mini-project's Introduction

Classification using Naive Bayes

In this mini-project, you'll learn the basics of text analysis using a subset of movie reviews from the Rotten Tomatoes database. You'll also use a fundamental technique in Bayesian inference called Naive Bayes. This mini-project is based on Lab 10 of Harvard's CS109 class. Please feel free to go to the original lab for additional exercises and solutions.

We extract features with the Vectorizer methods of sklearn, train and test the models, and compute performance metrics such as F1 score, recall, precision, and ROC curves. We apply cross-validation and iterate over the hyper-parameters of every model used, maximizing the likelihood in the case of Naive Bayes and the F1 score in the case of Random Forest.
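The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy `reviews` and `labels` are hypothetical stand-ins for the contents of critics.csv, and the choice of `CountVectorizer`, the `alpha` grid, and the two-fold split are assumptions made to keep the example small. Scoring by negative log loss is one way to select the hyper-parameters that maximize the likelihood on held-out folds.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the review text in critics.csv.
reviews = [
    "a wonderful and moving film",
    "brilliant acting and a great story",
    "one of the best movies this year",
    "a great and funny film",
    "wonderful story with brilliant direction",
    "the best acting in a moving story",
    "a dull and boring film",
    "terrible acting and a weak story",
    "one of the worst movies this year",
    "a boring and terrible film",
    "weak story with dull direction",
    "the worst acting in a boring story",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, stratify=labels, random_state=0
)

# Bag-of-words features followed by a multinomial Naive Bayes classifier.
pipeline = Pipeline([
    ("vec", CountVectorizer()),
    ("nb", MultinomialNB()),
])

# Cross-validated search over the smoothing parameter alpha.
# neg_log_loss is the (negated) average negative log-likelihood,
# so maximizing it maximizes the held-out likelihood.
search = GridSearchCV(
    pipeline,
    param_grid={"nb__alpha": [0.1, 1.0, 5.0]},
    scoring="neg_log_loss",
    cv=2,
)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out test reviews.
y_pred = search.predict(X_test)
f1 = f1_score(y_test, y_pred, zero_division=0)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)

print("best alpha:", search.best_params_["nb__alpha"])
print("F1:", f1, "precision:", precision, "recall:", recall)
```

For a Random Forest, the same `GridSearchCV` pattern would apply with `scoring="f1"` and a grid over that model's own hyper-parameters.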

Additionally, we apply several techniques to normalize the corpus of reviews: lemmatization, stop-word removal, special-character removal, and tokenization, using one of the most popular libraries in Natural Language Processing: nltk.
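A normalization step of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the notebook's code: the `normalize` helper and the short `STOP_WORDS` set are hypothetical, and Porter stemming is swapped in for lemmatization because nltk's `WordNetLemmatizer` and its stop-word list both need extra corpus downloads. The nltk classes used here work without any downloaded data.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Illustrative stop-word list (an assumption); nltk's own list
# requires downloading the "stopwords" corpus first.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "this", "to"}

# Keeping only runs of letters tokenizes the text and drops
# digits, punctuation, and other special characters in one pass.
tokenizer = RegexpTokenizer(r"[a-z]+")

# Stemming stands in for lemmatization in this sketch.
stemmer = PorterStemmer()


def normalize(text):
    """Lowercase, tokenize, remove stop words, and stem a review."""
    tokens = tokenizer.tokenize(text.lower())
    return [stemmer.stem(token) for token in tokens if token not in STOP_WORDS]


print(normalize("The movie is GREAT!!!"))
```

The normalized tokens can be joined back into a string with `" ".join(...)` before being passed to a vectorizer.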

Getting Started

Prerequisites

  • This notebook loads ./critics.csv, located in the same folder.
  • Download Anaconda.
  • Run Anaconda Navigator, launch a Jupyter notebook, and open the file Mini_Project_Naive_Bayes.ipynb.
  • Install the request package from the terminal.

The other libraries used in this project (numpy, scipy, matplotlib, pandas, seaborn, sklearn, six.moves) do not require installation, since they are default packages in Anaconda. They only need to be imported in the notebook.

Installing

pip install --user -U nltk

Contributors

  • daesparz
