mod_3_project_cuisines's Introduction

Regional Cuisine Classifier

Motivation

We wanted to see whether recipes could be classified by regional cuisine from their list of ingredients alone. Our goal was a model, wrapped in a function, that takes a list of ingredients as input and returns the region those ingredients most likely belong to. Companies such as Yelp, GrubHub and Seamless could use this type of model to help classify food based on ingredients or dish name.


Sources:

We scraped the following websites for recipe ingredients. Every scraped set of ingredients carried a tag for the region it belongs to:

BBC Food, and Epicurious


EDA of our DataFrames:

DataFrame Head: The top five rows of our DataFrame. Cuisine is our target (y) variable, while recipe_ingredients is our independent variable.

[Screenshot: first five rows of the DataFrame]

Below is a bar graph showing every cuisine used and how many recipes each one had. The cuisines are relatively evenly distributed, with low class imbalance.

[Screenshot: bar graph of recipe counts]
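
The counts behind a bar graph like this one can be reproduced with a few lines of Python. This is only a minimal sketch: the `labels` list is made-up sample data standing in for the real cuisine column, and `class_counts` and `imbalance_ratio` are hypothetical helper names, not functions from the project notebook.

```python
from collections import Counter

# Made-up cuisine labels standing in for the real target column.
labels = ["italian", "indian", "italian", "chinese", "indian", "chinese", "italian"]

def class_counts(labels):
    """Return (cuisine, recipe count) pairs, most common first."""
    return Counter(labels).most_common()

def imbalance_ratio(labels):
    """Largest class size divided by the smallest; 1.0 means perfectly balanced."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

print(class_counts(labels))
print(imbalance_ratio(labels))
```

A ratio close to 1.0 is one simple way to back up a "low class imbalance" claim with a number.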

Lastly, a sample word cloud showing how distinctive each region's recipe ingredients are:

[Screenshot: sample word cloud of regional recipe ingredients]


Cleaning Up the Data

After EDA, we cleaned up our data. This meant removing the common stop words in the NLP English corpus, along with punctuation and other words that show up in all recipes, such as units of measurement. After cleaning, we looked at bi-grams, since many ingredients, such as oil, are paired with another word that may make them unique to a region.

[Screenshot: bi-grams of recipe ingredients]
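
The cleaning and bi-gram steps can be sketched roughly as follows. The stop-word and unit lists here are short illustrative assumptions, not the lists used in the project, and stripping digits is an extra assumption made so that quantities such as "2" do not survive as tokens. `clean_ingredients` and `bigrams` are hypothetical helper names.

```python
import string

# Assumed, illustrative word lists; the project's actual lists are not shown here.
STOP_WORDS = {"and", "of", "the", "a", "or", "to"}
UNIT_WORDS = {"cup", "cups", "tbsp", "tsp", "teaspoon", "tablespoon", "g", "oz", "ml"}

def clean_ingredients(text):
    """Lowercase, strip punctuation and digits, then drop stop words and units."""
    table = str.maketrans("", "", string.punctuation + string.digits)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS and t not in UNIT_WORDS]

def bigrams(tokens):
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))

tokens = clean_ingredients("2 tbsp of olive oil, 1 cup basil")
print(tokens)           # ['olive', 'oil', 'basil']
print(bigrams(tokens))  # [('olive', 'oil'), ('oil', 'basil')]
```

Note that forming bi-grams after filtering also pairs words that were not adjacent in the raw text, such as ('oil', 'basil') above. Whether that is desirable depends on the data.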


Data Analysis

Then we looked at the ROC curve and AUC (Receiver Operating Characteristic / Area Under the Curve):

[Screenshot: ROC curves with AUC]
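
For a multiclass problem, one common way to get an AUC per cuisine is a one-vs-rest comparison. Whether the project used that scheme is an assumption. The sketch below computes AUC directly from its ranking definition, as the share of (positive, negative) pairs where the positive recipe gets the higher score. The scores and the helper names `binary_auc` and `one_vs_rest_auc` are made up for illustration.

```python
def binary_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, class_scores):
    """class_scores maps each class name to a list of per-recipe scores for that class."""
    return {c: binary_auc(s, [y == c for y in y_true]) for c, s in class_scores.items()}

# Made-up labels and classifier scores for two cuisines.
y_true = ["italian", "indian", "italian", "indian"]
class_scores = {
    "italian": [0.9, 0.2, 0.6, 0.7],
    "indian": [0.1, 0.8, 0.4, 0.3],
}
print(one_vs_rest_auc(y_true, class_scores))  # {'italian': 0.75, 'indian': 0.75}
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of a cuisine from the rest.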

Here is the SVM confusion matrix, which shows how often our model predicted correctly. The vertical axis is the actual target and the horizontal axis is the prediction. There are some incorrect predictions, but the cuisines that get confused appear to share many common recipes because of regional and/or cultural similarities.

[Screenshot: SVM confusion matrix]
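
To make the layout concrete, here is a minimal sketch of a confusion matrix with rows as the actual cuisine and columns as the predicted cuisine, matching the orientation described above. The labels and predictions are made-up sample data, and `confusion_matrix` here is a plain-Python helper written for illustration, not the project's code.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

# Made-up example: two neighbouring cuisines get confused with each other.
classes = ["chinese", "japanese", "mexican"]
y_true = ["chinese", "chinese", "japanese", "japanese", "mexican"]
y_pred = ["chinese", "japanese", "japanese", "chinese", "mexican"]

matrix = confusion_matrix(y_true, y_pred, classes)
for name, row in zip(classes, matrix):
    print(name, row)

# Accuracy is the diagonal (correct predictions) divided by the total.
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(y_true)
print(accuracy)  # 0.6
```

Off-diagonal cells show which pairs of cuisines the model mixes up, which is where regional or cultural overlap would become visible.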


Model Results

We built multiple models and tuned each with GridSearchCV over several parameter grids to get the best model possible. After multiple model runs and hyperparameter tuning, we found that Random Forest tuned with GridSearchCV works best. Our model predicts with roughly 70% accuracy. Given that this is a multiclass classification model based on text data, this is a strong result. If one were to pick a region at random, the expected accuracy would be 100 divided by the number of classes we are trying to predict: 100/14, or about 7.14%. Here is a summary picture of all our models and their accuracy scores. All of the actual work can be found in the Final_Compiled notebook file.

[Screenshot: accuracy scores for all models]
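
The chance-level comparison can be written out as a small calculation. This sketch assumes uniform random guessing over 14 equally likely classes and uses the approximate 70% figure quoted above. `chance_accuracy` and `lift_over_chance` are hypothetical helper names.

```python
def chance_accuracy(n_classes):
    """Expected accuracy (in percent) of guessing uniformly at random."""
    return 100.0 / n_classes

def lift_over_chance(model_accuracy, n_classes):
    """How many times better than random guessing the model is."""
    return model_accuracy / chance_accuracy(n_classes)

print(round(chance_accuracy(14), 2))         # 7.14
print(round(lift_over_chance(70.0, 14), 1))  # 9.8
```

Under these assumptions, the model is close to ten times more accurate than random guessing.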

Thank you for reading.

Contributors

imamun93
