mushroomclassifier's Introduction

Mushroom Edibility Classifier Using Data Analytics Methods

Dec 2021 | Data Analytics Project in TY BTech

Introduction

I chose the problem of classifying mushrooms as edible or poisonous, based on the UCI Mushroom Dataset.

The visualizations and analysis steps carried out to understand the dataset are:

Count the number of poisonous and edible mushrooms in the dataset.
Visualize the distribution of edible and poisonous mushrooms across the various habitats.
Compare the population parameters of the edible and poisonous instances.
Plot a treemap showing the distribution of different gill colors of mushrooms.
Visualize the occurrence of different ring types, and the number of rings on such mushrooms.
Test the accuracy of various classification models on the dataset to build an accurate prediction model for the edibility of mushrooms.
Run the most accurate classification method on an unseen dataset and cross-check its accuracy.
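
As a minimal sketch of the first two items, the counts behind such plots could be tallied as shown below. The rows are made-up samples using single-letter codes in the style of the UCI file ("e" for edible, "p" for poisonous), and the names rows, class_counts and habitat_by_class are hypothetical. Only the standard library is used here; this is not the project's actual plotting code.

```python
from collections import Counter

# Hypothetical sample rows as (class, habitat) pairs; the values are made up.
# "e" = edible, "p" = poisonous; habitat letters are illustrative codes.
rows = [("e", "g"), ("p", "u"), ("e", "d"), ("p", "d"), ("e", "g"), ("p", "u")]

# Number of edible and poisonous mushrooms.
class_counts = Counter(label for label, _ in rows)

# Number of mushrooms for each (class, habitat) combination.
habitat_by_class = Counter(rows)

print(class_counts)
for (label, habitat), count in sorted(habitat_by_class.items()):
    print(label, habitat, count)
```

The two counters hold the numbers that a bar chart of class balance and a grouped bar chart of habitat against edibility would display.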

My approach to the problem is:

Using one-hot encoding to turn the categorical features into a form the classifiers can work with.
Applying Logistic Regression, Random Forest Classifier and Decision Tree algorithms and comparing their accuracy.
Using K-Fold Cross Validation to check the consistency and fitting score of each algorithm.
Choosing the approach with maximum accuracy and minimum standard deviation, and using it on the given test dataset to predict the classes of given mushrooms.
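
The steps above can be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions rather than the project's code: the toy data, the rule that generates its labels, the hyperparameters, and all variable names are hypothetical, and the real UCI columns would take the place of the three invented ones.

```python
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: three categorical columns (odor, gill color, habitat).
# The label rule is invented for illustration, not taken from the dataset.
rng = random.Random(0)
raw = [[rng.choice("anfp"), rng.choice("knw"), rng.choice("dgu")] for _ in range(200)]
X = np.array(raw, dtype=object)
y = np.array(["p" if row[0] in "fp" else "e" for row in raw])

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # Encoding inside the pipeline means each fold fits its own encoder.
    pipeline = make_pipeline(OneHotEncoder(handle_unknown="ignore"), model)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())

# Highest mean accuracy wins; ties go to the lowest standard deviation.
best = max(results, key=lambda name: (results[name][0], -results[name][1]))

# Refit the chosen model on all labelled rows, then predict unseen rows.
final = make_pipeline(OneHotEncoder(handle_unknown="ignore"), models[best])
final.fit(X, y)
unseen = np.array([["f", "k", "d"], ["a", "w", "g"]], dtype=object)
predictions = final.predict(unseen)
print(best, results[best], list(predictions))
```

Keeping the encoder inside the pipeline avoids fitting it on rows from the validation fold, and handle_unknown="ignore" lets the final model accept category values that did not appear during training.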
