tanishagupta15 / iris-data-analysis


This project loads and preprocesses the Iris dataset, explores it with histograms, scatter plots, and a correlation heatmap, then encodes the species labels and trains and compares three classifiers (Logistic Regression, K-Nearest Neighbors, Decision Tree).

Jupyter Notebook 100.00%

iris-data-analysis's Introduction

Iris dataset analysis - Classification

Dataset Information

The dataset contains 3 classes of 50 instances each (150 in total), where each class refers to a species of iris plant. One class is linearly separable from the other two; the latter two are not linearly separable from each other. Download link: https://www.kaggle.com/uciml/iris
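The separability claim can be checked with a single feature. The sketch below is illustrative only: it assumes scikit-learn's bundled copy of the dataset stands in for the Kaggle download, and it uses petal length as the separating feature, a choice not stated in the project.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy stands in for the Kaggle CSV.
iris = load_iris(as_frame=True)
petal_length = iris.data["petal length (cm)"]
is_setosa = iris.target == 0  # class index 0 is setosa in this copy

setosa_max = petal_length[is_setosa].max()
others_min = petal_length[~is_setosa].min()

# Any threshold strictly between the two values separates setosa
# from the other two species using petal length alone.
threshold = (setosa_max + others_min) / 2
separable = setosa_max < threshold < others_min

print(f"largest setosa petal length:   {setosa_max}")
print(f"smallest non-setosa petal length: {others_min}")
print(f"separable at threshold {threshold}: {separable}")
```

No single threshold like this cleanly splits versicolor from virginica, which is why those two classes are the harder part of the classification task.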

Step 1: Import Modules

Import libraries for data manipulation (pandas, numpy), visualization (matplotlib, seaborn), and machine learning (scikit-learn).
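A typical import cell for this step might look like the following. The aliases and the specific scikit-learn classes are assumptions based on the algorithms listed later in this README, not a copy of the notebook's actual cell.

```python
# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning (classes assumed from the algorithms this README lists)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
```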

Step 2: Load Dataset and Derive Insights

Load the Iris dataset.

Display the first few rows to understand the structure.

Get dataset info (data types, null values).

Generate statistical summary (mean, median, standard deviation).
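The loading and inspection steps above can be sketched as follows. To keep the example runnable without a download, it assumes scikit-learn's bundled copy of the dataset; with the Kaggle file you would read the CSV with `pd.read_csv` instead, and the column names would differ.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: the bundled copy stands in for the Kaggle CSV file.
iris = load_iris(as_frame=True)
df = iris.data.copy()
# Attach readable species names (hypothetical column name "species").
df["species"] = iris.target.map(dict(enumerate(iris.target_names)))

print(df.head())        # first five rows
df.info()               # dtypes and non-null counts per column
summary = df.describe() # count, mean, std, min, quartiles, max
print(summary)
print(df["species"].value_counts())
```

The `50%` row of `describe()` is the median, so one call covers the mean, median, and standard deviation mentioned above.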

Step 3: Preprocess the Dataset (Removing Null Values)

Check for and remove any null values to ensure a clean dataset.
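A minimal sketch of the null check and removal is shown below. It injects one artificial missing value purely for demonstration, so that `dropna` has something to remove. It also assumes scikit-learn's bundled copy in place of the Kaggle CSV.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: the bundled copy stands in for the Kaggle CSV file.
df = load_iris(as_frame=True).frame.copy()

# Hypothetical gap, injected only to make the cleanup visible.
df.loc[0, "sepal length (cm)"] = np.nan

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Drop every row that contains at least one missing value.
clean = df.dropna().reset_index(drop=True)
print(f"rows before: {len(df)}, rows after: {len(clean)}")
```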

Step 4: Exploratory Data Analysis (Histogram and Scatter Plot)

Create histograms for feature distribution.

Generate pair plots to visualize relationships between features, categorized by species.

Step 5: Correlation Matrix (Heat Map)

Compute and visualize the correlation matrix using a heat map to identify relationships between features.
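The correlation step could be written roughly as follows. The color map and annotation settings are assumptions chosen for readability, and the features come from scikit-learn's bundled copy of the dataset.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Assumption: the bundled copy stands in for the Kaggle CSV file.
features = load_iris(as_frame=True).data

# Pearson correlation between every pair of numeric features.
corr = features.corr()
print(corr.round(2))

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the color scale to [-1, 1] so the colors are comparable across plots.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Feature correlation")
```

Petal length and petal width are strongly correlated in this dataset, which is the kind of relationship the heat map makes easy to spot.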

Step 6: Label Encoder

Convert categorical species labels into numeric form using a label encoder for machine learning compatibility.
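A small sketch of the encoding step is given below. The sample labels follow the `Iris-...` naming assumed for the Kaggle file; a four-row series is used here instead of the full column to keep the mapping easy to read.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumed label format; a tiny sample instead of the full Species column.
species = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
    name="Species",
)

encoder = LabelEncoder()
# Classes are sorted alphabetically, then mapped to 0, 1, 2, ...
encoded = encoder.fit_transform(species)

print(encoded)           # [0 1 2 0]
print(encoder.classes_)  # the original labels, in encoded order

# The mapping is reversible, which helps when reporting predictions.
decoded = encoder.inverse_transform(encoded)
```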

Step 7: Model Training

Split data into features and target, then into training and testing sets.

Train models (Logistic Regression, K-Nearest Neighbors, Decision Tree).

Predict species on test data and evaluate model accuracy.

Compare accuracies to determine the best-performing model.
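The training and comparison steps above can be sketched as follows. The 70/30 split, the random seed, the stratification, and the model hyperparameters are all assumptions; the notebook's actual settings are not stated in this README, so the accuracies printed here may differ from the reported result.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled copy stands in for the Kaggle CSV file.
X, y = load_iris(return_X_y=True)

# Hypothetical split settings: 30% held out, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    accuracies[name] = accuracy_score(y_test, predictions)
    print(f"{name}: {accuracies[name] * 100:.2f}%")

# Pick the model with the highest test accuracy.
best = max(accuracies, key=accuracies.get)
print(f"Best model: {best}")
```

With only 45 test samples in this split, a single misclassification changes the accuracy by more than two percentage points, so small differences between models should be read with caution.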

Libraries

pandas

matplotlib

seaborn

scikit-learn

Algorithms

Logistic Regression

K-Nearest Neighbors

Decision Tree

Best Model Accuracy: 100.00%
