tanishagupta15 / iris-data-analysis

This project loads and preprocesses the Iris dataset, explores it with histograms, scatter plots, and a correlation heatmap, and then trains and compares classification models (Logistic Regression, K-Nearest Neighbors, Decision Tree) that predict the species.

Jupyter Notebook 100.00%

iris-data-analysis's Introduction

Iris dataset analysis - Classification

Dataset Information

The dataset contains 3 classes of 50 instances each (150 in total), where each class is a species of iris plant. One class is linearly separable from the other two; those two are not linearly separable from each other. Download link: https://www.kaggle.com/uciml/iris

Step 1: Import Modules

Import libraries for data manipulation (pandas, numpy), visualization (matplotlib, seaborn), and machine learning (scikit-learn).

Step 2: Load Dataset and Derive Insights

  • Load the Iris dataset.
  • Display the first few rows to understand the structure.
  • Get dataset info (data types, null values).
  • Generate statistical summary (mean, median, standard deviation).
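The loading and inspection steps above could look roughly like the sketch below. It is an assumption-laden stand-in, not the notebook's code: it uses the copy of Iris bundled with scikit-learn (`load_iris`) instead of the Kaggle CSV so that it runs without a download, and the column names are therefore scikit-learn's rather than the CSV's.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy stands in for the Kaggle CSV.
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Replace the integer target column with readable species names.
df["species"] = df.pop("target").map(dict(enumerate(iris.target_names)))

print(df.head())                     # first five rows
df.info()                            # dtypes and non-null counts per column
print(df.describe())                 # count, mean, std, min, quartiles, max
print(df["species"].value_counts())  # number of rows per class
```

In `describe()`, the `50%` row is the median, so one call covers the mean, median, and standard deviation mentioned above.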

Step 3: Preprocess the Dataset (Removing Null Values)

Check for and remove any null values to ensure a clean dataset.
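A minimal sketch of the null check, assuming pandas is used for it. The bundled scikit-learn copy of Iris has no missing values, so the frame below is a hypothetical toy example with one gap added to show what the check and the removal actually do.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not the real data) with one missing measurement.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, np.nan, 6.3],
    "petal_length": [1.4, 1.4, 4.7, 6.0],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Drop every row that contains at least one missing value.
clean = df.dropna().reset_index(drop=True)
print(clean)
```

Dropping rows is the simplest option; on a dataset this small, every removed row is a noticeable fraction of the data, so it is worth checking the counts first.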

Step 4: Exploratory Data Analysis (Histogram and Scatter Plot)

  • Create histograms for feature distribution.
  • Generate pair plots to visualize relationships between features, categorized by species.

Step 5: Correlation Matrix (Heat Map)

Compute and visualize the correlation matrix using a heat map to identify relationships between features.
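As a sketch of this step, assuming pandas' `DataFrame.corr` (Pearson correlation by default) and seaborn's `heatmap`; the colour map and annotation settings are illustrative choices, not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import seaborn as sns
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy stands in for the Kaggle CSV.
features = load_iris(as_frame=True).data  # the four numeric feature columns

# Pairwise Pearson correlation between the features.
corr = features.corr()
print(corr.round(2))

# Annotated heat map; the fixed [-1, 1] range keeps the colours comparable.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```

Strongly correlated pairs, such as petal length and petal width, carry overlapping information, which is useful to know before choosing features for a model.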

Step 6: Label Encoder

Convert categorical species labels into numeric form using a label encoder for machine learning compatibility.
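A small illustration of what the encoding does, assuming scikit-learn's `LabelEncoder`. The four labels below are a hypothetical sample, not rows from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of species labels.
species = pd.Series(["setosa", "versicolor", "virginica", "setosa"])

encoder = LabelEncoder()
codes = encoder.fit_transform(species)  # classes are sorted, then numbered from 0

print(codes)                             # integer code for each label
print(encoder.classes_)                  # position in this array is the code
print(encoder.inverse_transform(codes))  # map the codes back to the names
```

The integer codes are arbitrary identifiers, not an ordering of the species; that is fine for a classification target, which is how they are used here.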

Step 7: Model Training

  • Split data into features and target, then into training and testing sets.
  • Train models (Logistic Regression, K-Nearest Neighbors, Decision Tree).
  • Predict species on test data and evaluate model accuracy.
  • Compare accuracies to determine the best-performing model.
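The training and comparison steps above can be sketched as below. The split ratio, `random_state`, `stratify`, and `max_iter` values are assumptions chosen for illustration; the notebook's settings are not stated here, so the accuracies this prints need not match the reported result.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Iris copy stands in for the Kaggle CSV.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Fit each model on the training rows and score it on the held-out rows.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.2%}")

best = max(accuracies, key=accuracies.get)
print("Best model:", best)
```

With only a few dozen test rows, a single split gives a noisy estimate; `sklearn.model_selection.cross_val_score` is one way to get a steadier comparison.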

Libraries

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

Algorithms

  • Logistic Regression
  • K-Nearest Neighbors
  • Decision Tree

Best Model Accuracy: 100.00%
