Census-Income-Project

Census Income Project using Classification Models - Logistic Regression, Decision Tree and Random Forest

Overview

This project explores income data for over 48,000 individuals from the 1994 US census and predicts each person's income category. The goal is to preprocess the data, perform exploratory data analysis (EDA), and build predictive models that classify whether an individual earns more than $50,000 a year, using several machine learning algorithms.

Dataset

The dataset used in this project is sourced from the UCI Machine Learning Repository and contains information such as age, workclass, education, marital status, occupation, and more. For more details about the dataset, refer to Census Income Dataset.

Tools Used

- NumPy
- Pandas
- Scikit-learn

Tasks

Exploratory Data Analysis (EDA):
- Investigate key insights in the data.
- Understand the distribution of income categories.
Data Cleaning:
- Handle missing values.
- Address outliers.
- Convert categorical variables to numerical.
Model Building:
- Use machine learning algorithms (Logistic Regression, Decision Tree and Random Forest) to predict income categories.
- Evaluate model performance.
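
The cleaning and modelling tasks above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it runs on a small synthetic table rather than the real census file, and the column names (`age`, `hours_per_week`, `workclass`, `income`), the "?" missing-value marker, the percentile clipping, the one-hot encoding, and the 75/25 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the census data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "hours_per_week": rng.integers(10, 80, size=n),
    "workclass": rng.choice(["Private", "Self-emp", "?"], size=n),
})
# Toy labelling rule so the models have a pattern to learn.
df["income"] = np.where(df["age"] + df["hours_per_week"] > 85, ">50K", "<=50K")

# Handle missing values: treat "?" as missing, then fill with the most frequent category.
df["workclass"] = df["workclass"].mask(df["workclass"] == "?")
df["workclass"] = df["workclass"].fillna(df["workclass"].mode()[0])

# Address outliers: clip a numeric column to its 1st-99th percentile range.
low, high = df["hours_per_week"].quantile([0.01, 0.99])
df["hours_per_week"] = df["hours_per_week"].clip(low, high)

# Convert categorical variables to numerical with one-hot encoding.
X = pd.get_dummies(df.drop(columns="income"), columns=["workclass"])
y = (df["income"] == ">50K").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The three classifiers named in this README, with illustrative settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.2%}")
```

Accuracies printed by this sketch come from the toy labelling rule and say nothing about performance on the real census data.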

Results

Logistic Regression Model Accuracy: 78.17%
Decision Tree Model Accuracy: 84.13%
Random Forest Model Accuracy: 84.51%
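
For reference, accuracy is the fraction of individuals whose income category is predicted correctly. The sketch below computes it with scikit-learn on ten made-up labels (hypothetical values, not project output) and also prints a confusion matrix, which shows how the errors split between the two income categories.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for 10 people: 1 = income over $50K, 0 = $50K or less.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Accuracy = correct predictions / total predictions (7 of 10 here).
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

print(f"Accuracy: {accuracy:.2%}")
print(cm)
```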

Conclusion

The Random Forest model achieves the highest accuracy (84.51%), narrowly ahead of the Decision Tree (84.13%) and well ahead of Logistic Regression (78.17%).
