Census Income Project using Classification Models - Logistic Regression, Decision Tree and Random Forest
This project explores and predicts income for more than 48,000 individuals using 1994 US census data. The goal is to preprocess the data, perform exploratory data analysis (EDA), and build predictive models that classify whether an individual earns more than $50,000 a year, using several machine learning algorithms.
The dataset used in this project is sourced from the UCI Machine Learning Repository and contains information such as age, workclass, education, marital status, occupation, and more. For more details about the dataset, refer to Census Income Dataset.
Libraries used: NumPy, Pandas, Scikit-learn
- Exploratory Data Analysis (EDA):
  - Investigate key insights in the data.
  - Understand the distribution of income categories.
- Data Cleaning:
  - Handle missing values.
  - Address outliers.
  - Convert categorical variables to numerical.
- Model Building:
  - Use machine learning algorithms (Logistic Regression, Decision Tree, and Random Forest) to predict income categories.
  - Evaluate model performance.
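
The EDA and data cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the column names (`age`, `workclass`, `hours_per_week`, `income`) and the helper names `income_distribution` and `clean` are hypothetical, and the specific choices (treating `?` as a missing-value marker, filling with the column mode, clipping to 1.5 × IQR fences, one-hot encoding) are assumptions about one reasonable way to carry out each step.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample in the style of the census data (not real records).
raw = pd.DataFrame({
    "age": [39, 50, 38, 53, 28, 90],
    "workclass": ["State-gov", "Self-emp", "Private", "?", "Private", "Private"],
    "hours_per_week": [40, 13, 45, 40, 99, 38],
    "income": ["<=50K", "<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})


def income_distribution(df):
    """EDA: share of rows in each income category."""
    return df["income"].value_counts(normalize=True)


def clean(df, numeric_cols, categorical_cols):
    """Handle missing values, clip outliers, and encode categoricals."""
    # Assumption: "?" marks a missing value.
    out = df.replace("?", np.nan)

    # Missing values: fill each categorical column with its most frequent value.
    for col in categorical_cols:
        out[col] = out[col].fillna(out[col].mode()[0])

    # Outliers: clip numeric columns to the 1.5 * IQR fences.
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Categorical to numerical: one-hot encode the categorical columns.
    out = pd.get_dummies(out, columns=categorical_cols)

    # Binary target: 1 if income is above 50K, else 0.
    out["income"] = (out["income"] == ">50K").astype(int)
    return out


print(income_distribution(raw))
cleaned = clean(raw, numeric_cols=["age", "hours_per_week"],
                categorical_cols=["workclass"])
print(cleaned.head())
```

Clipping keeps every row while limiting the influence of extreme values; dropping outlier rows instead would be an equally valid choice, depending on how much data one can afford to lose.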
- Logistic Regression Model Accuracy: 78.17%
- Decision Tree Model Accuracy: 84.13%
- Random Forest Model Accuracy: 84.51%
The Random Forest model achieves the highest accuracy of the three, narrowly ahead of the Decision Tree (84.51% vs. 84.13%) and well ahead of Logistic Regression (78.17%).
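
A comparison of this kind could be set up along the following lines. This is a hedged sketch under stated assumptions: it uses synthetic data from `make_classification` as a stand-in for the cleaned census features, and the 80/20 stratified split, `max_iter=1000`, `n_estimators=100`, and `random_state=0` are illustrative settings rather than the project's actual configuration. The accuracies it prints will therefore not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned features and binary income label.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)

# Hold out 20% of the rows for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The three classifiers named in this project, with illustrative settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name} Model Accuracy: {acc:.2%}")
```

Scoring every model on the same held-out split keeps the comparison fair, since each accuracy is measured on identical unseen rows.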