This project involves analyzing and predicting housing prices for a Real Estate Investment Trust (REIT). The goal is to determine the market price of a house based on various features such as square footage, number of bedrooms, number of floors, and other relevant attributes. The project employs data analysis and machine learning techniques to achieve accurate predictions.
Data Exploration and Cleaning:
- Explore the dataset to understand its structure and features.
- Handle missing values, outliers, and inconsistencies in the data.
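
The cleaning steps above might look roughly like the following sketch. The toy data and the column names (`price`, `sqft_living`, `bedrooms`) are assumptions made for illustration, not the project's actual schema. Median imputation and the 1.5 × IQR rule are just one common choice each.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "price": [250000.0, 310000.0, np.nan, 420000.0, 9900000.0, 280000.0],
    "sqft_living": [1200.0, 1500.0, 1100.0, np.nan, 2100.0, 1300.0],
    "bedrooms": [2, 3, 2, 4, 4, 3],
})

# Count missing values per column to see where cleaning is needed.
missing_counts = df.isna().sum()

# Drop rows without a target value, then fill missing features with the median.
clean = df.dropna(subset=["price"]).copy()
clean["sqft_living"] = clean["sqft_living"].fillna(clean["sqft_living"].median())

# Drop price outliers using the 1.5 * IQR rule (one possible outlier definition).
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].reset_index(drop=True)

print(missing_counts)
print(clean)
```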
Feature Engineering:
- Create new features that might enhance predictive power.
- Explore relationships between existing features and the target variable.
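
As a minimal sketch of these two steps, the snippet below derives two features and ranks features by their correlation with the target. The columns, the derived features (`sqft_per_bedroom`, `house_age`), and the reference year are all hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's columns may differ.
df = pd.DataFrame({
    "price": [250000, 310000, 420000, 280000],
    "sqft_living": [1200, 1500, 2100, 1300],
    "bedrooms": [2, 3, 4, 3],
    "yr_built": [1990, 2005, 2015, 1978],
})

# Derived features (illustrative choices, not the project's confirmed features).
df["sqft_per_bedroom"] = df["sqft_living"] / df["bedrooms"]
df["house_age"] = 2024 - df["yr_built"]  # reference year is an assumption

# Pearson correlation of every feature with the target, strongest first.
corr_with_price = df.corr()["price"].drop("price").sort_values(ascending=False)
print(corr_with_price)
```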
Data Visualization:
- Use visualizations to understand the distribution of the target variable and relationships with other features.
- Identify patterns, correlations, and potential outliers.
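
A sketch of the two plots described above, using matplotlib on synthetic data. The column names and the generated values are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (not real prices).
rng = np.random.default_rng(0)
sqft = rng.uniform(800, 3500, size=200)
price = 150 * sqft + rng.normal(0, 40000, size=200)
df = pd.DataFrame({"sqft_living": sqft, "price": price})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution of the target variable.
ax_hist.hist(df["price"], bins=30)
ax_hist.set_xlabel("price")
ax_hist.set_ylabel("count")

# Relationship between one feature and the target; outliers show up as stray points.
ax_scatter.scatter(df["sqft_living"], df["price"], s=10, alpha=0.6)
ax_scatter.set_xlabel("sqft_living")
ax_scatter.set_ylabel("price")

fig.tight_layout()
```

Call `fig.savefig(...)` to write the figure to disk, or drop the `Agg` line and call `plt.show()` in an interactive session. For pairwise correlations, `seaborn.heatmap(df.corr(), annot=True)` is a common companion plot.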
Data Preprocessing:
- Split the data into training and testing sets.
- Scale or normalize numerical features.
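
The split and scaling steps can be sketched with scikit-learn as follows. The synthetic feature matrix, the 80/20 split, and the use of `StandardScaler` are assumptions. The key point is that the scaler is fitted on the training set only, so no information from the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features: square footage, bedrooms, floors (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform([800, 1, 1], [3500, 6, 3], size=(100, 3))
y = 150 * X[:, 0] + 10000 * X[:, 1] + rng.normal(0, 20000, size=100)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```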
Model Selection:
- Choose a regression model suitable for predicting housing prices.
- Train the model using the training dataset.
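
As one possible illustration, the sketch below fits a linear regression on the training portion of a synthetic dataset. The data-generating formula is made up so that the learned coefficients can be compared against known values. It is not derived from the project's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship (illustrative only).
rng = np.random.default_rng(0)
sqft = rng.uniform(800, 3500, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([sqft, bedrooms])
y = 50000 + 150 * sqft + 10000 * bedrooms + rng.normal(0, 5000, size=200)

# Use the first 160 rows for training; the rest are held out for evaluation.
X_train, y_train = X[:160], y[:160]

model = LinearRegression()
model.fit(X_train, y_train)

# The fitted coefficients should land near the true values 150 and 10000.
print(model.coef_, model.intercept_)
```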
Model Evaluation:
- Evaluate the model's performance using the testing dataset.
- Utilize metrics such as Mean Squared Error (MSE) or Root Mean Squared Error (RMSE).
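
A small worked example of both metrics, using made-up actual and predicted prices. RMSE is the square root of MSE, so it is expressed in the same units as the price, which makes it easier to interpret.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted prices for four houses.
y_true = np.array([300000.0, 450000.0, 250000.0, 520000.0])
y_pred = np.array([310000.0, 430000.0, 260000.0, 500000.0])

# Errors are -10000, 20000, -10000, 20000, so the mean squared error is 2.5e8.
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

print(mse)   # 250000000.0
print(rmse)  # about 15811.39
```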
Hyperparameter Tuning:
- Fine-tune the model parameters to optimize performance.
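
One common way to do this is a cross-validated grid search. The sketch below assumes a random forest and a deliberately tiny parameter grid on synthetic data. The project's actual model and search space are not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 3))
y = 100 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 2, size=120)

# Small, hypothetical search space; a real search would usually be wider.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

best_rmse = -search.best_score_
print(search.best_params_, best_rmse)
```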
Prediction:
- Use the trained model to predict housing prices for new data.
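
A minimal sketch of prediction on unseen rows, assuming a scaler and a linear model wrapped in a scikit-learn pipeline so that new data receives the same preprocessing as the training data. The training rows, feature names, and new houses are all hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data.
train = pd.DataFrame({
    "sqft_living": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 4, 5],
    "price": [200000, 290000, 370000, 460000, 560000],
})
features = ["sqft_living", "bedrooms"]

# The pipeline scales features and fits the regressor in one step.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(train[features], train["price"])

# New, unseen houses must provide the same feature columns.
new_houses = pd.DataFrame({"sqft_living": [1800, 2800], "bedrooms": [3, 4]})
predicted = model.predict(new_houses[features])
print(predicted)
```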
For this project, we experimented with various regression models, including:
- Linear Regression
- Decision Trees
- Random Forests
- Gradient Boosting
Models were selected based on their performance metrics and suitability for the dataset.
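
A comparison of the four model families could be sketched as below, using 5-fold cross-validated RMSE on synthetic data. The data, the hyperparameters, and the choice of cross-validation are assumptions. The ranking printed here says nothing about how the models performed on the project's dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(150, 3))
y = 100 * X[:, 0] + 20 * X[:, 1] ** 2 + rng.normal(0, 2, size=150)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=50, random_state=0),
}

# Mean cross-validated RMSE per model (scores are negated RMSE, so flip the sign).
rmse_by_model = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    rmse_by_model[name] = -scores.mean()

best_name = min(rmse_by_model, key=rmse_by_model.get)
print(rmse_by_model)
print("Lowest RMSE:", best_name)
```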
We evaluated the models using metrics such as Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). These metrics provide insights into the accuracy of our predictions and help us compare different models.
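
For reference, the standard definitions of the two metrics are given below. The notation is introduced here for illustration: $y_i$ is the actual price of house $i$, $\hat{y}_i$ is the predicted price, and $n$ is the number of houses in the evaluation set.

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$

Because the errors are squared before averaging, both metrics penalize large misses more heavily than small ones. Lower values indicate better predictions.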
- Present key findings, insights, and visualizations from the analysis.
- Highlight any patterns or relationships discovered between features and housing prices.
- pandas
- NumPy
- scikit-learn
- matplotlib
- seaborn
- Teja Niduram