This is the repository for Jacob Crabb and Taeho Jeon's Flatiron School end of module 1 project.
Our assignment is in 3 parts:
- Use kc_house_data.csv and its description file, column_names.md, to build a model for predicting home sale prices.
- Make a PowerPoint/Keynote/Google Slides presentation explaining our model.
- Write a blog post about our project and publish it for other aspiring data scientists to see.
A link to the blog on our project's visualisations: https://medium.com/@alludedwinter/visualizations-a-regressive-gene-235a4334276f
Link to the slides:
We are to build a model that predicts home prices based on customer needs such as number of bedrooms, number of bathrooms, location, and others.
We are provided two years of housing data from King County in the kc_house_data.csv file, and we are allowed to pull from other sources as needed.
We check for duplicates, correct missing or incorrect values, and remove outliers.
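
The cleaning steps above could look roughly like the sketch below. It is not the project's actual code: the function name `clean_housing_data`, the median fill for missing values, and the 1.5 x IQR rule for price outliers are all assumptions chosen for illustration.

```python
import pandas as pd


def clean_housing_data(df: pd.DataFrame, price_col: str = "price",
                       iqr_factor: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates, fill gaps, remove price outliers."""
    # 1. Remove exact duplicate rows.
    cleaned = df.drop_duplicates().copy()

    # 2. Fill missing numeric values with each column's median
    #    (an assumed strategy; the project may have handled these differently).
    numeric_cols = cleaned.select_dtypes(include="number").columns
    medians = cleaned[numeric_cols].median()
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(medians)

    # 3. Keep only rows whose price lies within the IQR fences
    #    (an assumed outlier rule).
    q1, q3 = cleaned[price_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = cleaned[price_col].between(q1 - iqr_factor * iqr,
                                          q3 + iqr_factor * iqr)
    return cleaned[in_range].reset_index(drop=True)
```

A typical call would be `clean_housing_data(pd.read_csv("kc_house_data.csv"))`, assuming the file has a `price` column.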
We build functions to perform train/test splits and cross-validation, then test different predictors with sklearn feature selection.
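
As a minimal sketch of that workflow, the function below combines a train/test split, cross-validation, and feature selection. The name `evaluate_features` is hypothetical, and the use of `SelectKBest` with `f_regression` and a plain `LinearRegression` is an assumption; the project may use a different selector or estimator.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline


def evaluate_features(X: pd.DataFrame, y: pd.Series, k: int = 5,
                      test_size: float = 0.2, cv: int = 5,
                      random_state: int = 42) -> dict:
    """Hypothetical helper: split, cross-validate, and report selected features."""
    # Hold out a test set that is never seen during cross-validation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)

    # Select the k best predictors inside the pipeline so that selection
    # is re-fit on each cross-validation fold (avoids leakage).
    k = min(k, X.shape[1])
    model = make_pipeline(SelectKBest(score_func=f_regression, k=k),
                          LinearRegression())

    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")

    # Fit on the full training set and score once on the held-out data.
    model.fit(X_train, y_train)
    mask = model.named_steps["selectkbest"].get_support()
    return {
        "cv_r2_mean": float(cv_scores.mean()),
        "test_r2": float(model.score(X_test, y_test)),
        "selected": list(X.columns[mask]),
    }
```

Keeping the selector inside the pipeline is a design choice: it means the cross-validation scores reflect feature selection done only on each fold's training portion.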
We test our model on the data as a whole, on areas outside of Seattle, on areas inside of Seattle, and on each individual zip code, then show the results for each area.
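
One way to produce per-area results is to group predictions by an area label and compute an error metric for each group. The sketch below is illustrative only: the function name `score_by_group`, the `predicted` column, and the choice of MAE and RMSE as metrics are assumptions, and `zipcode` is assumed to be the name of the zip code column.

```python
import numpy as np
import pandas as pd


def score_by_group(df: pd.DataFrame, y_true_col: str, y_pred_col: str,
                   group_col: str = "zipcode") -> pd.DataFrame:
    """Hypothetical helper: per-group error summary for a set of predictions."""
    rows = []
    # groupby iterates over groups in sorted key order by default.
    for group, part in df.groupby(group_col):
        errors = part[y_pred_col] - part[y_true_col]
        rows.append({
            group_col: group,
            "n": len(part),                                # rows in this area
            "mae": float(errors.abs().mean()),             # mean absolute error
            "rmse": float(np.sqrt((errors ** 2).mean())),  # root mean squared error
        })
    return pd.DataFrame(rows)
```

The same function works for the inside/outside Seattle comparison if `group_col` points at a column that labels each row accordingly; how that label is defined is left to the project.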
We build a usable predictor function based on our model, then add some final conclusions about the model.
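
A predictor function of this kind might wrap a fitted model so that a user can pass home features by name. The sketch below is an assumption about the general shape, not the project's implementation; `make_price_predictor` and `predict_price` are hypothetical names, and it assumes the model was fitted on a DataFrame whose columns match `feature_names`.

```python
import pandas as pd


def make_price_predictor(model, feature_names):
    """Hypothetical factory: wrap a fitted sklearn-style model in a simple function."""
    feature_names = list(feature_names)

    def predict_price(**features) -> float:
        # Fail early with a clear message if a required feature is missing.
        missing = [name for name in feature_names if name not in features]
        if missing:
            raise ValueError(f"Missing features: {missing}")
        # Build a single-row frame with columns in the order used for training.
        row = pd.DataFrame([[features[name] for name in feature_names]],
                           columns=feature_names)
        return float(model.predict(row)[0])

    return predict_price
```

With a fitted model, usage would look like `predict = make_price_predictor(model, ["sqft_living", "bedrooms"])` followed by `predict(sqft_living=1800, bedrooms=3)`.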
In the future, we would like to iron out the weaknesses of our model and improve the code.