In this project, I cleaned and analyzed the Airbnb dataset. I also applied machine learning algorithms, including Logistic Regression and XGBoost, to find which model provides the highest accuracy on the validation dataset.
Problem Statement: Your job is to predict high_booking_rate (how popular the listing is) for Airbnb.com listings, something a listing owner might want to know. You are provided with about 70 features for each listing (full descriptions of these variables are given in the data dictionary, which is also provided).
There are three data sets posted on Canvas:
airbnb_train_x.csv: features for the training instances.
airbnb_train_y.csv: labels for the training instances.
airbnb_test_x.csv: features for the test instances.
Your goal is to make predictions for the instances in the test set.
high_booking_rate is binary. The winning team will have the binary predictions with the highest accuracy.
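
Since models are ranked by accuracy on binary predictions, it helps to be precise about how that metric is computed and what a trivial predictor would score. The following is a minimal sketch in plain Python, not the project's actual evaluation code: the helper names `accuracy` and `majority_baseline` are hypothetical, and the label lists are made up for illustration rather than taken from the Airbnb data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for all n instances."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


# Made-up labels for illustration only (assumption: 1 = high booking rate).
y_train = [0, 0, 1, 0, 1, 0]
y_valid = [0, 1, 0, 0, 1]
model_preds = [0, 1, 0, 0, 0]  # hypothetical output of some fitted model

baseline_acc = accuracy(y_valid, majority_baseline(y_train, len(y_valid)))
model_acc = accuracy(y_valid, model_preds)
print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
```

Comparing each model against the majority-class baseline on a held-out validation set is a useful sanity check: when one class dominates, a high accuracy figure may say little about the model itself.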