Giter VIP home page Giter VIP logo

airbnb-vs-hotels's Introduction

CX4242 project: Airbnb vs Hotels

Description

Our CX4242 project focuses on quantifying differences between Airbnb listings and hotels in the city of New York. The entire project consists of several different components:

  1. Data collection and scraping: We collected data from several sources, notably Airbnb, Amadeus (a travel IT company with an API for booking/pricing information), TripAdvisor, and OpenStreetMap.
  2. NLP analysis on reviews: We used the Stanford Core NLP library to segment reviews and perform sentiment analysis.
  3. Search engine: We compiled all Airbnb and hotel data into an ElasticSearch instance hosted on AWS, to be able to search across both datasets at once.
  4. Visualization UI: We summarized all of the data and analyses through an interactive webpage.

Our finalized datasets are stored in an AWS ElasticSearch instance, and our site is hosted with AWS ElasticBeanstalk.

Installation

Our project uses Python 3.5.

To install all Python dependencies used in this project, run

pip install -r requirements.txt

Execution - Instructions for Recreating our Project

Running the TripAdvisor scraper

First, use the appropriate repository by doing cd tripadvisor_scraper.

  • base_spider.py - This spider gets the necessary URLs (through TripAdvisor's autocomplete) for each city that we are searching for. This only needs to be run once, and it outputs to intermediate/urls.csv
  • listings_spider.py - Uses the URLs from the previous part to crawl for listings. Run with scrapy crawl listings -o listings.json
  • hotels_spider.py - Scrapes hotel amenities for each listing. Run with scrapy crawl hotels -o amenities.json
  • listings.json and amenities.json contain price, amenities, and some other basic information for the TripAdvisor search results.
  • reviews_spider.py - Scrapes review text for each listing obtained from the listings spider. Run with scrapy crawl reviews -a filename=<filename>, where the file is a CSV with TripAdvisor URLs for each hotel.

Running scripts to collect data from Amadeus

Scripts for collecting data from the Amadeus API are in the amadeus-api folder. In order to access the Amadeus API, sign for an API key. Then, set an environment variable for this key.

\\ On Unix-based systems:
export AMADEUS_KEY='your api key here'
\\ On Windows:
setx AMADEUS_KEY "your api key here"

We wanted to merge data from both TripAdvisor and Amadeus.

  • search.py - This script searches for hotels in Amadeus based off of the coordinates of hotels we've already scraped from TripAdvisor.
  • recordPrices.py - This script searches each hotel for prices across a range of dates.
cd amadeus-api
python search.py
python recordPrices.py

Downloading Basic Airbnb Listing and Reviews Data

The Airbnb listings are from Inside Airbnb.

Scraping Airbnb prices over time

  • data/scrape_airbnb_prices.py scrapes Airbnb prices for given listings on given dates.

Sample Data

To see some example data that we scraped/collected/merged, see the data folder.

Add data to ElasticSearch

In AWS, create a new ElasticSearch instance, with indices for hotels (all hotel data), airbnbs (Airbnb listing data), and airbnb_prices (Airbnb temporal data). See the data folder for more details about uploading.

Sentiment Analysis on Reviews

See the reviews analysis folder for more details.

Running the Web App

To run the web application, first set environment variables for ElasticSearch access keys:

export ES_KEY='your key ID here' // or setx ES_KEY "key ID" in Windows
export ES_SECRET='your secret here' // or setx ES_SECRET "secret" in Windows

Then, start the web application.

cd flask-app
python application.py

Navigate to localhost:8000 in your browser to see the site.

airbnb-vs-hotels's People

Contributors

farhantejani avatar kexin-zhang avatar ruyimarone avatar suprabhatgurrala avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.