funda-scraper

A scraper for the Dutch real estate website Funda.nl, written in Python using Scrapy.

Basic usage

There are two spiders:

  1. funda_spider scrapes all properties for sale in a certain city, such as http://www.funda.nl/koop/amsterdam/,
  2. funda_spider_sold scrapes data on properties which have recently been sold, such as those listed on http://www.funda.nl/koop/verkocht/amsterdam/.

After installing Scrapy, in the project directory simply run the command

scrapy crawl funda_spider -a place=amsterdam -o amsterdam_for_sale.json

to generate a JSON file amsterdam_for_sale.json with all houses for sale listed on http://www.funda.nl/koop/amsterdam/ and its subpages. The keyword argument place can be used to scrape data from other cities; for example place=rotterdam will scrape data from http://www.funda.nl/koop/rotterdam/.

For recently sold homes, run

scrapy crawl funda_spider_sold -a place=amsterdam -o amsterdam_sold.json

to generate a file amsterdam_sold.json with data from http://www.funda.nl/koop/verkocht/amsterdam/. Alternatively, CSV output can be generated by giving the output file a .csv extension, for example amsterdam_sold.csv instead of amsterdam_sold.json.

Analysis

The scraped data contains the following fields: address, postal_code, year_built, area, rooms, bedrooms, and price (the asking price). For sold homes, the output also includes posting_date and sale_date. These fields can be analyzed further using, for example, Python Pandas. A couple of applications are shown below.
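As a starting point for such analysis, the following minimal sketch derives the price per unit area from scraped records using only the standard library. It assumes that price and area are numeric (or numeric strings) and that some records may be incomplete; the actual value types in the scraper's output are not documented here. The helper name price_per_area and the sample records are hypothetical.

```python
import json


def price_per_area(records):
    """Return (address, EUR per m2) pairs for records with a usable price and area."""
    result = []
    for rec in records:
        try:
            price = float(rec["price"])
            area = float(rec["area"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with missing or non-numeric values
        if area > 0:
            result.append((rec.get("address"), price / area))
    return result


# Made-up sample in the assumed shape. Real output would instead be read with
# json.load() from a file such as amsterdam_for_sale.json.
sample = json.loads("""[
  {"address": "Examplestraat 1", "postal_code": "1011 AB", "area": 50, "price": 300000},
  {"address": "Examplestraat 2", "postal_code": "1011 AC", "area": 0, "price": 250000},
  {"address": "Examplestraat 3", "postal_code": "1031 XY", "area": 80, "price": null}
]""")

for address, eur_per_m2 in price_per_area(sample):
    print(address, round(eur_per_m2))
```

Records with a zero area or a missing price are dropped rather than guessed, so the derived values stay comparable across listings.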

Mapping of real estate attributes

By applying geolocation to the addresses, attributes such as price per unit area can be mapped (Figure 1). Such attributes can be used for 'bargain hunting' by identifying outliers.

Figure 1. Price per unit area (EUR/m2) of houses for sale in Amsterdam on 18 July 2016, plotted using OpenHeatMap. (Because of a quota on the number of geolocation requests for individual addresses, geolocation was performed by grouping properties by the first 4 digits of their postal codes and using a downloaded database of their coordinates; this is why the 'blobs' are unevenly distributed.)

An interesting observation from Figure 1 is the clear price difference between Amsterdam Centrum and Amsterdam Noord across the IJ river. (This difference will probably shrink once the North-South metro line is completed.)
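The grouping by 4-digit postal code described in the caption of Figure 1 could be sketched roughly as follows. This is an illustration under assumptions, not the code used to produce the figure: it assumes postal codes look like "1011 AB", that price and area are already numeric, and that the median is a reasonable summary per group. The lookup table PC4_COORDS stands in for the downloaded coordinate database, and its values are placeholders rather than real centroids.

```python
from collections import defaultdict
from statistics import median


def median_price_per_m2_by_pc4(records):
    """Group records by the first 4 postal code digits; return median EUR/m2 per group."""
    groups = defaultdict(list)
    for rec in records:
        pc4 = str(rec["postal_code"]).strip()[:4]
        if len(pc4) == 4 and pc4.isdigit() and rec["area"] > 0:
            groups[pc4].append(rec["price"] / rec["area"])
    return {pc4: median(values) for pc4, values in groups.items()}


# Placeholder coordinates (hypothetical), standing in for a downloaded database.
PC4_COORDS = {"1011": (52.37, 4.90), "1031": (52.39, 4.91)}

# Made-up records in the assumed shape.
records = [
    {"postal_code": "1011 AB", "area": 50, "price": 300000},
    {"postal_code": "1011 AC", "area": 60, "price": 420000},
    {"postal_code": "1011 AD", "area": 100, "price": 800000},
    {"postal_code": "1031 XY", "area": 80, "price": 240000},
]

medians = median_price_per_m2_by_pc4(records)

# One (latitude, longitude, value) point per postal code area, ready for mapping.
points = [(*PC4_COORDS[pc4], value) for pc4, value in medians.items() if pc4 in PC4_COORDS]
print(points)
```

Because every property in a postal code area shares one coordinate, the map shows one point per area, which matches the uneven 'blobs' noted in the caption.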

Determining trends

The data can also be visualized over time and used as a gauge of market sentiment. Figure 2 illustrates the development of (most recent) asking prices and the time it takes for properties to sell.

Figure 2. Asking prices before sale (above) and days the property was offered on Funda (below) for over 11,000 properties in the period 1 April 2015 - 18 July 2016. The blue dots represent individual properties, the red curves weekly averages, and the green curves (weighted) exponential fits of the weekly averages.

As seen from Figure 2, asking prices increased by 15% per year on average over the period observed. Despite that, the average time it takes for a property to sell has more than halved. (It remains to be investigated whether these results are biased by how long Funda keeps pages of sold properties online.) In short, the data seems to confirm that the Amsterdam housing market is heating up!
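An annual growth figure of this kind can be estimated from weekly averages with an exponential fit. The sketch below uses a plain (unweighted) least-squares fit on the logarithm of the weekly mean price, which is simpler than the weighted fit mentioned in the caption of Figure 2 and is not necessarily the method used there. It assumes each sale is available as a (date, price) pair; the function names and the synthetic data are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def weekly_means(sales):
    """Average price per week; weeks are counted from the earliest sale date."""
    sales = sorted(sales)
    start = sales[0][0]
    buckets = defaultdict(list)
    for day, price in sales:
        buckets[(day - start).days // 7].append(price)
    return [(week, sum(prices) / len(prices)) for week, prices in sorted(buckets.items())]


def annual_growth(weekly):
    """Fit price = a * exp(b * week) by least squares on log(price); return yearly growth."""
    xs = [week for week, _ in weekly]
    ys = [math.log(mean) for _, mean in weekly]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return math.exp(slope * 365.25 / 7) - 1  # weekly log-slope converted to a yearly fraction


# Synthetic, made-up sales: one per week, constructed to grow 15% per year.
start = date(2015, 4, 1)
sales = [
    (start + timedelta(weeks=w), 300000 * 1.15 ** (w * 7 / 365.25))
    for w in range(52)
]

growth = annual_growth(weekly_means(sales))
print(f"estimated growth: {growth:.1%} per year")
```

The same weekly bucketing can be applied to the number of days between posting_date and sale_date to track how quickly properties sell.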
