geobenchmarks

Short Python snippets that test various geo-enrichment examples and time the results.

The initial benchmark covers two flavors of geo-enrichment done using "point in polygon" spatial joins. In the first, we enrich points by adding column attributes from the polygons they fall within, for example the census block group that contains each point. In the second, we enrich the polygons with summary metrics computed from the points. The simplest example is how many Uber rides ended in a particular census block group over the full period of record.
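The two flavors can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the benchmark code itself: the polygon ids, the toy coordinates and the function names are hypothetical, and a brute-force ray-casting test stands in for the spatial join that the database performs.

```python
import time
from collections import Counter

def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical toy data: two square "block groups" and three dropoff points.
polygons = {
    "bg_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "bg_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points = [(0.5, 0.5), (1.5, 0.25), (1.75, 0.75)]

def enrich_points(points, polygons):
    """Flavor 1: attach the id of the containing polygon (or None) to each point."""
    enriched = []
    for x, y in points:
        match = next(
            (pid for pid, ring in polygons.items() if point_in_polygon(x, y, ring)),
            None,
        )
        enriched.append((x, y, match))
    return enriched

def count_points_per_polygon(enriched, polygons):
    """Flavor 2: summarize the points per polygon, here as a simple count."""
    counts = Counter(pid for _, _, pid in enriched if pid is not None)
    return {pid: counts.get(pid, 0) for pid in polygons}

# Time both enrichments together, in the spirit of the benchmark.
start = time.perf_counter()
enriched = enrich_points(points, polygons)
counts = count_points_per_polygon(enriched, polygons)
elapsed = time.perf_counter() - start

print(enriched)
print(counts)
print(f"elapsed: {elapsed:.6f} s")
```

The nested loop checks every point against every polygon, so it is only suitable for toy inputs. A real run over millions of points relies on the spatial join in the database.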

Supporting ETL scripts are provided for open data files that don't load directly into OmniSci. To start, we have 4.5 million NYC Uber dropoff points from Kaggle. These don't load directly because they are zipped into an archive containing multiple CSV files with different schemas, only some of which represent the primary data. The ETL script separates the wheat from the chaff and loads multiple months of data into a single table.
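The filtering step described above might look roughly like the sketch below. It is not the repository's ETL script: the column names in `EXPECTED_COLUMNS` and the function names are assumptions made for illustration, and the real files may use a different header. The idea is to keep only the CSV members whose header matches the expected schema and to write their rows into one combined file.

```python
import csv
import io
import zipfile

# Hypothetical header of the "primary" CSV files; the real column names may differ.
EXPECTED_COLUMNS = ["datetime", "lat", "lon"]

def extract_matching_rows(archive, expected_columns=EXPECTED_COLUMNS):
    """Yield rows (as dicts) from every CSV in the archive whose header matches."""
    with zipfile.ZipFile(archive) as zf:
        for name in sorted(zf.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # not a CSV member
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text)
                if reader.fieldnames != expected_columns:
                    continue  # different schema: not part of the primary data
                yield from reader

def write_combined_csv(archive, out_file, expected_columns=EXPECTED_COLUMNS):
    """Write all matching rows to one CSV stream and return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=expected_columns)
    writer.writeheader()
    count = 0
    for row in extract_matching_rows(archive, expected_columns):
        writer.writerow(row)
        count += 1
    return count
```

`archive` can be a path or a binary file-like object, and `out_file` is any text stream opened for writing. The combined file has a single schema, so it can then be loaded into one table.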

We also use US Census block groups as polygons. The pre-2020 versions are available in a single file that loads directly. Since these have been used elsewhere, they form the current best basis for benchmarking.

The Census Bureau has released new geometries for 2020, split by state. We will eventually add a script demonstrating how to loop through them and load them into a single table. For now, it is probably enough to say that "COPY * FROM 'foo' WITH (geo='true')" appends by default and accepts http(s) and s3 paths directly.
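Until that script exists, the loop could be sketched along these lines. Everything specific here is a placeholder: the table name, the URL template and the two-state list are hypothetical, and the statement form with a table name in place of `*` is an assumption. The sketch only builds the statements as strings; running them against a database is left out.

```python
# Hypothetical values: the table name and URL template are placeholders,
# not the real download locations.
TABLE = "census_block_groups_2020"
URL_TEMPLATE = "https://example.com/block_groups/{fips}.zip"
STATE_FIPS = ["34", "36"]  # New Jersey, New York

def build_copy_statements(table, url_template, state_fips):
    """Return one geo COPY statement per state, all targeting the same table."""
    return [
        f"COPY {table} FROM '{url_template.format(fips=fips)}' WITH (geo='true');"
        for fips in state_fips
    ]

statements = build_copy_statements(TABLE, URL_TEMPLATE, STATE_FIPS)
for sql in statements:
    print(sql)
```

Because the load appends by default, issuing the statements one after another against the same table would accumulate all states in it.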
