h3-pyspark's Introduction

Hi, I'm Kevin 👋

I'm a computer scientist and software engineer by education, graduating with a B.S. in Computer Science from Cornell University's College of Engineering. I've spent the last year building products on founding teams including Arlo (we're hiring!) and CogBase.

Before that, I spent 5+ years working at Palantir helping the world's most important institutions solve their hardest data problems. I've been fortunate enough to work in a hugely diverse range of industries, including Oil & Gas (BP), Automotive Manufacturing (Stellantis – Formerly FCA), Financial Services (Fiserv – Formerly First Data), National Defense (U.S. DOD), Telecommunications, and Aviation. I've worked in Business Development as both a technical and strategic individual contributor, led an enterprise contract, converted new deals, and managed large teams internally and externally with our customers.

I'm in a uniquely leveraged position to contribute in my next venture, thrive in the unknown, tackle impossible problems, and bring cross-functional knowledge including:

  1. A deep understanding of, and empathy for, end users' problems.
  2. The ability to communicate, scope, and lead teams toward clear requirements.
  3. The technical skills to bootstrap and execute on solutions extremely quickly.

I'm looking for somewhere I can have outsized impact in product, engineering, and business strategy at an early stage company employing technology to revolutionize society's approach to historically messy problems.

Outside of work, I build open-source software, most notably Mintable, a free tool to manage & automate personal finance analysis with no ads or data collection. I'm an avid cyclist / runner / skier, road tripper, and podcast fan; I dabble with guitar & producing electronic music. 2023 goals include a century (100-mi) bike ride and a half-marathon. Talk to me about photography, personal finance, your morning Chemex routine, cars (& Top Gear), or your favorite cold open on The Office.

h3-pyspark's People

Contributors

deankieserman, kevinschaich, rwaldman

h3-pyspark's Issues

polyfill fails with valid multipolygon geojson

h3_pyspark.polyfill fails when a valid MultiPolygon GeoJSON is provided.
This is expected behavior, since it matches the native h3 library.

However, I thought it would be helpful if this library could accept MultiPolygons.
Could I get permission to push a PR?

implementation in src/h3_pyspark/__init__.py

@F.udf(returnType=T.ArrayType(T.StringType()))
@handle_nulls
def polyfill(polygons, res, geo_json_conformant):
    # NOTE: this behavior differs from the native h3 default:
    # h3-pyspark expects the `polygons` argument to be a valid GeoJSON string
    polygons = json.loads(polygons)
    type_ = polygons["type"].lower()
    if type_ == "multipolygon":
        output = []
        for i in polygons["coordinates"]:
            _polygon = {"type": "Polygon", "coordinates": i}
            output.extend(list(h3.polyfill(_polygon, res, geo_json_conformant)))
        return sanitize_types(output)
    return sanitize_types(h3.polyfill(polygons, res, geo_json_conformant))

test in tests/test_core.py

multipolygon = '{"type": "MultiPolygon","coordinates": [[[[108.98309290409088,13.240363245242063],[108.98343622684479,13.240363245242063],[108.98343622684479,13.240634779729014],[108.98309290409088,13.240634779729014],[108.98309290409088,13.240363245242063]]],[[[108.98349523544312,13.240002939397714],[108.98389220237732,13.240002939397714],[108.98389220237732,13.240269252464502],[108.98349523544312,13.240269252464502],[108.98349523544312,13.240002939397714]]]]}'

def test_polyfill_multipolygon(self):
        h3_test_args, h3_pyspark_test_args = get_test_args(h3.polyfill)
        print(h3_pyspark_test_args)
        integer = 12
        data = {
            "res": integer,
            "geo_json_conformant": True,
            "geojson": multipolygon,
        }
        df = spark.createDataFrame([data])
        actual = df.withColumn("actual", h3_pyspark.polyfill(*h3_pyspark_test_args))
        actual = actual.collect()[0]["actual"]
        print(actual)
        expected = []
        for i in json.loads(multipolygon)["coordinates"]:
            _polygon = {"type": "Polygon", "coordinates": i}
            expected.extend(list(h3.polyfill(_polygon, integer, True)))
        expected = sanitize_types(expected)
        assert sort(actual) == sort(expected)

'TypeError: must be real number, not NoneType' when using h3_pyspark

Hi, I have the following Spark dataframe. The column of h3 indices is created by passing the lat/lng pairs and the resolution to h3_pyspark.geo_to_h3(lat, lng, resolution). However, I hit the error below when I try to check whether the index column contains any nulls. It is not only isNull() that fails: every other subsetting operation throws the same error. Could anyone provide some insight into what the issue might be and how to fix it? Thanks in advance!

dataframe: [screenshot]

errors: [screenshot]
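
The screenshots are not preserved, so the cause cannot be confirmed from the issue text. One plausible explanation, offered purely as an assumption: Spark evaluates UDF columns lazily, so a null lat or lng would only reach the native function once an action or filter forces evaluation, and a C-level float conversion rejects None with this kind of message. The plain-Python sketch below uses math.radians as a stand-in for a native call (it is not part of h3 or h3_pyspark), and both helper names are hypothetical.

```python
import math


def to_radians_unsafe(value):
    # math.radians converts its argument to a C double, so None is rejected.
    return math.radians(value)


def to_radians_safe(value):
    # Guard: pass nulls through instead of handing them to native code.
    if value is None:
        return None
    return math.radians(value)


try:
    to_radians_unsafe(None)
    message = ""
except TypeError as exc:
    # Captures the text of the TypeError raised for a None argument.
    message = str(exc)

safe_result = to_radians_safe(None)
```

If this is indeed the cause, filtering out rows with null coordinates before calling the UDF, or guarding inside the UDF as above, should avoid the error.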

Bug in index_shape function which misses several hexes

Reported by @rwaldman – we can miss several hexes in the worst case if a line's start and endpoints are east-to-west and towards the north or south edge:

[image: hexes missed along the line]

Proposed solution is for long line segments (≥ s where s = hex side length) to interpolate several points along the line based on the selected resolution, so that we catch the ones in between:

[image: points interpolated along the line]
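
A minimal sketch of the interpolation idea, not the library's implementation. It assumes points are plain (x, y) tuples and measures spacing as straight-line distance in coordinate units, which is a simplification for lat/lng data. The function name interpolate_points and the max_step parameter are hypothetical; in practice max_step would be derived from the hex side length at the chosen resolution.

```python
import math


def interpolate_points(start, end, max_step):
    """Return points from start to end (inclusive), at most max_step apart."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    # Number of equal sub-segments needed so that none exceeds max_step.
    n = max(1, math.ceil(length / max_step))
    return [
        (x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n)
        for i in range(n + 1)
    ]


# A segment of length 5 sampled with a maximum spacing of 1.
points = interpolate_points((0.0, 0.0), (3.0, 4.0), max_step=1.0)
```

Each sampled point would then be indexed to a hex and the results deduplicated, so that hexes crossed between the endpoints are not skipped.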

Better error handling when null values are passed in

Currently the behavior for all UDFs is that if any row in your dataframe has a null value, the entire build will fail.

This type of behavior would be more resilient:

@F.udf(T.ArrayType(T.StringType()))
def index_shape(geometry, resolution):
    if geometry is None:
        return None
    return _index_shape(geometry, resolution)
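
Rather than repeating the None check in every UDF, the same guard could be factored into a decorator. The sketch below shows one way such a decorator might look in plain Python; it is an assumption about the approach, not the library's actual code, and the describe function is a hypothetical stand-in for a real indexing function.

```python
import functools


def handle_nulls(func):
    """Return None instead of calling func when any argument is None."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if any(a is None for a in args) or any(v is None for v in kwargs.values()):
            return None
        return func(*args, **kwargs)

    return wrapper


@handle_nulls
def describe(geometry, resolution):
    # Hypothetical stand-in for a function that would fail on None input.
    return f"{geometry}@{resolution}"


results = [describe(g, 9) for g in ["POINT (1 2)", None]]
```

With this pattern, a row containing a null yields a null result for that row only, instead of failing the whole job.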
