On 07/19/2020, GeoSpark was accepted into the Apache Software Foundation under the new name Apache Sedona (incubating). The code in this repository will be imported into the ASF Git repository. Existing contributors, please read this GitHub issue and submit your CLA at your earliest convenience.

Stable | Latest | Source code |
---|---|---|
GeoSpark@Twitter || GeoSpark Discussion Board ||
GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
GeoSpark contains several modules:
Name | API | Spark compatibility | Introduction |
---|---|---|---|
Core | RDD | Spark 2.X/1.X | SpatialRDDs and Query Operators. |
SQL | SQL/DataFrame | SparkSQL 2.1+ | SQL interfaces for GeoSpark core. |
Viz | RDD, SQL/DataFrame | RDD - Spark 2.X/1.X, SQL - Spark 2.1+ | Visualization for Spatial RDD and DataFrame. |
Zeppelin | Apache Zeppelin | Spark 2.1+, Zeppelin 0.8.1+ | GeoSpark plugin for Apache Zeppelin |
GeoSpark supports several programming languages: Scala, Java, SQL, Python and R.
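
To give a rough feel for what a spatial range query over partitioned data involves, here is a toy, single-machine sketch in plain Python. It does not use GeoSpark's API: the function names `grid_partition` and `range_query`, the uniform grid scheme, and the sample points are all assumptions made for illustration, and they do not describe how GeoSpark partitions or queries data internally.

```python
from collections import defaultdict


def grid_partition(points, cell_size):
    """Group (x, y) points by the uniform grid cell that contains them."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return cells


def range_query(cells, cell_size, min_x, min_y, max_x, max_y):
    """Return the points inside the query window.

    Only grid cells that overlap the window are scanned; every other
    cell is skipped without looking at its points.
    """
    hits = []
    for cx in range(int(min_x // cell_size), int(max_x // cell_size) + 1):
        for cy in range(int(min_y // cell_size), int(max_y // cell_size) + 1):
            for x, y in cells.get((cx, cy), []):
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    hits.append((x, y))
    return sorted(hits)


# Hypothetical sample data, not taken from any GeoSpark dataset.
points = [(1.0, 1.0), (2.5, 3.5), (7.0, 8.0), (9.5, 0.5), (4.0, 4.0)]
cells = grid_partition(points, cell_size=5.0)
result = range_query(cells, 5.0, 0.0, 0.0, 5.0, 5.0)
print(result)
```

In a distributed setting each grid cell would roughly correspond to a partition that can be processed on a different machine. For the real RDD and SQL operators, refer to the GeoSpark documentation.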
Please visit the GeoSpark website for detailed documentation.
News!
- GeoSpark 1.3.1 is released. This version provides a complete Python wrapper for the GeoSpark RDD and SQL APIs. It also contains a number of bug fixes and new functions from 12 contributors. See the Python tutorial for RDD, the Python tutorial for SQL, and the release notes.
Original Contributors
- (Mo)hamed Sarwat (Twitter: @MoSarwat)
- Jia Yu
Impact
GeoSpark Downloads on Maven Central
The GeoSpark ecosystem has around 10K downloads per month.