
GeoMatch


(Figure: Distribution of 3.77 billion GPS taxi trajectories.)

GeoMatch is a novel, scalable, and efficient big-data pipeline for large-scale map-matching on Apache Spark. GeoMatch improves on existing spatial big-data solutions with a novel spatial partitioning scheme inspired by Hilbert space-filling curves. This partitioning scheme lets GeoMatch balance operations across processing units, achieving significant performance gains. GeoMatch also incorporates a dynamically adjustable error-correction technique that provides robustness against positioning errors. We evaluate GeoMatch through rigorous and extensive benchmarks on large-scale urban spatial datasets ranging from 166,253 to 3.78 billion location measurements. Experimental results show up to 27.25-fold performance improvements over previous works, while achieving better processing accuracy than current solutions (99.99% agreement with the baseline methods).

License

GeoMatch is provided as-is under the BSD 3-clause license (https://github.com/bdilab/GeoMatch/blob/master/LICENSE), Copyright © 2019, the City University of New York and University of Helsinki.

Supported Spatial Objects

Currently, GeoMatch supports the following spatial objects.

  • GMPoint
  • GMLineString
  • GMPolygon
  • GMRectangle

New objects can be easily added by extending the class GMGeomBase and overriding the following methods.

  • toJTS: Returns a JTS representation of the spatial object. This method is used by GeoMatch for distance computation. In some cases, a single object may be transformed into multiple JTS objects for improved accuracy, as shown in GMLineString.scala.
  • getHilbertIndexList: Returns the list of Hilbert indexes that the object passes through. The implementation is object-dependent; for instance, GMLineString uses a line rasterization algorithm to determine which indexes its segments pass through.
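To illustrate the rasterization idea behind getHilbertIndexList, the following self-contained sketch enumerates the grid cells a line segment passes through using Bresenham-style traversal. It is not part of the GeoMatch API: the function name and the (x, y) cell representation are assumptions for illustration, and a real subclass would additionally map each cell to its Hilbert index.

```scala
// Sketch only: enumerate the integer grid cells covered by a segment
// (Bresenham line traversal). A GMGeomBase subclass would map each
// resulting (x, y) cell to its Hilbert index. Names are hypothetical.
def cellsOnSegment(x0: Int, y0: Int, x1: Int, y1: Int): List[(Int, Int)] = {
  val dx = math.abs(x1 - x0)
  val dy = -math.abs(y1 - y0)
  val sx = if (x0 < x1) 1 else -1
  val sy = if (y0 < y1) 1 else -1
  var x = x0
  var y = y0
  var err = dx + dy
  val cells = scala.collection.mutable.ListBuffer.empty[(Int, Int)]
  var done = false
  while (!done) {
    cells += ((x, y))
    if (x == x1 && y == y1) done = true
    else {
      val e2 = 2 * err
      if (e2 >= dy) { err += dy; x += sx } // step in x
      if (e2 <= dx) { err += dx; y += sy } // step in y
    }
  }
  cells.toList
}
```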

Working with GeoMatch

Dependencies

  • Apache Spark 2.4.0
  • Scala 2.11
  • Java 1.8
  • Locationtech JTS 1.16.1

Building GeoMatch

cd Common
mvn compile install
cd ../GeoMatch
mvn compile install
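Once built, the resulting jars can be attached to a Spark application. The invocation below is a hypothetical sketch: the jar paths, version suffixes, main class, and application jar name depend on your build and application, so adjust them accordingly.

```shell
# Hypothetical example; jar paths, versions, and class names are
# placeholders that depend on your build and application.
spark-submit \
  --master "local[*]" \
  --jars Common/target/Common-1.0.jar,GeoMatch/target/GeoMatch-1.0.jar \
  --class com.example.GeoMatchApp \
  my-geomatch-app.jar
```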

Modifying GeoMatch

GeoMatch works on top of Apache Spark and does not modify its core. The source files can be imported as Maven projects into an IDE such as IntelliJ IDEA or Eclipse with the Scala plugin installed. After modification, save the files and rebuild the sources as described in the Building GeoMatch subsection.

Spatial Join kNN

The following sample code shows how GeoMatch can be used to perform map-matching of two datasets. The example uses the two synthetic datasets in the folder SampleCSV (firstDataset.csv and secondDataset.csv), but the code can be modified to use any other datasets as long as GeoMatch is provided with RDDs that contain the proper objects (i.e., objects derived from GMGeomBase).

1. Spark context setup

This is a standard SparkContext initialization step, including configuring Spark to use KryoSerializer. Skipping KryoSerializer may increase the processing time.

val sparkConf = new SparkConf()
  .setAppName("GeoMatch_Test")
  .set("spark.serializer", classOf[KryoSerializer].getName)
  .registerKryoClasses(GeoMatch.getGeoMatchClasses())

// assuming local run
sparkConf.setMaster("local[*]")

val sparkContext = new SparkContext(sparkConf)

2. Creating the Spark RDDs

In this step, we parse the input files and create RDDs of spatial objects. The process is performed in parallel; for performance gains, the larger dataset should be the second dataset. The result of the match operation is ALL objects from the second dataset, each paired with a list of matches from the first dataset.

// The first dataset.
val rddFirstSet = sparkContext.textFile("firstDataset.csv")
                        .mapPartitions(_.map(line => {

                            // parse the line and form the spatial object
                            val parts = line.split(',')

                            val arrCoords = parts.slice(1, parts.length)
                                .map(xyStr => {
                                    val xy = xyStr.split(' ')

                                    (xy(0).toDouble.toInt, xy(1).toDouble.toInt)
                                })

                            // create the spatial object GMLineString. The first parameter is the payload
                            // the second parameter is the list of coordinates that form the LineString
                            new GMLineString(parts(0), arrCoords)
                        }))

// The second dataset.
val rddSecondSet = sparkContext.textFile("secondDataset.csv")
                         .mapPartitions(_.map(line => {

                             // parse the line and form the spatial object
                             val parts = line.split(',')

                             // create the spatial object GMPoint. The first parameter is the payload
                             // the second parameter is the point's coordinates
                             new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))
                         }))
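From the parsing code above, the expected CSV layouts can be inferred: each line of the first dataset is a payload identifier followed by comma-separated "x y" coordinate pairs, and each line of the second dataset is a payload identifier followed by x and y fields. The snippet below is a self-contained replica of that parsing logic applied to sample lines; the sample values are invented for illustration.

```scala
// Stand-alone replica of the parsing done inside the mapPartitions
// calls above. The sample lines are invented for illustration.
val lineStringLine = "LS1,10.2 20.7,30.1 40.9" // first dataset: id, then "x y" pairs
val parts = lineStringLine.split(',')
val coords = parts.slice(1, parts.length).map { xyStr =>
  val xy = xyStr.split(' ')
  (xy(0).toDouble.toInt, xy(1).toDouble.toInt)
}
// coords contains (10, 20) and (30, 40)

val pointLine = "P1,15.5,25.5" // second dataset: id, x, y
val pParts = pointLine.split(',')
val point = (pParts(1).toDouble.toInt, pParts(2).toDouble.toInt)
// point is (15, 25)
```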

3. Initializing GeoMatch

In this step, GeoMatch is initialized using parameters shown below:

val geoMatch = new GeoMatch(false, 256, 150, (-1, -1, -1, -1))

| Parameter | Default Value | Description |
| --- | --- | --- |
| outputDebugMessages | false | Set to true to receive debug output messages. |
| hilbertN | 256 | The size of the Hilbert space-filling curve. Higher values increase precision at the cost of processing time. |
| errorRangeBy | 150 | The maximum distance for accepted matches; matches beyond this distance are rejected. This value is used to find matches within a specific distance and can account for GPS errors. |
| searchGridMBR | (-1, -1, -1, -1) | The search grid MBR. If the default value is provided, GeoMatch computes it from the first dataset. This option can be used to prune the datasets and exclude out-of-range objects. |
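If you prefer to supply searchGridMBR yourself rather than letting GeoMatch derive it from the first dataset, it can be computed as the bounding box of the coordinates. The sketch below assumes the tuple ordering (minX, minY, maxX, maxY); verify the expected ordering against the GeoMatch sources before relying on it.

```scala
// Compute a bounding box over a set of (x, y) coordinates.
// The tuple ordering (minX, minY, maxX, maxY) is an assumption.
val pts = Seq((3, 7), (10, 2), (5, 5)) // sample coordinates, invented
val mbr = (pts.map(_._1).min, pts.map(_._2).min,
           pts.map(_._1).max, pts.map(_._2).max)
// mbr is (3, 2, 10, 7)
```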

4. Map-matching

In this step, we invoke one of GeoMatch's join operations. Three operations are currently supported, with the last two still experimental.

  • spatialJoinKNN
  • spatialJoinRange
  • spatialJoinDistance

val resultRDD = geoMatch.spatialJoinKNN(rddFirstSet, rddSecondSet, 3, false)

5. Processing Results

Finally, the results can be obtained by invoking one of the standard Spark actions. The following lines print each point's payload followed by the payload values of all matched LineString objects.

resultRDD.mapPartitions(_.map(row => println("%-10s%s".format(row._1.payload, row._2.map(_.payload).mkString(",")))))
         .collect()
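The format string above left-justifies the payload in a 10-character field. The driver-side snippet below shows the resulting line shape with invented sample values. Note that the println in the code above runs inside mapPartitions on the executors, so in a cluster deployment the output appears in the executor logs rather than on the driver; collecting first and printing on the driver is an alternative.

```scala
// Driver-side illustration of the same formatting, with invented values.
val row = ("P001", Seq("LS12", "LS7", "LS3"))
val line = "%-10s%s".format(row._1, row._2.mkString(","))
println(line) // payload padded to 10 chars, then comma-separated matches
```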

Copyright © 2019, the City University of New York and University of Helsinki. All rights reserved.
