GeoMatch is provided as-is under the BSD 3-clause license (https://github.com/bdilab/GeoMatch/blob/master/LICENSE), Copyright © 2019, the City University of New York and University of Helsinki.
Currently, GeoMatch supports the following spatial objects.
- GMPoint
- GMLineString
- GMPolygon
- GMRectangle
New objects can be easily added by extending the class GMGeomBase and overriding the following methods.
- toJTS: Returns a JTS representation of the spatial object. This method is used for distance computation by GeoMatch. In some cases, a single object may be transformed into multiple JTS objects for improved accuracy, as shown in GMLineString.scala
- getHilbertIndexList: Returns a list of Hilbert indexes that the object passes through. The implementation is object-dependent. For instance, LineString objects use a line rasterization algorithm to determine which indexes their segments pass through
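The Hilbert-index mapping that getHilbertIndexList builds on can be illustrated with a small self-contained sketch. This is the classic xy2d coordinate-to-distance conversion for a Hilbert space-filling curve; the object name and structure here are illustrative and not GeoMatch's actual implementation:

```scala
object HilbertSketch {
  // Map a cell (x, y) to its distance along a Hilbert curve covering an
  // n-by-n grid (n must be a power of two). A spatial object's index list
  // is conceptually the set of such distances for every cell it touches.
  def xy2d(n: Int, xIn: Int, yIn: Int): Int = {
    var x = xIn
    var y = yIn
    var d = 0
    var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      d += s * s * ((3 * rx) ^ ry)
      // Rotate the quadrant so the curve's recursion pattern repeats
      if (ry == 0) {
        if (rx == 1) {
          x = s - 1 - x
          y = s - 1 - y
        }
        val t = x; x = y; y = t
      }
      s /= 2
    }
    d
  }
}
```

On a 2-by-2 grid this visits (0,0), (0,1), (1,1), (1,0) in order, which is the familiar U-shaped Hilbert pattern; nearby cells tend to get nearby indexes, which is why the curve is useful for spatial partitioning.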
- Apache Spark 2.4.0
- Scala 2.11
- Java 1.8
- Locationtech JTS 1.16.1
cd Common
mvn compile install
cd ../GeoMatch
mvn compile install
GeoMatch works on top of Apache Spark and does not modify its core. The source files can be imported as Maven projects into an IDE like IntelliJ or Eclipse with the Scala plugin installed. After modification, save the files and build the new sources as described earlier in the Build subsection.
The following sample code shows how GeoMatch can be used to perform map-matching of two datasets. The example uses the two synthetic datasets in the folder SampleCSV (firstDataset.csv and secondDataset.csv), but the code can be modified to use any other datasets as long as GeoMatch is provided with RDDs that contain the proper objects (i.e., objects derived from GMGeomBase).
This is a standard SparkContext initialization step, including configuring Spark to use KryoSerializer. Skipping KryoSerializer may increase the processing time.
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
.setAppName("GeoMatch_Test")
.set("spark.serializer", classOf[KryoSerializer].getName)
.registerKryoClasses(GeoMatch.getGeoMatchClasses())
// assuming local run
sparkConf.setMaster("local[*]")
val sparkContext = new SparkContext(sparkConf)
In this step, we parse the input files and create RDDs of spatial objects. The process is performed in parallel; for performance gains, the larger dataset should be the second dataset. The result of the match operation is ALL objects from the second dataset, each with a list of matches from the first dataset.
// The first dataset.
val rddFirstSet = sparkContext.textFile("firstDataset.csv")
.mapPartitions(_.map(line => {
// parse the line and form the spatial object
val parts = line.split(',')
val arrCoords = parts.slice(1, parts.length)
.map(xyStr => {
val xy = xyStr.split(' ')
(xy(0).toDouble.toInt, xy(1).toDouble.toInt)
})
// create the spatial object GMLineString. The first parameter is the payload
// the second parameter is the list of coordinates that form the LineString
new GMLineString(parts(0), arrCoords)
}))
// The second dataset.
val rddSecondSet = sparkContext.textFile("secondDataset.csv")
.mapPartitions(_.map(line => {
// parse the line and form the spatial object
val parts = line.split(',')
// create the spatial object GMPoint. The first parameter is the payload
// the second parameter is the point's coordinates
new GMPoint(parts(0), (parts(1).toDouble.toInt, parts(2).toDouble.toInt))
}))
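As a reference for the expected CSV layout, the LineString parsing above can be exercised standalone. The payload and coordinate values below are made up for illustration; the parsing logic mirrors the firstDataset example (payload first, then comma-separated, space-delimited coordinate pairs):

```scala
object ParseSketch {
  // Parse a line shaped like "payload,x1 y1,x2 y2,..." exactly as in the
  // firstDataset example above: payload first, then coordinate pairs.
  def parseLineString(line: String): (String, Array[(Int, Int)]) = {
    val parts = line.split(',')
    val coords = parts.drop(1).map { xyStr =>
      val xy = xyStr.split(' ')
      (xy(0).toDouble.toInt, xy(1).toDouble.toInt)
    }
    (parts(0), coords)
  }
}
```

For example, parseLineString("road_1,10 20,30 40") yields ("road_1", Array((10, 20), (30, 40))), which matches the constructor arguments passed to GMLineString above.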
In this step, GeoMatch is initialized using the parameters shown below:
val geoMatch = new GeoMatch(false, 256, 150, (-1, -1, -1, -1))
| Parameter | Default Value | Description |
|-----------|---------------|-------------|
| outputDebugMessages | false | Set to true to receive output debug messages. |
| hilbertN | 256 | The size of the Hilbert space-filling curve. Higher values increase precision at the cost of longer processing time. |
| errorRangeBy | 150 | The maximum distance for accepted matches; matches beyond this distance are rejected. This value is used to find matches within a specific distance and can be used to account for GPS errors. |
| searchGridMBR | (-1, -1, -1, -1) | The search grid MBR. If the default value is provided, GeoMatch will compute it from the first dataset. This option can be used to prune the datasets and exclude out-of-range objects. |
In this step, we invoke one of GeoMatch's join operations. Currently, three operations are supported, with the last two in an experimental phase.
- spatialJoinKNN
- spatialJoinRange
- spatialJoinDistance
val resultRDD = geoMatch.spatialJoinKNN(rddFirstSet, rddSecondSet, 3, false)
Finally, the results can be obtained by invoking one of the standard Spark actions. The following code simply prints each point's payload followed by the payload values of all matched LineString objects.
resultRDD.collect()
    .foreach(row => println("%-10s%s".format(row._1.payload, row._2.map(_.payload).mkString(","))))
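For clarity, the format string used above ("%-10s%s") left-justifies the point's payload in a 10-character column so the matched payloads line up. A standalone illustration with made-up payload values:

```scala
// "%-10s" pads the first payload to 10 characters, left-justified, so the
// comma-joined matched payloads start in a second aligned column.
val formatted = "%-10s%s".format("point_7", List("road_1", "road_2").mkString(","))
// formatted == "point_7   road_1,road_2"
```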