I am integrating a PBF reader into the OSM parsing. This will likely be much faster th

PBF Reader for OSM Parsing (graphhopper, closed)

graphhopper commented on May 22, 2024

PBF Reader for OSM Parsing

from graphhopper.

Comments (10)

karussell commented on May 22, 2024

Cool!

As a side effect, the PBF reader will be able to use multiple threads.

Why that? Via queue or something else?

NopMap commented on May 22, 2024

Yes it comes with two queues and multiple possible worker threads.

But first I have a problem. The PBF classes need two additional JAR files. I would like to add them as JARs, not as source, so that they stay unmodified, because of their licensing.

protobuf-java-2.4.1.jar from Google, BSD license, https://code.google.com/p/protobuf/
osmpbf-1.1.1.jar from S Crosby, LGPL license, https://github.com/scrosby/OSM-binary

How do you add these .jar dependencies to the graphhopper Maven build?
Where would we put the required license files?

karussell commented on May 22, 2024

Hmmh, can we establish a plugin mechanism somehow? E.g. I would also like to add apache-compress (see tools project).

But I don't like that we blow up graphhopper's size and dependencies for a single use case, the import, which not everyone needs (e.g. on Android). I want graphhopper to have only two external dependencies (trove4j and, later on, probably lucene).

If the dependencies are in Maven Central, then you just need an additional dependency section in the pom.xml. If not, this gets a bit more complicated, because you need to add a repository or something similar. E.g. for protobuf this looks like:

<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
</dependency>

Probably we can create an import class + jar in the tools project somehow?

karussell commented on May 22, 2024

Ok, if we add <scope>provided</scope> (or <optional>true</optional>?) to the dependency we could implement that in the core without requiring all users to have that bundled. I'll think about this.

I've read this and this

scope=provided means that the library is needed for compilation and runtime, however it is provided by some sort of container. Typical example: servlet-api

optional=true means that a library is needed for compilation, but it is not necessary at runtime. Very often this is a symptom of poorly made modules: it is best to isolate optional code into a different module where the dependency is not an option. For example, in Velocity Tools we had an optional dependency on an XML library, for a specific XML tool. Isolating this code into a new module made this dependency mandatory, but you have to include one more module in the using project.
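
If the PBF libraries end up as optional dependencies, the core has to cope with them being absent at runtime. A minimal sketch of such a guard follows, assuming a plain classpath lookup is enough. The class name `OptionalDependencyCheck` and the printed messages are hypothetical and not part of graphhopper; `com.google.protobuf.Message` is an interface shipped in protobuf-java and only serves as a marker class here.

```java
final class OptionalDependencyCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isAvailable(String className) {
        try {
            // 'false' avoids running static initializers; we only need to know it exists.
            Class.forName(className, false, OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Marker class from protobuf-java; the fallback behaviour is an assumption.
        if (isAvailable("com.google.protobuf.Message")) {
            System.out.println("protobuf found, PBF import can be used");
        } else {
            System.out.println("protobuf missing, only XML import is possible");
        }
    }
}
```

With a check like this, only the code path that actually touches the PBF classes needs the extra JARs.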

NopMap commented on May 22, 2024

This is looking good. I have a working, standalone reader at the moment. Reading Bavaria as PBF takes only 17% of the time compared to XML. Yes, no mistake: 83% less time. :-)

Now I have to check a few things and then move it into graphhopper. But I need the dependencies for that. Their size is unproblematic, 200k and 450k. Nothing next to trove. :-)

karussell commented on May 22, 2024

Reading Bavaria as PBF takes only 17% of the time compared to XML

Woot!

Nothing next to trove. :-)

I know, but we'll need a lot more dependencies in the future, so we need to think about that and keep the core small and/or move the import section into another subproject. Not sure yet. But for now probably do that optional thing.

NopMap commented on May 22, 2024

I have integrated the PBF reader into graphhopper. The times for Bavaria on my machine are now 165s with XML and 44s with PBF, so roughly 1/4 of the time overall. It is working: I checked the routing in the web demo and it seems fine.

We still have two parameters to tweak for performance: the number of worker threads used for parsing, and the maximum length of the queue, which matters when parsing is faster than processing in graphhopper. I set them to 2 workers and a queue size of 50000 because this gave the best performance on my machine with Bavaria.
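
To illustrate how these two parameters interact, here is a small sketch of a bounded producer/consumer pipeline. It is not the actual reader code: the class `BoundedParsePipeline`, the method `run`, and the use of `toUpperCase()` as a stand-in for decoding a block are all assumptions, and it models only one queue. The point is that `put` blocks once the queue holds `queueSize` items, so fast parser threads cannot run arbitrarily far ahead of the consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BoundedParsePipeline {

    /**
     * "Parses" every raw block on a pool of worker threads and hands the results
     * to a single consumer through a queue capped at queueSize.
     * Returns the number of items the consumer received.
     */
    static int run(List<String> rawBlocks, int workers, int queueSize)
            throws InterruptedException {
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(queueSize);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String block : rawBlocks) {
            pool.execute(() -> {
                try {
                    // Stand-in for decoding; put() blocks while the queue is full.
                    parsed.put(block.toUpperCase());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown(); // no new tasks; already submitted ones still run

        int consumed = 0;
        for (int i = 0; i < rawBlocks.size(); i++) {
            parsed.take(); // the consumer would build the graph here
            consumed++;
        }
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> blocks = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            blocks.add("block-" + i);
        }
        // 2 workers as in the comment above; the queue size is scaled down for the demo.
        System.out.println("consumed " + run(blocks, 2, 50));
    }
}
```

A larger queue smooths out bursts but costs memory, while more workers only help as long as the consumer keeps up.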

I added the dependencies to the pom.xml. This works; if we want to do it differently, we can always change it. Be careful to use exactly these versions and not to switch to the latest ones. When I tried a mix of versions it crashed.

This time I did not break any tests, but how would you create a test for the new data format?

karussell commented on May 22, 2024

Cool, thanks! We could set the threads to half of the available CPUs of the machine or even make this configurable. I'll do this.

When I tried a mix of versions it crashed.

Thanks for this info!

This time I did not break any tests, but how would you create a test for the new data format?

OK, I'll add a new Andorra file and we'll see if the results are the same.

NopMap commented on May 22, 2024

I have 8 cores (with hyperthreading). 2 threads was faster than one, more than that did not improve the speed. On the other hand I already had to put in a limiter to keep the consumption queue from overrunning.

Play with the values, but I don't think half the CPU cores is a good idea.

karussell commented on May 22, 2024

Ok, made 2 the default but made it configurable.
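
A rough sketch of what "default 2, but configurable" could look like follows. The actual option name and parsing code are not shown in this thread, so the class `WorkerConfig` and the property name `pbf.workers` are made up for illustration.

```java
final class WorkerConfig {

    static final int DEFAULT_WORKERS = 2;

    /** Returns the configured worker count, or the default for null, non-numeric or < 1 values. */
    static int workerThreads(String configured) {
        if (configured == null) {
            return DEFAULT_WORKERS;
        }
        try {
            int n = Integer.parseInt(configured.trim());
            return n >= 1 ? n : DEFAULT_WORKERS;
        } catch (NumberFormatException e) {
            return DEFAULT_WORKERS;
        }
    }

    public static void main(String[] args) {
        // "pbf.workers" is a hypothetical property name, e.g. java -Dpbf.workers=4 WorkerConfig
        int workers = workerThreads(System.getProperty("pbf.workers"));
        // For comparison only: the "half of the CPUs" idea discussed above.
        int halfCpus = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);
        System.out.println("workers=" + workers + ", half of CPUs=" + halfCpus);
    }
}
```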

This issue will be fixed via #64
