Comments (5)
The bottleneck is on line 23 of geom.py, which is only executed for a duplicate node:
Geometry.geometries.remove(self)
When there are few or no duplicates the performance drawback isn't really noticeable, even if you have millions of nodes. But in this case you have around 1 million duplicates, each of which has to be searched for in a non-hashed list before it can be removed.
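To illustrate why the list lookup hurts, here is a minimal sketch. It is not ogr2osm code: CountingNode and both helper functions are hypothetical names, and the node class only counts how often Python has to call __eq__. list.remove scans from the front of the list, whereas a set locates the element by its hash.

```python
class CountingNode:
    """Hypothetical stand-in for a geometry object; counts __eq__ calls."""
    comparisons = 0

    def __init__(self, ident):
        self.ident = ident

    def __eq__(self, other):
        CountingNode.comparisons += 1
        return self is other

    def __hash__(self):
        return hash(self.ident)


def comparisons_for_list_remove(n):
    """Remove the last of n nodes from a list; return the __eq__ call count."""
    nodes = [CountingNode(i) for i in range(n)]
    target = nodes[-1]
    CountingNode.comparisons = 0
    nodes.remove(target)  # linear scan from the front of the list
    return CountingNode.comparisons


def comparisons_for_set_discard(n):
    """Remove the same node from a set; return the __eq__ call count."""
    nodes = [CountingNode(i) for i in range(n)]
    target = nodes[-1]
    node_set = set(nodes)
    CountingNode.comparisons = 0
    node_set.discard(target)  # hash lookup, no scan over the other nodes
    return CountingNode.comparisons
```

In the worst case the list version compares against every other element, so the cost per removal grows with the size of the list. The set version does not depend on how many other nodes are stored.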
On my computer it takes on average 0.035 seconds to process each unique coordinate (regardless of whether it has duplicates). That may not seem like much, but with 2.5 million unique coordinates the process ends up taking more than 24 hours (0.035 s × 2.5 million ≈ 87,500 s ≈ 24.3 hours).
Which brings us to a more important question: why add elements to a list if you want to remove them later, without ever having used them? I am convinced that mergePoints can be integrated into parseData, which should significantly improve the performance.
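One way this idea could look, as a rough sketch only: the names Point, PointRegistry and get_or_create are made up for illustration and are not the actual ogr2osm or ogr2pbf API, and the default rounding precision is an assumption. Each coordinate is looked up in a dict while parsing, so a duplicate is never added and never has to be removed afterwards.

```python
class Point:
    """Hypothetical minimal point; the real geometry classes carry more state."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointRegistry:
    """Hands out one shared Point per (rounded) coordinate pair."""

    def __init__(self, digits=7):
        # digits is an assumed rounding precision, not a value from ogr2osm
        self._digits = digits
        self._by_coord = {}

    def get_or_create(self, x, y):
        key = (round(x, self._digits), round(y, self._digits))
        point = self._by_coord.get(key)
        if point is None:
            # First time this location is seen: create and remember it
            point = Point(*key)
            self._by_coord[key] = point
        return point

    def __len__(self):
        return len(self._by_coord)
```

A parser would then call registry.get_or_create(x, y) for every vertex it reads. Because the dict lookup is hashed, the cost per point stays roughly constant no matter how many duplicates the dataset contains.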
from ogr2osm.
The changes work in roelderickx/ogr2pbf; processing time is down to around 5 minutes. I'll try to backport the changes to ogr2osm and create a pull request.
However, I see you are using a fork now where mergePoints is disabled, which seems to work for what you want to do. In that case you are probably affected by issue #51 as well.
from ogr2osm.
Forgot to mention, the output goes as far as:
l.debug("Checking list")
So it must be happening after this message, which of course you can deduce from the backtrace above, so perhaps this was not needed.
from ogr2osm.
I've been debugging this a bit further. Python is not my forte, though I've added some debug statements. It turns out that in mergePoints we have:
Total points user : 3 508 945 (count of points variable)
Total points coord: 2 527 003 (count of pointcoords variable)
It takes a very long time to process the first 5000 points, unusually long imho:
for (location, pointsatloc) in pointcoords.items():
There are also quite a few duplicates present in this dataset, so it has to work hard. Still, it doesn't make a lot of sense that it is so slow. I'll hack on this a bit more to find out where the performance hog is.
When we parse the road database, it contains a lot more points:
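For tracking down a hotspot like this, the standard-library profiler is one option. The sketch below is only an illustration: slow_merge is a made-up stand-in for the merge step, and profile_call is a hypothetical helper, not part of ogr2osm.

```python
import cProfile
import io
import pstats


def slow_merge(items, duplicates):
    """Hypothetical stand-in for a merge step that removes items from a list."""
    for duplicate in duplicates:
        items.remove(duplicate)
    return items


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries
    return result, buffer.getvalue()
```

Calling profile_call(slow_merge, list(range(2000)), list(range(1000))) and printing the report should list the remove method of list objects among the entries, which is the kind of hint that points at a line like the one in geom.py.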
Merging points
Total points user : 8082689
Making list
Total points coord 8082689
But it seems we don't have any duplicates, so it goes really fast according to the debug logging. The memory footprint, however, is exactly the same as when we parse the address data.
from ogr2osm.
Hey, thanks a lot for looking into this and bringing #51 to my attention. It's been a while since I hacked on this, although the tool still exists and is in use. Really cool that you took the time for this.
ogr2pbf is one of the tools in the chain to prepare data for human assisted import into osm via josm.
https://staging.grbosm.site/#/ (zoom low enough and on north part of Belgium for the layer to get pulled from postgres)
Afaik, I solved it by just living with the duplicates, and later on in the preprocessing chain it got solved, but I don't remember exactly how.
Anyway, pretty soon I'll be doing a fresh data processing run, which is in fact entirely automated. I will give it a go once it's backported and replace my fork, so it gets tested. The whole preprocessing of the data takes about 6 hrs on a decent Google Cloud node.
Big thanks Roel.
from ogr2osm.