Comments (5)
Hi @vkrause,
- Do you only run one update a day, with a daily diff from planet.openstreetmap.org?
- Are there read transactions that may be happening at the same time as the update?
My understanding of LMDB is that the database size will never shrink; this is a consequence of its MVCC design, which avoids the performance impact of a compaction phase. If there are reads happening at the same time as writes, the writes are guaranteed to grow the DB instead of reusing empty pages.
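To make the page-reuse mechanism concrete, here is a toy model in Python. It is a simplified sketch, not LMDB's actual allocator. It assumes every update rewrites a fixed number of pages, and it borrows LMDB's general rule that pages freed by a write transaction can only be recycled once no snapshot that might still reference them is in use. The class `ToyCowFile` and its methods are hypothetical.

```python
from typing import List, Tuple


class ToyCowFile:
    """Toy model (not LMDB's real allocator) of page reuse in a copy-on-write file.

    Every write transaction rewrites `pages` live pages: it needs that many free
    pages for the new copies and releases the old copies to a free list, tagged
    with its transaction id. Pages freed by txn t may be reused only when t is
    older than every snapshot still in use; otherwise the file has to grow.
    """

    def __init__(self, live_pages: int) -> None:
        self.file_pages = live_pages              # size of the file in pages; never shrinks
        self.last_txn = 0                         # id of the last committed write txn
        self.free: List[Tuple[int, int]] = []     # (freed_by_txn, page_count)
        self.readers: List[int] = []              # snapshot ids of open read txns

    def begin_read(self) -> int:
        """Open a read txn that sees the last committed snapshot."""
        self.readers.append(self.last_txn)
        return self.last_txn

    def end_read(self, snapshot: int) -> None:
        self.readers.remove(snapshot)

    def write(self, pages: int) -> None:
        """Commit one write txn that rewrites `pages` pages."""
        # With no readers, the last committed snapshot still counts as in use,
        # so pages freed by the previous txn are not yet reusable.
        oldest = min(self.readers, default=self.last_txn)
        needed = pages
        remaining: List[Tuple[int, int]] = []
        for freed_by, count in self.free:
            if freed_by < oldest and needed > 0:
                taken = min(count, needed)        # recycle pages nobody can see any more
                needed -= taken
                count -= taken
            if count:
                remaining.append((freed_by, count))
        self.free = remaining
        self.file_pages += needed                 # whatever could not be recycled extends the file
        self.last_txn += 1
        self.free.append((self.last_txn, pages))  # old copies of the rewritten pages


# Updates with no concurrent readers: the file settles at a fixed size.
quiet = ToyCowFile(live_pages=1000)
for _ in range(10):
    quiet.write(50)

# The same updates while one reader stays open: nothing can be recycled.
busy = ToyCowFile(live_pages=1000)
snapshot = busy.begin_read()
for _ in range(10):
    busy.write(50)
busy.end_read(snapshot)
for _ in range(10):
    busy.write(50)  # reuse resumes, but the file does not shrink

print(quiet.file_pages, busy.file_pages)  # 1100 1500
```

In this model the file with no readers grows by two rewrites' worth of pages and then stays flat. A single long-lived reader forces every one of the ten updates to extend the file, and closing it lets reuse resume without ever returning the space.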
Maybe it would be useful to implement
osmx compact file.osmx
which simply creates a new database, iterates with a cursor over every sub-database of the old one, and inserts the entries into the new one with MDB_APPEND: effectively an offline compaction. This means your peak storage usage, once the new database is fully written but the old one has not yet been deleted and the new one renamed, is about 2 terabytes. Would that be preferable to a full re-import? I think it should be much faster than a re-import from PBF, but it's hard to know without testing.
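The proposed compaction loop can be sketched in Python. This is a toy illustration under stated assumptions, not OSMExpress or LMDB code. Sub-databases are modeled as plain dicts, `cursor_walk` stands in for an LMDB cursor scan, and `AppendOnlySubDb` stands in for a fresh sub-database written with MDB_APPEND, assuming the default byte-wise key order. All of these names are hypothetical. The sketch shows why MDB_APPEND is safe here: a cursor already visits keys in sorted order, so every insert lands at the end of the new tree.

```python
from typing import Dict, Iterator, List, Tuple

SubDb = Dict[bytes, bytes]


def cursor_walk(sub_db: SubDb) -> Iterator[Tuple[bytes, bytes]]:
    """Stand-in for an LMDB cursor scan (MDB_FIRST, then MDB_NEXT), which
    visits keys in the sub-database's sort order; here, plain byte order."""
    for key in sorted(sub_db):
        yield key, sub_db[key]


class AppendOnlySubDb:
    """Stand-in for a fresh sub-database written with MDB_APPEND: every key
    must sort after the previous one, or LMDB rejects the put (MDB_KEYEXIST)."""

    def __init__(self) -> None:
        self.rows: List[Tuple[bytes, bytes]] = []

    def append(self, key: bytes, value: bytes) -> None:
        if self.rows and key <= self.rows[-1][0]:
            raise ValueError("MDB_KEYEXIST: key does not sort after the last key")
        self.rows.append((key, value))


def compact(old: Dict[str, SubDb]) -> Dict[str, AppendOnlySubDb]:
    """Copy every sub-database of `old` into a brand-new store, in key order."""
    new: Dict[str, AppendOnlySubDb] = {}
    for name, sub_db in old.items():
        target = AppendOnlySubDb()
        for key, value in cursor_walk(sub_db):
            target.append(key, value)  # never rejected: the walk is already sorted
        new[name] = target
    return new


# Hypothetical sub-database names and contents, for illustration only.
old = {
    "nodes": {b"\x00\x02": b"b", b"\x00\x01": b"a"},
    "ways": {b"w2": b"", b"w10": b""},
}
new = compact(old)
print(new["ways"].rows)  # [(b'w10', b''), (b'w2', b'')]
```

In real LMDB the names of named sub-databases are stored as keys in the main database, and any non-default flags (for example MDB_INTEGERKEY or MDB_DUPSORT) would have to be recreated on the new side. LMDB's bundled `mdb_copy -c` also makes a compacting copy of a whole environment, which may be worth comparing against before writing a dedicated command.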
Thanks for the quick response @bdon!
> 1. Do you only run one update a day, with a daily diff from planet.openstreetmap.org?
> 2. Are there read transactions that may be happening at the same time as the update?
Right, we only do one update a day using the daily diffs, and there is no safeguard against simultaneous reads during that time.
> My understanding of LMDB is that the database size will never shrink; this is a consequence of its MVCC design, which avoids the performance impact of a compaction phase. If there are reads happening at the same time as writes, the writes are guaranteed to grow the DB instead of reusing empty pages.
Ah, that's an interesting theory! So if we run the updates more frequently, that might increase the likelihood of read/write collisions, but the "damage" would be much smaller when they do happen, so overall this could reduce the growth rate?
> Maybe it would be useful to implement
> osmx compact file.osmx
> which simply creates a new database, iterates with a cursor over every sub-database of the old one, and inserts the entries into the new one with MDB_APPEND: effectively an offline compaction. This means your peak storage usage, once the new database is fully written but the old one has not yet been deleted and the new one renamed, is about 2 terabytes. Would that be preferable to a full re-import? I think it should be much faster than a re-import from PBF, but it's hard to know without testing.
Unfortunately, the system we are running this on only has 1TB of fast SSD storage (though plenty of other resources), so we did the full reimport on slow disks and replaced the osmx file afterwards. A more efficient offline compaction therefore wouldn't really reduce the downtime, which comes from copying the final file, not from the reimport itself.
We'll experiment with more frequent updates. If that slows the growth down a bit, we should already be down to just one reimport per year (and thus 1-2 hours of scheduled downtime), which is manageable.
Thank you!
What if you implement reader/writer mutual exclusion by having the reader acquire the same lock, like this:
https://github.com/protomaps/OSMExpress/blob/master/utils/osmx-update#L17
If your application can accept reads being blocked for as long as a write happens - which for minutely updates should be a few seconds at most - it's worth experimenting to see if that solves the DB growth issue. Measuring the effect of mutual exclusion or frequent updates would be useful as a contribution to the docs :)
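On the reader side, this could look like the Python sketch below. It assumes the update script serializes itself with an exclusive BSD `flock` on a lock file; whether that matches line 17 of osmx-update, and what the real lock file path is, should be checked against the script, so `LOCK_PATH`, `osmx_lock` and `query_with_lock` are hypothetical. Readers take a shared lock, so they do not block each other, while the updater's exclusive lock waits for all current readers to release and keeps new ones out until the update finishes.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator

# Hypothetical path: it must be the same file the update script locks.
LOCK_PATH = "/var/lib/osmx/planet.osmx.lock"


@contextmanager
def osmx_lock(path: str, exclusive: bool = False) -> Iterator[None]:
    """Hold a BSD flock on `path` for the duration of the `with` block.

    Readers pass exclusive=False and share LOCK_SH with each other; the updater
    passes exclusive=True, which waits until every shared holder has released
    and blocks new shared requests while it is held.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        # Closing the descriptor would also drop the lock; unlock explicitly for clarity.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def query_with_lock() -> None:
    """Hypothetical reader: open and finish the LMDB read txn inside the lock."""
    with osmx_lock(LOCK_PATH):
        pass  # open the .osmx file, run the query, and close the read txn here
```

The reader has to use the same locking primitive as the updater: on Linux, `flock` locks and `fcntl`/`lockf` record locks do not interact, so mixing them would provide no exclusion at all.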
> What if you implement reader/writer mutual exclusion, by having the reader acquire the same lock like this:
> https://github.com/protomaps/OSMExpress/blob/master/utils/osmx-update#L17
That could be a viable option indeed, and looks straightforward to implement.
> If your application can accept reads being blocked for as long as a write happens - which for minutely updates should be a few seconds at most - it's worth experimenting to see if that solves the DB growth issue. Measuring the effect of mutual exclusion or frequent updates would be useful as a contribution to the docs :)
I'll do some measurements now that I know what to try. Definitely happy to report/contribute back what we find; all our work is free/open/public anyway :)
Thanks again for your help!
Marking this as closed for now since behavior is as expected.