We have been running osmx with nightly updates successfully for about the last ten mon

Continuous growths of the osmx file with nightly updates (osmexpress, CLOSED)

protomaps commented on June 16, 2024

Continuous growths of the osmx file with nightly updates

from osmexpress.

Comments (5)

bdon commented on June 16, 2024

Hi @vkrause ,

Do you only run one update a day, with a daily diff from planet.openstreetmap.org?
Are there read transactions that may be happening at the same time as the update?

My understanding of LMDB is that the database size will never shrink - this is a consequence of its MVCC design, which avoids the performance impact of a compaction phase. If there are reads happening at the same time as writes, the writes are guaranteed to grow the DB instead of reusing empty pages.

Maybe it will be useful to implement osmx compact file.osmx which simply creates a new database, does cursor iteration over every old sub-database and inserts into new with MDB_APPEND - effectively offline compaction. This means your peak storage usage when you're done writing new and haven't deleted old + renamed is about 2 terabytes. Would that be a preferred solution over a full re-import? I think it should be much faster than a reimport from PBF, but hard to know without testing.
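As a rough illustration of the offline compaction idea above, here is a minimal sketch using the third-party py-lmdb binding rather than osmx itself. The function name `compact_copy`, the sub-database names passed in, and the `integerkey` switch are assumptions for illustration; the real flags must match whatever the source database was created with, and a planet-sized copy would likely want to commit in batches instead of one transaction per sub-database.

```python
def compact_copy(src_path, dst_path, db_names, map_size=1 << 30, integerkey=False):
    """Copy every named sub-database from src into a fresh dst file.

    Hypothetical helper, not part of osmx. db_names is a list of bytes.
    Returns the number of key/value pairs copied.
    """
    import lmdb  # third-party py-lmdb binding, imported lazily

    max_dbs = max(1, len(db_names))
    # Open the old file read-only; the new file gets its own map size.
    src = lmdb.open(src_path, subdir=False, readonly=True, lock=False, max_dbs=max_dbs)
    dst = lmdb.open(dst_path, subdir=False, map_size=map_size, max_dbs=max_dbs)
    copied = 0
    try:
        for name in db_names:
            # Assumption: both sides use the same key flags as the original.
            src_db = src.open_db(name, create=False, integerkey=integerkey)
            dst_db = dst.open_db(name, integerkey=integerkey)
            with src.begin(db=src_db) as rtxn, dst.begin(db=dst_db, write=True) as wtxn:
                # The cursor yields keys in sorted order, so append=True
                # (MDB_APPEND) can write them without searching the tree.
                for key, value in rtxn.cursor():
                    wtxn.put(key, value, append=True)
                    copied += 1
    finally:
        src.close()
        dst.close()
    return copied
```

After a successful copy, the old file would be deleted and the new one renamed into place, which is where the temporary doubling of storage comes from.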

vkrause commented on June 16, 2024

Thanks for the quick response @bdon!

> 1. Do you only run one update a day, with a daily diff from planet.openstreetmap.org?
> 2. Are there read transactions that may be happening at the same time as the update?

Right, we only do one update a day using the daily diffs, and there is no safeguard against simultaneous reads during that time.

> My understanding of LMDB is that the database size will never shrink - this is a consequence of its MVCC design, which avoids the performance impact of a compaction phase. If there are reads happening at the same time as writes, the writes are guaranteed to grow the DB instead of reusing empty pages.

Ah, that's an interesting theory! So if we run the updates more frequently that might increase the likelihood of read/write collisions, but the "damage" would be much smaller when they happen, and thus this could overall reduce the growth speed?

> Maybe it will be useful to implement osmx compact file.osmx which simply creates a new database, does cursor iteration over every old sub-database and inserts into new with MDB_APPEND - effectively offline compaction. This means your peak storage usage when you're done writing new and haven't deleted old + renamed is about 2 terabytes. Would that be a preferred solution over a full re-import? I think it should be much faster than a reimport from PBF, but hard to know without testing.

The system we are running this on only has 1TB of fast SSD storage unfortunately (but otherwise has plenty of resources), so we did the full reimport on slow disks and replaced the osmx file afterwards. A more efficient offline compaction thus wouldn't really reduce the downtime (which is the copying of the final file, not the reimport itself).

We'll experiment with more frequent updates. If that slows down the growth a bit, we should already get down to just one reimport per year (and thus 1-2 hours of scheduled downtime), which is manageable.

Thank you!

bdon commented on June 16, 2024

What if you implement reader/writer mutual exclusion, by having the reader acquire the same lock like this:

https://github.com/protomaps/OSMExpress/blob/master/utils/osmx-update#L17

If your application can accept reads being blocked for as long as a write happens - which for minutely updates should be a few seconds at most - it's worth experimenting to see if that solves the DB growth issue. Measuring the effect of mutual exclusion or frequent updates would be useful as a contribution to the docs :)
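A minimal sketch of what the reader side could look like, assuming the update script guards each write with an exclusive `flock` on a lock file. The path `LOCK_PATH` and the helper names are hypothetical. The reader has to use the same file and the same locking call as the writer: on Linux, `flock` locks and `lockf`/`fcntl` locks do not see each other, so if the writer uses `lockf`, the reader must as well.

```python
import contextlib
import fcntl
import os
import tempfile

# Hypothetical path: must be the very same file the update script locks.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "osmx-example.lock")


@contextlib.contextmanager
def exclusive_lock(path=LOCK_PATH, blocking=True):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # Blocks until the lock is free, or raises BlockingIOError
        # immediately when blocking=False and someone else holds it.
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def try_lock(path=LOCK_PATH):
    """Return True if the lock is free right now, False if it is held."""
    try:
        with exclusive_lock(path, blocking=False):
            return True
    except BlockingIOError:
        return False
```

A reader would then wrap each read transaction in `with exclusive_lock(): ...`, so it waits while an update is running and the update waits while a read is in flight. Serializing readers against each other as well is the price of reusing a single exclusive lock.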

vkrause commented on June 16, 2024

> What if you implement reader/writer mutual exclusion, by having the reader acquire the same lock like this:
>
> https://github.com/protomaps/OSMExpress/blob/master/utils/osmx-update#L17

That could be a viable option indeed, and looks straightforward to implement.

> If your application can accept reads being blocked for as long as a write happens - which for minutely updates should be a few seconds at most - it's worth experimenting to see if that solves the DB growth issue. Measuring the effect of mutual exclusion or frequent updates would be useful as a contribution to the docs :)

I'll do some measurements now that I know what to try. Definitely happy to report/contribute back what we find, all our work is free/open/public anyway :)
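One simple way to take such measurements, sketched under the assumption that the on-disk file size is a good enough proxy for database growth: log the size after every update and compute the average growth per day. The function names and the CSV layout are made up for illustration.

```python
import csv
import os
import time


def record_size(db_path, log_path, now=None):
    """Append one (unix_time, size_bytes) row to log_path and return it."""
    timestamp = int(now if now is not None else time.time())
    row = (timestamp, os.path.getsize(db_path))
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return row


def growth_per_day(log_path):
    """Average growth in bytes per day between the first and last rows.

    Returns None when there are fewer than two samples or no time elapsed.
    """
    with open(log_path, newline="") as f:
        rows = [(int(t), int(s)) for t, s in csv.reader(f)]
    if len(rows) < 2 or rows[-1][0] == rows[0][0]:
        return None
    (t0, s0), (t1, s1) = rows[0], rows[-1]
    return (s1 - s0) * 86400 / (t1 - t0)
```

Calling `record_size` at the end of each update run, once with and once without reader/writer exclusion, would give two logs whose `growth_per_day` values can be compared directly.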

Thanks again for your help!

bdon commented on June 16, 2024

Marking this as closed for now since behavior is as expected.
