Comments (10)
My understanding is that the writes are currently all async. I don't see an fsync or fdatasync call in the source. This means two things:
- boltdb, as it is right now, can lose data on power failure / kernel crash
- this ticket should be titled "Sync writes" ;)
Also, whether sync or async, the writes need to be careful about ordering; with the current POSIX APIs, that means bolt can't write the new meta page until all the dependencies have actually hit the disk.
(And if you're thinking about the actual async AIO api, just don't -- it's not worth the trouble.)
from bolt.
And now I see O_SYNC. That's probably over-eager. I guess now I see what you mean. Sorry for the noise. This ticket is correct.
So, as far as I can see, what you need is
- commit: write non-meta pages, fdatasync, write meta page, fdatasync or sync_file_range
- size change: fsync
- create: fsync, fsync containing dir
and with those, O_SYNC isn't needed.
As has happened before, I'm surprised by the quality I see in BoltDB. Good job!
from bolt.
lol, thanks for the ticket. Bolt actually implements what LMDB calls METASYNC. Only the meta page is written with the sync file descriptor. The other pages are being written without O_SYNC.
I'm not sure how to do testing for this yet. Or even if I can do testing without unplugging my hard drive's power source.
The Async could result in lost data but it's mainly there if someone wants to implement a WAL (or if they don't really care about lost data during failures).
from bolt.
To my best reading, that mode of lmdb works like this:
- me_fd is opened normally
- me_mfd is opened O_SYNC
- non-meta pages are flushed to disk with just writes
- MDB_FDATASYNC(env->me_fd)
- meta page written with O_SYNC
(The above is a good setup because it lets the kernel write the non-meta pages in arbitrary order.)
But that's not safe without the fdatasync! If you don't have the fdatasync in the above, you can end up with this:
App:
- submit write for non-meta pages X, Y, Z
- submit write for meta page A
- start waiting for meta write to complete
- power loss
Disk:
- write Z
- write A
- power loss
Now you have a committed transaction pointing to garbage.
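To make the scenario concrete, here is a toy Go model of it. Nothing here touches a real disk: the `disk` type, its `write`/`flush`/`crash` methods and the page names are invented for the example. Writes sit in a volatile cache until either a flush (standing in for fdatasync) persists all of them, or a crash persists only an arbitrary subset.

```go
package main

import "fmt"

// disk is a toy model: writes are volatile until flushed or lucky at crash time.
type disk struct {
	durable map[string]bool
	pending []string
}

func newDisk() *disk { return &disk{durable: map[string]bool{}} }

// write queues a page in the volatile cache.
func (d *disk) write(page string) { d.pending = append(d.pending, page) }

// flush models fdatasync: every pending write becomes durable.
func (d *disk) flush() {
	for _, p := range d.pending {
		d.durable[p] = true
	}
	d.pending = nil
}

// crash models power loss: of the pending writes, only the named survivors
// reached the platter; everything else in the cache is lost.
func (d *disk) crash(survivors ...string) {
	pending := map[string]bool{}
	for _, p := range d.pending {
		pending[p] = true
	}
	for _, s := range survivors {
		if pending[s] {
			d.durable[s] = true
		}
	}
	d.pending = nil
}

// consistent reports whether the on-disk state is usable: if the new meta
// page A is durable, the pages it points to (X, Y, Z) must be durable too.
func (d *disk) consistent() bool {
	if !d.durable["A"] {
		return true // the old meta page is still the current one
	}
	return d.durable["X"] && d.durable["Y"] && d.durable["Z"]
}

func withoutBarrier() bool {
	d := newDisk()
	for _, p := range []string{"X", "Y", "Z"} {
		d.write(p)
	}
	d.write("A")
	d.crash("Z", "A") // the disk happened to write Z and A first
	return d.consistent()
}

func withBarrier() bool {
	d := newDisk()
	for _, p := range []string{"X", "Y", "Z"} {
		d.write(p)
	}
	d.flush() // X, Y, Z are durable before A is even submitted
	d.write("A")
	d.crash("Z", "A") // only A was still pending
	return d.consistent()
}

func main() {
	fmt.Println("consistent without barrier:", withoutBarrier())
	fmt.Println("consistent with barrier:", withBarrier())
}
```

In this model the barrier-free sequence ends with A durable but X and Y missing, while the flushed sequence can only lose A itself, which leaves the previous meta page in charge.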
from bolt.
Not really related to this ticket but now that I brought it up: here's a commit that adds the fdatasync/fsync: tv42/bolt@5ce378b
from bolt.
@tv42 Thanks for the fdatasync() changes. I merged them in via #76.
LMDB says that MDB_NOSYNC preserves ACI of ACID if the file system preserves write order:
* <li>#MDB_NOSYNC
* Don't flush system buffers to disk when committing a transaction.
* This optimization means a system crash can corrupt the database or
* lose the last transactions if buffers are not yet flushed to disk.
* The risk is governed by how often the system flushes dirty buffers
* to disk and how often #mdb_env_sync() is called. However, if the
* filesystem preserves write order and the #MDB_WRITEMAP flag is not
* used, transactions exhibit ACI (atomicity, consistency, isolation)
* properties and only lose D (durability). I.e. database integrity
* is maintained, but a system crash may undo the final transactions.
* Note that (#MDB_NOSYNC | #MDB_WRITEMAP) leaves the system with no
* hint for when to write transactions to disk, unless #mdb_env_sync()
* is called. (#MDB_MAPASYNC | #MDB_WRITEMAP) may be preferable.
* This flag may be changed at any time using #mdb_env_set_flags().
But I need to read up on that further to understand it better. I'm not sure how the meta can be in sync but the data pages not be sync'd and you'd only lose the previous transaction.
I'm wondering if an "async" mode is even a good idea for Bolt. It seems like with fdatasync() the file system can still optimize the write order within each phase, and it can be left up to the end user to bulk load or coalesce transactions as needed. That way everything in Bolt is ACID.
What do you think?
from bolt.
"However, if the filesystem preserves write order" -- yeah, it's not going to (yes in memory, not on disk), so most of that paragraph is irrelevant. Not sure what the LMDB authors were thinking of. The writes will go in the buffer cache, which will flush them out in somewhat arbitrary order, and the IO scheduler can explicitly reorder them to minimize seeking. If you want ordering, you use fdatasync/sync_file_range etc.
I can only think of two settings where async commits make sense: 1) Redis-style "I don't care about my data" and 2) distributed systems that can set policies like "on disk on 1 node and in memory on 2".
1) Is probably not an ideal fit for Bolt anyway, because natively in-memory systems will probably always be faster, and the single-writer limit of Bolt is probably going to get in the way. Plus, the hybrid between the two worlds is a silly thing to want.
2) Can probably just use an in-memory queue of operations to be done, and a single goroutine flushing them out to Bolt, including batching multiple operations into one Bolt transaction.
I wouldn't burn any effort in worse durability guarantees. To me personally, Bolt is valuable because it's simple, has a good API, and performs well for what it is.
from bolt.
@tv42 That makes sense. Thanks for all the feedback! I'm going to close this one out and keep Bolt simple and ACID compliant. A "no sync" option can be implemented by the end user as a cache or WAL or whatever. :)
from bolt.
Don't mean to reopen the issue, wanted to follow up on @tv42's comments that file systems do not preserve write order. Is this true? ext4, one of the more commonly used fs, explicitly states that it does (default data=ordered).
https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
from bolt.
data=ordered (*) All data are forced directly out to the main file system prior to its metadata being committed to the journal.
Nothing in that says write to data block A is done before write to data block B, just that both data block writes are done before the corresponding metadata is written to the journal.
from bolt.