naasking / appendlog Goto Github PK

View Code? Open in Web Editor NEW

4.0 4.0 2.0 102 KB

An efficient file stream-based append-only log

License: GNU Lesser General Public License v2.1

C# 100.00%

appendlog's People

Contributors

Stargazers

Watchers

Forkers

dhilip89 ak800i

appendlog's Issues

Async append

IAppendLog.Append is currently an async operation, but should probably be async.

FileLog in particular uses Monitor.Enter, which should perhaps be a a more async-friendly mutual exclusion mechanism.

Write parallelism and group commit

It's possible to improve the write parallelism with some simple changes.

Decouple header update from append: after a writer closes its stream, the next writer can begin writing immediately. This will probably require two FileStreams, one for writing the header, one for appending, and a semaphore for each. This might only be a problem when the log is initially started, because of block device semantics: the header and the end of the log are in the same sector, so flushing either header or append stream will overwrite the other. Hint: probably use FileOptions.WriteThrough for the header stream -- no sense making two calls for the same purpose.
Group commit: have each thread that's waiting to write to the header publish the transaction id it will commit. The writer that acquires the header FileStream checks the publication list and only writes the highest one to the header. In theory, N writers will be merged into 1 flush instead of N sequential flushes. In practice, N>1 writers waiting to commit is probably very unlikely because the append stream will still require a flush to disk before its writer will try to update header, so this sequence of events would seem to be extremely unlikely:
1. writer w0 flushes append stream to disk
2. w0 acquires header stream lock
3. w0 releases append stream lock
4. w0 interrupted
5. writer w1 acquires append stream lock
6. w1 completes writes
7. w1 flushes to disk
8. w1 can't acquire header stream lock, so publishes its higher txid
9. w0 resumes and checks publication list and writes w1's txid

w0 remaining suspended while w1 executes steps v-viii is exceedingly unlikely.

Async commit protocol

IAppendLog.Append returns an IDisposable handle which, when disposed, performs all of the flushing needed to persist data to disk.

However, this is a synchronous operation where the rest of the IAppendLog API is largely asynchronous. Switch this to something like the following:

public interface ITransaction : IDisposable
{
  Task Commit();
}
public interface IAppendLog
{
  ...
  ITransaction Append(out Stream output, out TransactionId tx);
}

This way, the client must explicitly commit changes to the log and can opt to do so asynchronously. IDisposable.Dispose is then reserved just for resource cleanup as intended, rather than doing double-duty as transaction commit.

No-seek, append-only disk layout

FileLog must currently:

write the entry length at the end of the content,
force a flush to disk to ensure the entry is persisted,
seek back to the header and write a pointer to the new entry in the header
force a flush to disk to persist the update.

A power failure can occur at any step of the above, and only when step 4 is complete is the new transaction actually acknowledged. If we remove or reorder even one flush, bogus data might get through.

For instance, it might seem possible that we could recover from a power failure after step 2, but there's no persisted/reliable indicator that step 2 actually completed. So even if we find a valid entry header, there's no guarantee that all data before the header was persisted because the OS can reorder writes. Only after step 4 actually completes do we have this indicator.

So with all of this in mind, there might be a way to at least reduce seeking and make the log truly forward-only.

Chunked log

The log is broken up into fixed-size chunks of a certain (optimal) size for persisting to disk (like a sector).
The log header contains a GUID, and this GUID is replicated in each chunk header to tag it as valid data. It's vanishingly unlikely that a random sector dump on power loss will precisely replicate the log GUID in precisely the right spot.
There are internal chunks, indicating more data to write/read, and terminal chunks indicating a transaction end.
On commit, whatever padding is left in a chunk is zeroed, then Flush(true) is called. Upon completion, the terminal header is written, then Flush(true) is called again. While the total number flushes is the same, no large seeking occurs, just a small seek to rewrite the header at the end of the chunk.
On log load, we seek to the last chunk given the file length, then work backwards until we find a terminal chunk and resume writing from there.

Note: this layout may also permit concurrent writing to the same file. Each writer basically just allocates a free chunk via an atomic operation on the log object. This is tricky though, so I won't develop it further at this point.

Checksum log entries

Add checksum to each log entries to verify log integrity.

FileOptions.WriteThrough

Not sure but i think that the way you open the file only writes data to SO file buffers.
So in case of crash you are not granting data is succesfully stored on file.
I'm not really sure. I think that opening the file with file options write throught good make it work

https://ayende.com/blog/163073/file-i-o-flush-or-writethrough

Supporting concurrent writers

FileLog is intentionally limited to single writers for simplicity and robustness. However, we can build a simple concurrent extension on this foundation by implementing another instance of IAppendLog, which we'll call ConcurrentLog.

Design

The idea is that ConcurrentLog points to a folder which is used to store a set of temporary files, one file per concurrent writer:

On write completion, the written file is renamed to a known extension which is then migrated to an external FileLog by a worker thread.
The ConcurrentLog has a ticket field set at log initialization, which is used to generate unique file names via an atomic increment on each Append operation.
ConcurrentLog can also replay some incomplete transactions, ie. writers that had committed successfully but hadn't yet completed their copy process can simply restart.
No particular migration order is guaranteed, technically the same guarantee provided by Monitor.Enter/Exit, which is used to mutually exclude writers in FileLog. However, I think ordering the writes by ticket number would be most robust.

Open Questions

I'm not sure whether to enforce that all lower numbered tickets are migrated before migrating later ones. Technically, these events should be independent, but it would be easy for programmers to trigger events during a long write that cause a concurrent writer to complete a short event before the earlier event is finished writing. Enforcing this ordering protects against these scenarios, however:

We sacrifice write throughput.
Writes are no longer assumed independent, which means replay can't just skip an incomplete write on replay after power loss. This potentially loses many other independent successful writes.

Given #2 particularly, I'm leaning towards to enforcing sequential replay.

naasking / appendlog Goto Github PK

appendlog's People

Contributors

Stargazers

Watchers

Forkers

appendlog's Issues

Async append

Write parallelism and group commit

Async commit protocol

No-seek, append-only disk layout

Chunked log

Checksum log entries

FileOptions.WriteThrough

Supporting concurrent writers

Design

Open Questions

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent