naasking / appendlog
An efficient file stream-based append-only log
License: GNU Lesser General Public License v2.1
IAppendLog.Append is currently a synchronous operation, but should probably be async.
FileLog in particular uses Monitor.Enter, which should perhaps be replaced by a more async-friendly mutual exclusion mechanism.
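One common async-friendly replacement for Monitor.Enter is a SemaphoreSlim with a count of 1, awaited via SemaphoreSlim.WaitAsync. Monitor must be exited on the thread that entered it, so it can't be held across an await; a semaphore has no thread affinity. The sketch below is an illustration, not FileLog's code: `AsyncLock` and `AcquireAsync` are hypothetical names, and unlike Monitor this lock is not reentrant. Typical use would be `using (await _lock.AcquireAsync()) { ... }`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical AsyncLock: SemaphoreSlim(1, 1) used as an async-friendly mutex.
// Not reentrant: acquiring it twice on the same logical flow will deadlock.
public sealed class AsyncLock
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    // Waits asynchronously (no thread is blocked) and returns a handle that releases on Dispose.
    public async Task<IDisposable> AcquireAsync()
    {
        await _gate.WaitAsync().ConfigureAwait(false);
        return new Releaser(_gate);
    }

    private sealed class Releaser : IDisposable
    {
        private SemaphoreSlim _gate;

        public Releaser(SemaphoreSlim gate) => _gate = gate;

        // Interlocked.Exchange makes a second Dispose a harmless no-op.
        public void Dispose() => Interlocked.Exchange(ref _gate, null)?.Release();
    }
}
```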
It's possible to improve the write parallelism with some simple changes.
Decouple header update from append: after a writer closes its stream, the next writer can begin writing immediately. This will probably require two FileStreams, one for writing the header and one for appending, each guarded by its own semaphore. This might only be a problem when the log is first started, because of block device semantics: the header and the end of the log lie in the same sector, so flushing either the header stream or the append stream will overwrite the other. Hint: probably use FileOptions.WriteThrough for the header stream; there's no sense making two calls for the same purpose.
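A minimal sketch of the decoupling, assuming a hypothetical 8-byte header that stores only the committed end offset (FileLog's real header layout may differ). `SplitStreamLog` and `AppendAsync` are illustrative names. An appender holds the append semaphore only while writing and flushing its entry, so the next appender can start while this one updates the header through a separate WriteThrough stream. The sketch does not handle the shared-sector problem described above, and it assumes the platform honors FileOptions.WriteThrough.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch (not FileLog's code): one stream + semaphore for appending,
// another pair for the header, so appenders only serialize on the append stream.
public sealed class SplitStreamLog : IDisposable
{
    public const int HeaderSize = 8;  // assumed header: just the committed end offset
    private readonly FileStream _append;
    private readonly FileStream _header;
    private readonly SemaphoreSlim _appendGate = new SemaphoreSlim(1, 1);
    private readonly SemaphoreSlim _headerGate = new SemaphoreSlim(1, 1);
    private long _end;        // next append offset (guarded by _appendGate)
    private long _committed;  // last end offset written to the header (guarded by _headerGate)

    public SplitStreamLog(string path)
    {
        // Append stream: buffered, made durable with an explicit Flush(true).
        _append = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write,
                                 FileShare.ReadWrite, 4096, FileOptions.Asynchronous);
        // Header stream: effectively unbuffered and WriteThrough, so one Write reaches the device.
        _header = new FileStream(path, FileMode.Open, FileAccess.Write,
                                 FileShare.ReadWrite, 1, FileOptions.WriteThrough);
        _end = Math.Max(_append.Length, HeaderSize);
        _committed = _end;
    }

    public async Task<long> AppendAsync(byte[] entry)
    {
        long newEnd;
        await _appendGate.WaitAsync().ConfigureAwait(false);
        try
        {
            _append.Position = _end;
            await _append.WriteAsync(entry, 0, entry.Length).ConfigureAwait(false);
            _append.Flush(true);  // entry is on disk before any header can cover it
            newEnd = _end += entry.Length;
        }
        finally { _appendGate.Release(); }  // the next appender can start right away

        await _headerGate.WaitAsync().ConfigureAwait(false);
        try
        {
            if (newEnd > _committed)  // a later writer may already have covered this entry
            {
                _header.Position = 0;
                _header.Write(BitConverter.GetBytes(newEnd), 0, sizeof(long));
                _committed = newEnd;
            }
        }
        finally { _headerGate.Release(); }
        return newEnd;
    }

    public void Dispose()
    {
        _append.Dispose();
        _header.Dispose();
        _appendGate.Dispose();
        _headerGate.Dispose();
    }
}
```

The `newEnd > _committed` check matters because header updates can now arrive out of order: if a later writer reaches the header first, its offset already covers the earlier entry, whose data was flushed before the later writer could even start appending.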
Group commit: have each thread that's waiting to write to the header publish the transaction id it will commit. The writer that acquires the header FileStream checks the publication list and writes only the highest id to the header. In theory, N writers will be merged into 1 flush instead of N sequential flushes. In practice, having N>1 writers waiting to commit at once is probably rare, because the append stream still requires a flush to disk before its writer will try to update the header, so the following sequence of events seems extremely unlikely:
Having w0 remain suspended while w1 executes steps v-viii is exceedingly unlikely.
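The publication scheme can be sketched with a single "highest published id" counter in place of a full publication list. This assumes appends are serialized, so a published id implies every lower id's data is already flushed. `GroupCommitHeader`, `Publish`, `CommitAsync` and the `writeHeader` callback are hypothetical names; the callback stands in for writing and flushing the real header.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical group-commit helper: waiters publish the tx id they need durable;
// whoever holds the header lock writes the highest published id once.
public sealed class GroupCommitHeader
{
    private readonly SemaphoreSlim _headerGate = new SemaphoreSlim(1, 1);
    private readonly Func<long, Task> _writeHeader;  // stand-in for "write + flush header"
    private long _published;  // highest tx id any waiter has published
    private long _durable;    // highest tx id known to be in the on-disk header

    public int HeaderWrites { get; private set; }  // physical header writes, for illustration

    public GroupCommitHeader(Func<long, Task> writeHeader) => _writeHeader = writeHeader;

    // Lock-free "atomic max": raise _published to at least tx.
    public void Publish(long tx)
    {
        long seen = Volatile.Read(ref _published);
        while (seen < tx)
        {
            long prior = Interlocked.CompareExchange(ref _published, tx, seen);
            if (prior == seen) break;
            seen = prior;
        }
    }

    public async Task CommitAsync(long tx)
    {
        Publish(tx);
        await _headerGate.WaitAsync().ConfigureAwait(false);
        try
        {
            if (_durable >= tx) return;  // an earlier holder's write already covered tx
            long target = Volatile.Read(ref _published);
            await _writeHeader(target).ConfigureAwait(false);
            HeaderWrites++;
            _durable = target;
        }
        finally { _headerGate.Release(); }
    }
}
```

If transactions 1, 2 and 3 are all published before the first holder gets the lock, a single header write of 3 satisfies all three commits.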
IAppendLog.Append returns an IDisposable handle which, when disposed, performs all of the flushing needed to persist data to disk.
However, this is a synchronous operation, whereas the rest of the IAppendLog API is largely asynchronous. Switch this to something like the following:
public interface ITransaction : IDisposable
{
    Task Commit();
}

public interface IAppendLog
{
    ...
    ITransaction Append(out Stream output, out TransactionId tx);
}
This way, the client must explicitly commit changes to the log and can opt to do so asynchronously. IDisposable.Dispose is then reserved just for resource cleanup as intended, rather than doing double-duty as transaction commit.
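To illustrate the intended client pattern, here is a hypothetical, single-threaded in-memory log. `MemoryLog` is an illustrative name, `long` stands in for TransactionId, and it deliberately doesn't implement IAppendLog because that interface's other members aren't shown. Commit is the only thing that makes data visible; Dispose just cleans up, so a transaction disposed without Commit is discarded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public interface ITransaction : IDisposable
{
    Task Commit();
}

// Hypothetical in-memory log (not thread-safe); `long` stands in for TransactionId.
public sealed class MemoryLog
{
    private readonly List<byte[]> _committed = new List<byte[]>();
    private long _nextTx;

    public IReadOnlyList<byte[]> Committed => _committed;

    public ITransaction Append(out Stream output, out long tx)
    {
        var buffer = new MemoryStream();
        output = buffer;
        tx = ++_nextTx;
        return new Transaction(this, buffer);
    }

    private sealed class Transaction : ITransaction
    {
        private readonly MemoryLog _log;
        private readonly MemoryStream _buffer;
        private bool _done;

        public Transaction(MemoryLog log, MemoryStream buffer) { _log = log; _buffer = buffer; }

        // Explicit, awaitable commit; a file-based log would flush to disk here.
        public Task Commit()
        {
            if (_done) throw new InvalidOperationException("Already committed or disposed.");
            _log._committed.Add(_buffer.ToArray());
            _done = true;
            return Task.CompletedTask;
        }

        // Resource cleanup only: an uncommitted transaction is simply dropped.
        public void Dispose()
        {
            _done = true;
            _buffer.Dispose();
        }
    }
}
```

In async client code the commit would read `using (var t = log.Append(out var s, out var id)) { s.WriteByte(42); await t.Commit(); }`.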
FileLog must currently:
A power failure can occur at any of the steps above, and the new transaction is only acknowledged once step 4 is complete. If we remove or reorder even one flush, bogus data might be accepted as committed.
For instance, it might seem possible that we could recover from a power failure after step 2, but there's no persisted/reliable indicator that step 2 actually completed. So even if we find a valid entry header, there's no guarantee that all data before the header was persisted because the OS can reorder writes. Only after step 4 actually completes do we have this indicator.
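This argument implies that recovery must trust only the persisted header and discard everything after the offset it records, even bytes that look like a valid entry. A minimal sketch, assuming a hypothetical header made of a single 8-byte committed end offset (FileLog's actual header and step sequence may differ); `LogRecovery.Recover` is an illustrative name.

```csharp
using System;
using System.IO;

public static class LogRecovery
{
    public const int HeaderSize = 8;  // assumed header: just the committed end offset

    // Truncates everything past the committed end recorded in the header. A valid-looking
    // entry after that point proves nothing: without the header update, there's no
    // evidence that the bytes before it reached the disk.
    public static long Recover(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            if (fs.Length < HeaderSize)
            {
                fs.SetLength(0);  // header never persisted: nothing was committed
                return 0;
            }
            var header = new byte[HeaderSize];
            int read = 0;
            while (read < HeaderSize)
            {
                int n = fs.Read(header, read, HeaderSize - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            long committedEnd = BitConverter.ToInt64(header, 0);
            if (committedEnd == 0) committedEnd = HeaderSize;  // header never written: empty log
            if (committedEnd < HeaderSize || committedEnd > fs.Length)
                throw new InvalidDataException("Header points outside the file.");
            fs.SetLength(committedEnd);  // drop the unacknowledged tail
            fs.Flush(true);
            return committedEnd;
        }
    }
}
```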
So with all of this in mind, there might be a way to at least reduce seeking and make the log truly forward-only.
Note: this layout may also permit concurrent writing to the same file. Each writer basically just allocates a free chunk via an atomic operation on the log object. This is tricky though, so I won't develop it further at this point.
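The allocation part of that idea is a fetch-and-add on a shared end offset, which .NET exposes as Interlocked.Add. The sketch below covers only that part; `ChunkAllocator` and `Reserve` are hypothetical names. The tricky parts the note alludes to remain open: a writer that crashes leaves a hole, and chunks can finish out of order.

```csharp
using System;
using System.Threading;

// Hypothetical: writers reserve disjoint byte ranges by atomically bumping the end offset.
public sealed class ChunkAllocator
{
    private long _end;

    public ChunkAllocator(long start) => _end = start;

    // Returns the start of a freshly reserved [start, start + size) range.
    public long Reserve(long size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return Interlocked.Add(ref _end, size) - size;  // Add returns the new end
    }

    public long End => Interlocked.Read(ref _end);
}
```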
Add a checksum to each log entry to verify log integrity.
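One possible framing, assuming a hypothetical entry layout of [int32 length][uint32 CRC-32][payload]. The issue doesn't name an algorithm; CRC-32 (IEEE, reflected polynomial 0xEDB88320) is my choice here, written bitwise to stay dependency-free. `ChecksummedEntry`, `Write` and `TryRead` are illustrative names. A reader stops at the first entry that is truncated or fails its checksum.

```csharp
using System;
using System.IO;

// Hypothetical entry framing: [int32 length][uint32 CRC-32][payload].
public static class ChecksummedEntry
{
    // Bitwise CRC-32 (IEEE 802.3); slow but dependency-free.
    public static uint Crc32(byte[] data, int offset, int count)
    {
        uint crc = 0xFFFFFFFFu;
        for (int i = offset; i < offset + count; i++)
        {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    public static void Write(BinaryWriter w, byte[] payload)
    {
        w.Write(payload.Length);
        w.Write(Crc32(payload, 0, payload.Length));
        w.Write(payload);
    }

    // Returns null if the entry is truncated (torn write) or its checksum doesn't match.
    public static byte[] TryRead(BinaryReader r)
    {
        try
        {
            int length = r.ReadInt32();
            uint expected = r.ReadUInt32();
            if (length < 0) return null;
            byte[] payload = r.ReadBytes(length);
            if (payload.Length != length) return null;
            return Crc32(payload, 0, length) == expected ? payload : null;
        }
        catch (EndOfStreamException) { return null; }
    }
}
```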
Not sure, but I think the way you open the file only writes data to the OS file buffers.
So in case of a crash, you aren't guaranteeing that the data was successfully stored in the file.
I'm not really sure, but I think opening the file with FileOptions.WriteThrough could make it work.
https://ayende.com/blog/163073/file-i-o-flush-or-writethrough
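Both options from the linked post map onto standard FileStream APIs: open with FileOptions.WriteThrough so writes go through the OS cache to the device, or write normally and call Flush(flushToDisk: true) at the commit point (a plain Flush() only hands FileStream's own buffer to the OS). The sketch assumes the platform honors these hints; `DurableWrites` and its method names are hypothetical.

```csharp
using System.IO;

public static class DurableWrites
{
    // Option 1: every write goes through the OS cache to the device.
    public static void AppendWriteThrough(string path, byte[] data)
    {
        using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write,
                                       FileShare.Read, 4096, FileOptions.WriteThrough))
        {
            fs.Write(data, 0, data.Length);
            fs.Flush();  // push FileStream's buffer to the OS; WriteThrough does the rest
        }
    }

    // Option 2: buffered writes, then one explicit flush to disk at the commit point.
    public static void AppendThenFlushToDisk(string path, byte[] data)
    {
        using (var fs = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read))
        {
            fs.Write(data, 0, data.Length);
            fs.Flush(flushToDisk: true);  // also flushes OS buffers for this file
        }
    }
}
```

WriteThrough pays the device cost on every write; an explicit Flush(true) lets several writes share one flush, which suits a log that commits in batches.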
FileLog is intentionally limited to single writers for simplicity and robustness. However, we can build a simple concurrent extension on this foundation by implementing another instance of IAppendLog, which we'll call ConcurrentLog.
The idea is that ConcurrentLog points to a folder which is used to store a set of temporary files, one file per concurrent writer:
I'm not sure whether to enforce that all lower-numbered tickets are migrated before migrating later ones. Technically, these events should be independent, but it would be easy for programmers to trigger events during a long write that cause a concurrent writer to complete a short event before the earlier event has finished writing. Enforcing this ordering protects against these scenarios; however:
Given #2 particularly, I'm leaning towards enforcing sequential replay.
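Sequential replay can be sketched as an in-memory model: each writer takes a ticket, and finished tickets are migrated into the main log strictly in ticket order, so a short event that completes early waits for the longer, lower-numbered one. `SequentialMigrator`, `Begin`, `Complete` and the `migrate` callback are hypothetical; in ConcurrentLog the payload would live in the writer's temporary file rather than a byte array. One consequence is visible in the code: a slow or abandoned lower ticket holds back every later one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical model of ConcurrentLog's migration rule: finished tickets are
// migrated into the main log strictly in ticket order.
public sealed class SequentialMigrator
{
    private readonly object _sync = new object();
    private readonly Dictionary<long, byte[]> _finished = new Dictionary<long, byte[]>();
    private readonly Action<long, byte[]> _migrate;  // e.g. copy a temp file into the main log
    private long _lastIssued;         // last ticket handed out
    private long _nextToMigrate = 1;  // lowest ticket not yet migrated

    public SequentialMigrator(Action<long, byte[]> migrate) => _migrate = migrate;

    // Each concurrent writer takes a ticket before it starts writing.
    public long Begin() => Interlocked.Increment(ref _lastIssued);

    // Called when a writer finishes; migrates every contiguous ticket that is now ready.
    public void Complete(long ticket, byte[] data)
    {
        lock (_sync)
        {
            _finished.Add(ticket, data);
            while (_finished.TryGetValue(_nextToMigrate, out var ready))
            {
                _finished.Remove(_nextToMigrate);
                _migrate(_nextToMigrate, ready);
                _nextToMigrate++;
            }
        }
    }
}
```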