Comments (8)
Hello!
Thank you for the reproduction. I believe there are two issues going on here:
- The 3p Azure API currently uses FlushMode.Immediate to work around issues with large ops hitting the 1MB websocket limit. Since Azure 1.0, we've put a lot of effort into fixing those issues; more details are documented here. We're currently planning on restoring this to the default, FlushMode.TurnBased, in the Azure 2.0 release. With TurnBased, if one client submits a series of ops synchronously, the Fluid runtime will also process those ops synchronously on all other clients, which helps ensure that application logic doesn't see "partially applied" states due to remote edits. In terms of the reproduction test you've provided, FlushMode.Immediate means that containers might yield between the op which inserts a column and the op which sets that column, despite that not happening on the local client. With FlushMode.TurnBased, that possibility is removed.
- Even with TurnBased flushing enabled, the test reproduces some cases where SharedMatrix is not eventually consistent, albeit much more rarely. I'm investigating these and will update this issue when I have more details.
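To make the first point concrete, here is a toy model of the two flush modes. It is only a sketch and not the Fluid runtime's implementation: the `Op` type and the `toBatches` and `observeSnapshots` helpers are hypothetical names invented for illustration, and the matrix is reduced to a single column of cells. It assumes a remote client can only yield to application logic between batches.

```typescript
// Hypothetical, simplified op shapes (not the real SharedMatrix op format).
type Op =
    | { kind: "insertRow"; row: number }
    | { kind: "setCell"; row: number; value: string };

type FlushMode = "Immediate" | "TurnBased";

// Groups the ops a client submitted during one synchronous turn into batches.
// Immediate: every op is its own batch. TurnBased: the whole turn is one batch.
function toBatches(opsInTurn: Op[], mode: FlushMode): Op[][] {
    return mode === "Immediate" ? opsInTurn.map((op) => [op]) : [opsInTurn];
}

// Applies batches on a remote client and records the state that application
// logic could observe, i.e. the state after each complete batch.
function observeSnapshots(batches: Op[][]): (string | undefined)[][] {
    const rows: (string | undefined)[] = [];
    const snapshots: (string | undefined)[][] = [];
    for (const batch of batches) {
        for (const op of batch) {
            if (op.kind === "insertRow") {
                rows.splice(op.row, 0, undefined); // new row starts out empty
            } else {
                rows[op.row] = op.value;
            }
        }
        snapshots.push([...rows]);
    }
    return snapshots;
}

const turn: Op[] = [
    { kind: "insertRow", row: 0 },
    { kind: "setCell", row: 0, value: "value" },
];

// Immediate exposes an intermediate state with an undefined cell.
const immediate = observeSnapshots(toBatches(turn, "Immediate"));
// TurnBased only exposes the fully populated row.
const turnBased = observeSnapshots(toBatches(turn, "TurnBased"));
console.log(immediate.length, turnBased.length);
```

In this model the Immediate run yields two observable states (the first with an undefined cell), while the TurnBased run yields only the final one.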
FYI mostly for awareness: I had to tweak the logic around matrixTestSequenceEndPromise in the test cases; it was possible for one worker client to apply its edits locally and receive server ops from all other clients, then exit before its in-flight ops made it to the ordering service. This would deadlock the other clients, since they'd be waiting for ops that never made it to the server. The saved event on the container (and container.isDirty) allows waiting for in-flight ops robustly, and would commonly be used in production scenarios to inform the user if they're trying to exit the application with unsaved data.
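A minimal sketch of that wait-before-exit pattern follows. It assumes the container exposes an `isDirty` property and a `once("saved", ...)` subscription; the `SavableContainer` interface, the `waitUntilSaved` helper, and the `FakeContainer` stand-in are hypothetical names used only for illustration, not part of the Fluid API.

```typescript
// Structural view of the parts of the container this sketch relies on
// (assumption: the real container offers `isDirty` and a "saved" event).
interface SavableContainer {
    readonly isDirty: boolean;
    once(event: "saved", listener: () => void): unknown;
}

// Resolves once the container reports no unsaved (in-flight) changes.
async function waitUntilSaved(container: SavableContainer): Promise<void> {
    // Re-check after each "saved" event in case new edits arrived meanwhile.
    while (container.isDirty) {
        await new Promise<void>((resolve) => container.once("saved", resolve));
    }
}

// In-memory stand-in for demonstration only.
class FakeContainer implements SavableContainer {
    public isDirty: boolean;
    private listeners: Array<() => void> = [];

    public constructor(isDirty = true) {
        this.isDirty = isDirty;
    }

    public get pendingListeners(): number {
        return this.listeners.length;
    }

    public once(_event: "saved", listener: () => void): this {
        this.listeners.push(listener);
        return this;
    }

    // Simulates the service acknowledging all pending ops.
    public markSaved(): void {
        this.isDirty = false;
        const toNotify = this.listeners;
        this.listeners = [];
        for (const listener of toNotify) {
            listener();
        }
    }
}

const demo = new FakeContainer();
const exited = waitUntilSaved(demo).then(() => "safe to exit");
demo.markSaved(); // after this, `exited` resolves
void exited;
```

A worker would await `waitUntilSaved(container)` before signalling that it is done, so it cannot exit while its own ops are still in flight.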
from fluidframework.
A Node project that reproduces this issue can be found here: https://github.com/SampoSyrjanen/shared-matrix-undefined-cell-test
from fluidframework.
@Abe27342, with turn-based flushing enabled, did you repro it using 2.0 bits or 1.0 bits?
I'm not sure if it's the problem, but 1.0 (I think; maybe it was removed earlier) had some implicit batching at the driver layer. The problem with this layer was that it was unpredictable, i.e. it could flush ops when there were too many ops in the queue.
I do not remember exactly how all these layers worked together or whether that could cause such behavior.
I'd rather focus on 2.0, but if we learn that 2.0 works flawlessly and 1.0 does not, that might be an explanation.
from fluidframework.
It repros on both. I ported the repro to our latest main branch and the behavior is basically the same AFAICT; the likelihood of occurrence may be a bit different without my noticing, but the same sort of behavior shows up. Agreed with trying to focus on 2.0. With latest main bits and FlushMode switched to TurnBased, it's pretty clear that the remaining issues here are in the DDS realm.
from fluidframework.
@DLehenbauer - any chance you can take a look? I have not looked at the details, but it feels like we should start with the assumption that the bug is on the SharedMatrix side. That said, it could be a bug in the overall op processing pipeline (or even a service implementation).
If it repros for one service, but not another, that is likely an indication it's a service issue.
from fluidframework.
I'm already investigating the SharedMatrix side. But is it not true that even with SharedMatrix issues entirely fixed, if one has a client with FlushMode.Immediate doing:
    while (true) {
        matrix.insertRow(0, 1); // insert row at start
        matrix.setCell(0, 1, "value"); // populate that row with a value
        await sleep(1000);
    }
while another client observes, there's nothing stopping the observing client's op processing from yielding to application logic while in between a row insert and a cell set? This is the crux of my point that we'll need turn-based flushing for the repro provided here to work as expected (i.e. not observe 'partially applied states').
from fluidframework.
Hi, quick update here: #19211 fixes the SharedMatrix-side issue for this bug. This fixes bugs where row values may appear undefined indefinitely (the bug was an eventual consistency issue). It's still true that our Azure APIs have FlushMode.Immediate set, and as long as that's true, bits of your matrix may be temporarily undefined in the test case. With the example above, which looks similar to the test repro:
    while (true) {
        matrix.insertRow(0, 1); // insert row at start
        matrix.setCell(0, 1, "value"); // populate that row with a value
        await sleep(1000);
    }
a client may yield between the insertRow op and the setCell op, in which case that cell will be undefined until the setCell op is sequenced and processed. @andre4i could you follow up here once we have more details on our plan for FlushMode and azure client?
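While FlushMode.Immediate is in effect, an observing client can treat an undefined cell as "not yet populated" rather than as an error. The sketch below is one hedged way to do that, assuming only a matrix-like object with `rowCount`, `colCount`, and `getCell(row, col)`; the `MatrixLike` interface, the `readRows` helper, and the placeholder value are hypothetical and introduced here for illustration.

```typescript
// Structural view of the matrix members this sketch reads (assumption).
interface MatrixLike<T> {
    readonly rowCount: number;
    readonly colCount: number;
    getCell(row: number, col: number): T | undefined;
}

// Copies the matrix into plain arrays, substituting `pending` for any cell
// that is currently undefined (or null), e.g. a row whose setCell op has not
// been processed yet.
function readRows<T>(matrix: MatrixLike<T>, pending: T): T[][] {
    const rows: T[][] = [];
    for (let r = 0; r < matrix.rowCount; r++) {
        const row: T[] = [];
        for (let c = 0; c < matrix.colCount; c++) {
            row.push(matrix.getCell(r, c) ?? pending);
        }
        rows.push(row);
    }
    return rows;
}

// Stand-in for a matrix observed between the insertRow and setCell ops.
const partial: MatrixLike<string> = {
    rowCount: 1,
    colCount: 2,
    getCell: (_row, col) => (col === 0 ? "id" : undefined),
};

const rendered = readRows(partial, "(pending)");
console.log(rendered);
```

This only papers over the transient state on the reading side; it does not change when ops are flushed or processed.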
from fluidframework.