Comments (8)

Abe27342 commented on July 22, 2024

Hello!

Thank you for the reproduction. I believe there are two issues going on here:

  1. The 3p Azure API currently uses FlushMode.Immediate to work around issues with large ops hitting the 1MB websocket limit. Since Azure 1.0, we've put a lot of effort into fixing those issues; more details are documented here. We're currently planning on restoring this to the default FlushMode.TurnBased in the Azure 2.0 release. With TurnBased, if one client submits a series of ops synchronously, the Fluid runtime will also process those ops synchronously on all other clients, which helps ensure that application logic doesn't see "partially applied" states due to remote edits. In terms of the reproduction test you've provided, FlushMode.Immediate means that remote containers might yield between the op which inserts a column and the op which sets that column, despite that not happening on the local client. With FlushMode.TurnBased, that possibility is removed.

  2. Even with TurnBased flushing enabled, the test reproduces some cases where SharedMatrix is not eventually consistent, albeit much more rarely. I'm investigating these and will update this issue when I have more details.
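To make the first point concrete, here is a toy model of how a remote client could observe a "partially applied" state. It is only a sketch: the Op type, applyOp, processRemote, and the fixed column count are hypothetical stand-ins, not Fluid Framework APIs. The assumption is that application logic may run between batches but never inside one.

```typescript
// Toy model only: none of these names are Fluid Framework APIs.
type Op =
  | { kind: "insertRow"; row: number }
  | { kind: "setCell"; row: number; col: number; value: string };

type Matrix = (string | undefined)[][];

const COLS = 2; // hypothetical fixed column count

function applyOp(matrix: Matrix, op: Op): void {
  if (op.kind === "insertRow") {
    // A freshly inserted row has no values yet.
    matrix.splice(op.row, 0, new Array<string | undefined>(COLS).fill(undefined));
  } else {
    matrix[op.row][op.col] = op.value;
  }
}

// Applies each batch without interruption, then "yields" to application
// logic by calling observe() with the state visible at that point.
function processRemote(batches: Op[][], observe: (m: Matrix) => void): void {
  const matrix: Matrix = [];
  for (const batch of batches) {
    for (const op of batch) {
      applyOp(matrix, op);
    }
    observe(matrix);
  }
}

// One synchronous turn on the sending client: insert a row, then fill a cell.
const turn: Op[] = [
  { kind: "insertRow", row: 0 },
  { kind: "setCell", row: 0, col: 1, value: "value" },
];

// Immediate-style flushing: every op is its own batch.
const immediateBatches: Op[][] = turn.map((op) => [op]);
// TurnBased-style flushing: all ops from the turn form one batch.
const turnBasedBatches: Op[][] = [turn];

// Counts how often the observer sees row 0 present but cell (0, 1) unset.
function countPartialStates(batches: Op[][]): number {
  let partial = 0;
  processRemote(batches, (m) => {
    if (m.length > 0 && m[0][1] === undefined) {
      partial++;
    }
  });
  return partial;
}

const immediatePartial = countPartialStates(immediateBatches);
const turnBasedPartial = countPartialStates(turnBasedBatches);
```

In this model the Immediate-style run lets the observer see the new row before its cell is set, while the TurnBased-style run only exposes the state after both ops have been applied.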

FYI mostly for awareness: I had to tweak the logic around matrixTestSequenceEndPromise in the test cases; it was possible for one worker client to apply its edits locally and receive server ops from all other clients, then exit before its in-flight ops made it to the ordering service. This would deadlock the other clients, since they'd be waiting for ops that never made it to the server. The saved event on the container (and container.isDirty) allows doing this robustly, and would commonly be used in production scenarios to inform the user if they're trying to exit the application with unsaved data.
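A minimal sketch of that pattern, assuming only what is mentioned above: a container exposes an isDirty flag and emits a saved event. The SavableContainer interface (including its once method), the runWhenSaved helper, and FakeContainer are hypothetical names used for illustration; check the actual container typings before relying on them.

```typescript
// Minimal structural view of a container. Only `isDirty` and the "saved"
// event come from the discussion; the exact shape here is an assumption.
interface SavableContainer {
  readonly isDirty: boolean;
  once(event: "saved", listener: () => void): void;
}

// Runs `action` once the container has no unacknowledged local changes.
function runWhenSaved(container: SavableContainer, action: () => void): void {
  if (!container.isDirty) {
    action();
    return;
  }
  container.once("saved", action);
}

// Hypothetical in-memory stand-in used to exercise the helper.
class FakeContainer implements SavableContainer {
  isDirty = false;
  private listeners: Array<() => void> = [];

  once(_event: "saved", listener: () => void): void {
    this.listeners.push(listener);
  }

  // Simulates a local edit whose op is still in flight.
  submitLocalOp(): void {
    this.isDirty = true;
  }

  // Simulates the ordering service acknowledging all in-flight ops.
  ackAllOps(): void {
    this.isDirty = false;
    const pending = this.listeners;
    this.listeners = [];
    for (const listener of pending) {
      listener();
    }
  }
}

const worker = new FakeContainer();
const exitLog: string[] = [];

worker.submitLocalOp();
runWhenSaved(worker, () => exitLog.push("exit"));
const exitsBeforeAck = exitLog.length; // still dirty, so the exit is deferred

worker.ackAllOps();
const exitsAfterAck = exitLog.length; // saved fired, so the exit ran
```

A worker that exits through a helper like this cannot leave before its in-flight ops reach the server, which is the deadlock described above.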

from fluidframework.

SampoSyrjanen commented on July 22, 2024

A Node project to reproduce this issue can be found here: https://github.com/SampoSyrjanen/shared-matrix-undefined-cell-test

vladsud commented on July 22, 2024

@Abe27342, with turn-based flushing enabled, did you repro it using 2.0 bits or 1.0 bits?
I'm not sure if it's the problem, but 1.0 (I think, maybe it was removed earlier) had some implicit batching at the driver layer. The problem with this layer is that it was unpredictable, i.e. it could flush ops when there were too many ops in the queue.
I do not remember exactly how all these layers worked together and whether it could cause such behavior.
I'd rather focus on 2.0, but if we learn that 2.0 works flawlessly and 1.0 does not, that might be an explanation.

Abe27342 commented on July 22, 2024

It repros on both. I ported the repro to our latest main branch and the behavior is basically the same AFAICT; maybe the likelihood of occurrence is a bit different and I didn't notice, but the same sort of behavior shows up. Agreed on trying to focus on 2.0. It's pretty clear with latest main bits and FlushMode switched to TurnBased that the remaining issues here are in the DDS realm.

vladsud commented on July 22, 2024

@DLehenbauer - any chance you can take a look? I have not looked at the details, but it feels like we should start with the assumption that the bug is on the SharedMatrix side. That said, it could be a bug in the overall op processing pipeline (or even a service implementation).
If it repros for one service but not another, that is likely an indication that it's a service issue.

Abe27342 commented on July 22, 2024

I'm already investigating the SharedMatrix side. But is it not true that even with SharedMatrix issues entirely fixed, if one has a client with FlushMode.Immediate doing:

while (true) {
    matrix.insertRow(0, 1); // insert row at start
    matrix.setCell(0, 1, "value"); // populate that row with a value
    await sleep(1000);
}

while another client observes, there's nothing stopping the observing client's op processing from yielding to application logic while in between a row insert and a cell set? This is the crux of my point that we'll need turn-based flushing for the repro provided here to work as expected (i.e. not observe 'partially applied states').

Abe27342 commented on July 22, 2024

Hi, quick update here: #19211 fixes the SharedMatrix-side issue for this bug. This fixes bugs where row values may appear undefined indefinitely (the bug was an eventual consistency issue). It's still true that our azure APIs have FlushMode.Immediate set, and as long as that's true it will be the case that bits of your matrix may be temporarily undefined in the test case. With the example above which looks similar to the test repro:

while (true) {
    matrix.insertRow(0, 1); // insert row at start
    matrix.setCell(0, 1, "value"); // populate that row with a value
    await sleep(1000);
}

a client may yield between the insertRow op and the setCell op, in which case that cell will be undefined until the setCell op is sequenced and processed. @andre4i could you follow up here once we have more details on our plan for FlushMode and the Azure client?
