Comments (13)

frankliee commented on June 9, 2024

Which version did you use?

Did you set spark.rss.data.replica.read=2? It ensures that the bitmap metadata of blocks is written to 2 servers.

As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.

from incubator-uniffle.

xianjingfeng commented on June 9, 2024

> Did you set spark.rss.data.replica.read=2?

Yes.

> As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.

But this step seems to execute before readShuffleData.

xianjingfeng commented on June 9, 2024

> Which version did you use?

Internal version 0.5.0-snapshot.

frankliee commented on June 9, 2024

> > Did you set spark.rss.data.replica.read=2?
>
> Yes.
>
> > As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.
>
> But this step seems to execute before readShuffleData.

The metadata is acquired in advance, but the data integrity check is executed only after all blocks have been fetched.
In the current implementation, the client fetches only from "the first available" server to avoid the extra read cost.
But when the data on this first server is damaged, the final check reports "read inconsistent".
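
The behaviour described above can be illustrated with a minimal sketch. This is not the Uniffle client code: the function names, the `ReadInconsistentError` class, and the dict-based "servers" are all hypothetical stand-ins, assuming the expected block IDs are already known from the metadata and that each server is modelled as a mapping from block ID to payload.

```python
class ReadInconsistentError(Exception):
    """Raised when the fetched blocks do not cover every expected block ID."""


def check_integrity(expected_block_ids, fetched_block_ids):
    # Final check: every block ID known from the metadata must have been fetched.
    missing = set(expected_block_ids) - set(fetched_block_ids)
    if missing:
        raise ReadInconsistentError(
            f"read inconsistent: {len(missing)} block(s) missing"
        )


def read_from_first_available(servers, expected_block_ids):
    # `servers` is a list of dicts (block_id -> payload); None models an
    # unreachable server. Only the first reachable server is read.
    for blocks in servers:
        if blocks is None:
            continue
        fetched = {bid: blocks[bid] for bid in expected_block_ids if bid in blocks}
        # The check runs after the fetch, so a damaged first server fails the read
        # even though another replica holds the complete data.
        check_integrity(expected_block_ids, fetched)
        return fetched
    raise ConnectionError("no shuffle server available")
```

In this sketch, a first server that is reachable but missing a block makes the whole read fail, which is the situation reported in this issue.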

xianjingfeng commented on June 9, 2024

I know, but the application will fail.

jerqi commented on June 9, 2024

> > > Did you set spark.rss.data.replica.read=2?
> >
> > Yes.
> >
> > > As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one server.
> >
> > But this step seems to execute before readShuffleData.
>
> The metadata is acquired in advance, but the data integrity check is executed only after all blocks have been fetched. In the current implementation, the client fetches only from "the first available" server to avoid the extra read cost. But when the data on this first server is damaged, the final check reports "read inconsistent".

This implementation seems a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?

xianjingfeng commented on June 9, 2024

> This implementation seems a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?

I am trying to do this, and I think it needs to be fixed together with #108.

frankliee commented on June 9, 2024

I would be happy to review this PR. You should avoid fetching redundant blocks from the other server (because Spark has already consumed these blocks).
RSS provides some skipping mechanisms for localfile and hdfs.
But I am worried about memory data. @jerqi
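
One way to picture the requested behaviour is the sketch below. It is only an illustration under assumptions, not the fix that was eventually merged: `read_with_failover` is a hypothetical name, each server is modelled as a dict from block ID to payload (None meaning unreachable), and the `processed` set stands in for whatever mechanism tracks the blocks Spark has already consumed.

```python
def read_with_failover(servers, expected_block_ids):
    # Hypothetical helper: read replicas in order, skipping any block that was
    # already handed to the consumer, until all expected blocks are covered.
    expected = set(expected_block_ids)
    processed = set()   # block IDs already consumed
    output = []         # (block_id, payload) pairs in consumption order

    for blocks in servers:
        if blocks is None:
            continue  # unreachable server, try the next replica
        for block_id, payload in blocks.items():
            if block_id not in expected or block_id in processed:
                continue  # unknown or redundant block: skip it
            processed.add(block_id)
            output.append((block_id, payload))
        if processed >= expected:
            break  # data is complete, no need to contact further replicas

    missing = expected - processed
    if missing:
        raise RuntimeError(f"read inconsistent: {len(missing)} block(s) missing")
    return output
```

Here the second server is contacted only when the first one leaves blocks missing, and blocks seen on the first server are not emitted a second time. The sketch filters on the client side after fetching; avoiding the transfer itself would need a server-side skip mechanism such as the ones mentioned for localfile and hdfs.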

jerqi commented on June 9, 2024

> I would be happy to review this PR. You should avoid fetching redundant blocks from the other server (because Spark has already consumed these blocks). RSS provides some skipping mechanisms for localfile and hdfs. But I am worried about memory data. @jerqi

In my opinion, memory data should also support skipping, and our memory read process should be optimized.

xianjingfeng commented on June 9, 2024

Got it.

frankliee commented on June 9, 2024

This will require changing the server's memory storage to add an "index", like hdfs.

jerqi commented on June 9, 2024

> This will require changing the server's memory storage to add an "index", like hdfs.

This problem should be discussed in another issue, and we should also have a simple design doc.

from incubator-uniffle.

jerqi commented on June 9, 2024

closed by #276
