Comments (13)
Which version did you use?
Did you set spark.rss.data.replica.read=2? It ensures that the bitmap metadata of the blocks is written to 2 servers.
As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data read from any one server.
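To make the idea above concrete, here is a minimal sketch in Python. It is not Uniffle's actual client code. It assumes the block metadata can be modeled as a set of block IDs per server, and that an unreachable server is represented by `None`. The names `collect_expected_block_ids` and `missing_blocks` are hypothetical.

```python
from typing import Iterable, Optional, Set


def collect_expected_block_ids(
    server_bitmaps: Iterable[Optional[Set[int]]], replica_read: int
) -> Set[int]:
    """Merge block-ID metadata until `replica_read` servers have answered."""
    expected: Set[int] = set()
    successes = 0
    for bitmap in server_bitmaps:
        if bitmap is None:  # hypothetical marker for an unreachable server
            continue
        expected |= bitmap
        successes += 1
        if successes >= replica_read:
            return expected
    raise RuntimeError(
        f"metadata from only {successes} server(s), need {replica_read}"
    )


def missing_blocks(expected: Set[int], fetched: Set[int]) -> Set[int]:
    """Return the block IDs the metadata promises but the read did not deliver."""
    return expected - fetched
```

With the expected set in hand, data fetched from any single server can be verified by checking that `missing_blocks` returns an empty set.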
from incubator-uniffle.
> Did you set spark.rss.data.replica.read=2?
Yes
> As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data read from any one server.
But this step seems to execute before readShuffleData
from incubator-uniffle.
> Which version did you use?
An internal version, 0.5.0-snapshot
from incubator-uniffle.
> But this step seems to execute before readShuffleData
The metadata is acquired in advance, but the data integrity check is executed only after all blocks have been fetched.
In the current implementation, the client reads only from "the first available" server to avoid extra read cost.
But when the data on this first server is damaged, the final check reports "read inconsistent".
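The behavior described above can be illustrated with a small Python sketch. It is a simplified model, not the real client. Each server is assumed to be a dict from block ID to bytes, `None` stands for an unavailable server, and a damaged server is modeled as one that is missing blocks. The function name `read_first_available` is hypothetical.

```python
from typing import Dict, List, Optional, Set

Server = Optional[Dict[int, bytes]]  # None models an unavailable server


def read_first_available(servers: List[Server], expected: Set[int]) -> Dict[int, bytes]:
    """Read every block from the first reachable server, then verify once at the end."""
    for blocks in servers:
        if blocks is None:
            continue
        fetched = {bid: data for bid, data in blocks.items() if bid in expected}
        # The integrity check runs only after the whole read has finished,
        # and no other replica is tried if it fails.
        if set(fetched) != expected:
            raise RuntimeError("read inconsistent")
        return fetched
    raise RuntimeError("no shuffle server available")
```

In this model, a call such as `read_first_available([damaged, healthy], expected)` fails even though the second replica holds every block, which is the situation reported in this issue.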
from incubator-uniffle.
I know, but the application will fail
from incubator-uniffle.
> The metadata is acquired in advance, but the data integrity check is executed only after all blocks have been fetched. In the current implementation, the client reads only from "the first available" server to avoid extra read cost. But when the data on this first server is damaged, the final check reports "read inconsistent".
This implementation seems a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?
from incubator-uniffle.
> This implementation seems a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?
I am trying to do this, and I think it needs to be fixed together with #108.
from incubator-uniffle.
I would be happy to review this PR. You should avoid fetching redundant blocks from the other server (because Spark has already consumed these blocks).
RSS already provides some skipping mechanisms for localfile and HDFS.
But I am worried about memory data. @jerqi
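A rough Python sketch of the fallback discussed here, with skipping of blocks that were already consumed. It only illustrates the idea and is not the change that was eventually merged. Servers are assumed to be dicts from block ID to bytes, `None` is an unavailable server, and `read_with_fallback` is a hypothetical name.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Server = Optional[Dict[int, bytes]]  # None models an unavailable server


def read_with_fallback(
    servers: List[Server], expected: Set[int]
) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (server_index, block_id, data), visiting replicas until all blocks are seen."""
    consumed: Set[int] = set()
    for index, blocks in enumerate(servers):
        if blocks is None:
            continue
        for block_id in sorted(blocks):
            # Skip unknown blocks and blocks already handed to the consumer.
            if block_id not in expected or block_id in consumed:
                continue
            consumed.add(block_id)
            yield index, block_id, blocks[block_id]
        if consumed == expected:
            return
    raise RuntimeError(
        f"read inconsistent, missing blocks: {sorted(expected - consumed)}"
    )
```

The `consumed` set is what prevents a block from being delivered twice when the reader moves on to another replica. A real server would also need a way to skip those blocks on its side, which is the concern raised for memory data.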
from incubator-uniffle.
> I would be happy to review this PR. You should avoid fetching redundant blocks from the other server (because Spark has already consumed these blocks). RSS already provides some skipping mechanisms for localfile and HDFS. But I am worried about memory data. @jerqi
In my opinion, memory data should also support skipping, and our memory read process should be optimized.
from incubator-uniffle.
Got it.
from incubator-uniffle.
This will require changing the server's memory storage to add an "index", like HDFS.
from incubator-uniffle.
> This will require changing the server's memory storage to add an "index", like HDFS.
This problem should be discussed in another issue, and we should also have a simple design doc.
from incubator-uniffle.
closed by #276
from incubator-uniffle.