Comments (6)

hiboyang avatar hiboyang commented on September 3, 2024

Hi @YutingWang98, this is an interesting finding! For the sizes 720 TB/370 TB, is this the file size on disk, or the stats shown in the Spark UI?

from remoteshuffleservice.

YutingWang98 avatar YutingWang98 commented on September 3, 2024

Hi @hiboyang, thank you for the reply! It is the stats from the Spark UI stage tab (the 'Shuffle Write Size / Records' value). Also, I just ran more tests on the same job, and here are my findings:

Remote shuffle service

  • RSS without compression: 5.7 GB
  • RSS with compression (lz4): 3.1 GB
  • RSS modified to allow zstd:
    • compression level=1: 2.4 GB
    • compression level=7: 2031 MB

External shuffle service

  • default setting (zstd compression, compression level=1): 1927 MB
  • without compression (spark.shuffle.service.enabled=false): 5.6 GB
  • other compression methods:
    • spark.io.compression.codec=lz4: 2.7 GB
    • spark.io.compression.codec=lzf: 2.7 GB
    • spark.io.compression.codec=snappy: 2.7 GB

So, I think switching to zstd might be helpful.
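
To make the comparison easier to read, here is a minimal sketch that expresses each reported shuffle write size as a fraction of the uncompressed size for the same service. The figures are the ones listed above; the conversion of 1 GB to 1024 MB and the name `fraction_of_uncompressed` are assumptions made for illustration, not something stated in this thread.

```python
# Shuffle write sizes reported in this thread, converted to MB.
# Assumption: 1 GB = 1024 MB (the thread does not say which unit base is used).
GB = 1024

rss_mb = {
    "none": 5.7 * GB,
    "lz4": 3.1 * GB,
    "zstd level 1": 2.4 * GB,
    "zstd level 7": 2031,
}
ess_mb = {
    "none": 5.6 * GB,
    "zstd level 1": 1927,
    "lz4": 2.7 * GB,
    "lzf": 2.7 * GB,
    "snappy": 2.7 * GB,
}


def fraction_of_uncompressed(sizes_mb):
    """Return each size as a fraction of the uncompressed ("none") size."""
    baseline = sizes_mb["none"]
    return {name: round(size / baseline, 2) for name, size in sizes_mb.items()}


rss_ratio = fraction_of_uncompressed(rss_mb)
ess_ratio = fraction_of_uncompressed(ess_mb)

for label, ratios in (("RSS", rss_ratio), ("ESS", ess_ratio)):
    for name, ratio in ratios.items():
        print(f"{label:3} {name:13} {ratio:.2f}")
```

Under that unit assumption, lz4 leaves roughly half of the uncompressed size in both services, while zstd at level 1 leaves about a third in the external shuffle service.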

hiboyang avatar hiboyang commented on September 3, 2024

Nice to know zstd has a better compression ratio. I created a PR to support zstd in RSS: https://github.com/uber/RemoteShuffleService/pull/91/files

YutingWang98 avatar YutingWang98 commented on September 3, 2024

Thank you! I will test our job with this new change. Also, I think in Spark the zstd compression level 'spark.io.compression.zstd.level' defaults to 1, but I saw you are using level 3 as the default. Is there any specific reason for that?
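
For reference, the Spark-side settings mentioned in this thread could be assembled as in the sketch below. The three property keys are standard Spark configuration properties; the helper names `zstd_shuffle_conf` and `to_submit_args` are hypothetical and exist only for this example. Whether RSS reads these same keys is not stated in this thread, so treat this as applying to Spark's built-in shuffle compression only.

```python
# Spark I/O compression settings discussed in this thread.
# The property keys are standard Spark configuration names;
# the helper functions are illustrative and not part of Spark or RSS.


def zstd_shuffle_conf(level=1):
    """Return Spark conf entries selecting zstd at the given compression level."""
    return {
        "spark.shuffle.compress": "true",
        "spark.io.compression.codec": "zstd",
        "spark.io.compression.zstd.level": str(level),
    }


def to_submit_args(conf):
    """Render conf entries as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Print the arguments for zstd at level 1, the level discussed above.
print(" ".join(to_submit_args(zstd_shuffle_conf(level=1))))
```

Passing a different `level` changes only the `spark.io.compression.zstd.level` entry, which makes it easy to repeat a test run at several levels.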

hiboyang avatar hiboyang commented on September 3, 2024

No specific reason :) I changed it to level 1 in the PR, but forgot to reply to the comment here.

YutingWang98 avatar YutingWang98 commented on September 3, 2024

I saw your changes in the pull request! Thanks so much.
