
Comments (10)

hiboyang commented on September 3, 2024

Yeah, I agree it is confusing. Spark 3.1 and 3.2 have slight differences in their shuffle APIs, so Remote Shuffle Service needs to be changed accordingly. I used to work on Remote Shuffle Service when I was at Uber. I have since left Uber and no longer have write access to this repo.

What environment are you interested in running Remote Shuffle Service on, e.g. YARN or Kubernetes? If Kubernetes, I have another repo that makes Remote Shuffle Service compatible with Kubernetes for Spark 3.1 and 3.2.

cpd85 commented on September 3, 2024

@hiboyang thanks for the response, I really appreciate it! For now I would love to be able to run on YARN, though I would like to explore Kubernetes as well. If you point me to the repo or changes you made for compatibility, maybe I could extend them to run on YARN too?

hiboyang commented on September 3, 2024

I see. In that case, you could change <spark.version>2.4.3</spark.version> in pom.xml to a Spark 3 version. You will get some compile errors, and you could start from there.

I tried to find some time to provide an example, but I'm really busy these days :(

roligupt commented on September 3, 2024

@hiboyang I am looking to deploy Remote Shuffle Service in my Kubernetes cluster, preferably for Spark 3.1.1. What's your recommendation?

avs-alatau commented on September 3, 2024

Hi!

Support for Spark 3.2 would be very useful. Java 11 is also required there.
I tried changing some parameters for Spark 3.2, for example:

<java.version>11</java.version>
<hadoop.version>3.2.2</hadoop.version>
<spark.version>3.2.0</spark.version>
<scala.version>2.12.15</scala.version>

but I get this error:

[ERROR] /home/alatau/ssk/3.2/src/main/scala/org/apache/spark/shuffle/rss/RssStressTool.scala:144: not enough arguments for method registerShuffle: (shuffleId: Int, numMaps: Int, numReduces: Int)Unit.
Unspecified value parameter numReduces.
[ERROR]     mapOutputTrackerMaster.registerShuffle(appShuffleId.getShuffleId, numMaps)
[ERROR]                                           ^
[ERROR] one error found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

cpd85 commented on September 3, 2024

@avs-alatau as @hiboyang mentioned, the APIs differ, so it's not enough to just change spark.version; you'll need to implement the new APIs as well. Bo has done the work here, but it only runs on k8s at the moment: https://github.com/hiboyang/RemoteShuffleService/tree/k8s-spark-3.2
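The compile error quoted earlier in the thread is one concrete instance of that API difference: the compiler reports that in Spark 3.2 `registerShuffle` takes `(shuffleId: Int, numMaps: Int, numReduces: Int)`, so the two-argument call in RssStressTool.scala no longer compiles. Below is a minimal sketch of the call-site change, under stated assumptions. `FakeMapOutputTrackerMaster` and `RegisterShuffleSketch` are hypothetical names, and the stub mimics only the signature from the error message so the example runs without Spark on the classpath. Where the stress tool gets its reduce-partition count is also an assumption.

```scala
import scala.collection.mutable.ArrayBuffer

// HYPOTHETICAL stub standing in for org.apache.spark.MapOutputTrackerMaster.
// It reproduces only the Spark 3.2 registerShuffle signature quoted in the
// compile error, and records each call so the sketch runs without Spark.
class FakeMapOutputTrackerMaster {
  val registered: ArrayBuffer[(Int, Int, Int)] = ArrayBuffer.empty

  def registerShuffle(shuffleId: Int, numMaps: Int, numReduces: Int): Unit =
    registered += ((shuffleId, numMaps, numReduces))
}

object RegisterShuffleSketch {
  // Spark 2.4 call site from RssStressTool.scala (fails to compile on 3.2):
  //   mapOutputTrackerMaster.registerShuffle(appShuffleId.getShuffleId, numMaps)
  //
  // Spark 3.2 call site: also pass the number of reduce partitions.
  // `numReduces` is an ASSUMED parameter; the real tool would need to supply
  // the reduce partition count of the shuffle it is registering.
  def register(tracker: FakeMapOutputTrackerMaster,
               shuffleId: Int,
               numMaps: Int,
               numReduces: Int): Unit =
    tracker.registerShuffle(shuffleId, numMaps, numReduces)

  def main(args: Array[String]): Unit = {
    val tracker = new FakeMapOutputTrackerMaster
    register(tracker, shuffleId = 0, numMaps = 4, numReduces = 8)
    println(tracker.registered.toList) // prints List((0,4,8))
  }
}
```

In the real code this would presumably be a one-line edit that adds the reduce partition count as a third argument. Clearing this one error does not by itself mean the client works on 3.2; other shuffle API changes may still need to be implemented.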

avs-alatau commented on September 3, 2024

@cpd85 thanks for the k8s link, but at the moment I can only configure it for YARN.

cpd85 commented on September 3, 2024

@avs-alatau could you help me understand what you're asking for? Code for YARN either doesn't exist or isn't open source. At the moment I'm working through these compilation issues to see if I can get a Spark 3.2 client to communicate with a 2.4 server. I'll be happy to share the code if I get it working.

avs-alatau commented on September 3, 2024

@cpd85
Thanks for the help. I have a Hadoop cluster with Spark 3.2.
Spark jobs currently run on YARN, and there are some problems with this, which is why I am looking for an external shuffle service.
I managed to set up Spark jobs on a test cluster with Spark 3.0, but because Spark 3.2 is installed on the production cluster, I am looking for an external shuffle service that supports it.
If you manage to build an RSS version for Spark 3.2, I would be grateful.

cpd85 commented on September 3, 2024

@avs-alatau I haven't done much testing, but I got this working with the Spark 3.2 PageRank example app:

https://github.com/cpd85/RemoteShuffleService/tree/spark32
