Giter VIP home page Giter VIP logo

Comments (9)

nimbusgo avatar nimbusgo commented on August 31, 2024

this seems like an issue:

    val f = (x:Int) => x*2
    val s = spore{(x:Int) => x*2}

    println(f.isInstanceOf[Serializable]) //true
    println(s.isInstanceOf[Serializable]) //false

from spores.

nimbusgo avatar nimbusgo commented on August 31, 2024

Oh I think I was definitely missing something, I think this is the intended usage? This works with spark now.

    val s = spore { (x:Int) => x * 2 }
    val res  = s.pickle

    def unpack(jSONPickle: JSONPickle)={
      jSONPickle.unpickle[Spore[Int, Int]]
    }

    sc.parallelize(1 to 5).map(x => {unpack(res)(x)}).foreach(println)

This is great and I think I'll be able to use this with spark now.

from spores.

nimbusgo avatar nimbusgo commented on August 31, 2024

hmm, the json that comes out has something like this in it:

"className": "my.package.SomeClass$$anonfun$10$anonspore$macro$24$1"

And it only works if you unpickle this where you have that function on the classpath. You can't ship it somewhere else and unpickle it.

Is there any way to serialize a spore that can be run somewhere else, or is that a topic of future research?

from spores.

jvican avatar jvican commented on August 31, 2024

Yes, the className is the prefix name of the Unpickler and Pickler that you'll use when you receive the spore in the recipient. However, note that this class name depends on the place it's defined. If you want to be able to unpickle this in the recipient, you'll need to share the same program in both master and worker. Therefore, you'll have in both sides a class SomeClass defined in my.package.

from spores.

heathermiller avatar heathermiller commented on August 31, 2024

Hotswapping (dynamically loading new classfiles) is not future work – there's been decades of not-so-successful research involving mobile code. Security is the biggest problem on that front as I'm sure you can understand.

So if you'd like to use spores, you'd have to do what Spark does and ship new jars to workers.

from spores.

jvican avatar jvican commented on August 31, 2024

Could we close this and, maybe, add this to the README project so that people know better this limitation from the beginning?

from spores.

samthebest avatar samthebest commented on August 31, 2024

The conclusion of this thread isn't clear to me. The jar does get shared to the workers from the driver in Spark, so spores can work.

In what situations is this not the case? Not sure what the "limitation" is then.

from spores.

nimbusgo avatar nimbusgo commented on August 31, 2024

So, I definitely understand the reasoning on the tangent. It would require shipping class files / jars and is definitely a hassle. I'm not sure how it would be any more of a security risk than a browser accessible repl (notebooks), which is one of the primary ways people are using spark.

To return to the original issue, is there a good reason why this shouldn't work?

sc.parallelize(1 to 5).map(spore{(x: Int) => x * 2}).collect()

When I found this project and the related slides, I thought this was one of the primary intended uses.

I mean, spark is explicitly mentioned here as a use for spores: https://speakerdeck.com/heathermiller/spores-distributable-functions-in-scala

If this is not the intended way to use spores with spark, I'm curious what the intended way is or if spores are not intended to work with spark.

As I mentioned in an earlier reply, I did find a way to make use of spores, by manually packing and unpacking into json. But this seems like a lot of effort for not much reward. You get protection against closure problems within the immediate scope of the declared function. That is pretty good, and covers many cases. But this doesn't guarantee that your function is actually free of closure problems, because it could call a function which has closure problems and it won't know about it. Worse, you can get inconsistent behavior if this is going on.

Try this out. In theory, sp should be the same function as s But it produces different results.


    val y = 1
    val f = (x: Int) => x * 2 + y
    val sp = spore{
      val fun = f
      (x:Int) => fun(x)
    }

    val s = JSONPickle(sp.pickle.value).unpickle[Spore[Int, Int]]

Also with this method, because I also have to cast the unpack, I'm just trading one set of potential bugs (closure serialization) for another, bad type casts. This makes me think I've got the wrong idea about this and there's got to be a better way.

from spores.

jvican avatar jvican commented on August 31, 2024

Hi @nimbusgo.

No, spores is not a project made exclusively for spark support and, in turn, has more use cases--all of them related to the use of safe, reliable closures in the context of distributed systems (e.g. I'm using it in the function-passing model). While we agree that it should work, we don't provide official support here, since this is more an issue of solving compatibility problems with Spark. Nonetheless, I'd be happy to help you get it working in Spark, I agree that it's an important target.

There's an ongoing PR that improves considerably the serialization/deserialization experience with scala pickling. This is the default mechanism to serialize and deserialize them and I'd encourage you to give it a try. Spark uses its own serialization infrastructure, iirc Kryo, and serializing and deserializing spores need some special information that normal reflection mechanisms lack.

W.r.t. the first code snippet, it should work perfectly but, again, that depends on what Spark is doing under the hood (I'm no expert in Spark, so I can't provide you good feedback on this). If it doesn't work, please let us know why with a small toy project or detailed error information.

The second example that you provide compiles and I'm going to explain why. Spores have some limitations because they use macros. By definition, macros can't inspect the environment on which they are expanded. So, in this case, there's no way that the macro is able to inspect the body of f, because you only have a reference to f. You could remove this limitation by putting the body of f into the spore header or by casting f to a spore and, then, an automatic conversion will take place and fail compilation. Also, if you remove val fun = f from the spore header, it won't compile. If users decide to make use of functions outside the scope and they declare them in the header, we can't do almost anything to prevent it. What we may be able to do, though, is to disallow any use of functions in the Captured type member. I'll entertain this idea for a while and I may even implement it, but I have to deeply look into the problem.

So, in conclusion, two things:

  1. Scala pickling and spores work perfectly together (but wait for some days because we're cutting releases of both soon). I don't know what you mean by packing and unpacking, but you are able to serialize them with s.pickle and deserialize them with s.unpickle[SporeType].
  2. Spores are not the ultimate solution for safe closures and have some limitations. However, in most of the cases, they work. In fact, your example would fall into these limited cases where it's the developers' fault, not ours (remember, we can't do almost anything to prevent it).

Right now, I'm focused on improving the test infrastructure and reducing the number of potential bugs that could appear. If you feel like spores is still a useful project, I'd very much like to see error reports that I could use to improve the overall quality of the project.

from spores.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.