
Comments (1)

purefunkce commented on May 25, 2024

OK, so for anyone else out there thinking their PySpark-style lambda expressions are going to work: you are going to have a bad day. There is a fundamental issue, namely that the C# compiler turns a lambda that captures variables into a method on a generated closure class, and that class is not marked [Serializable]. SparkCLR does some work to serialize fully closed lambda expressions, but any lambda that references variables outside its own scope gets compiled into one of these non-serializable closure classes.
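To see the difference for yourself, here is a minimal sketch that inspects the object each delegate is bound to. The class and method names (ClosureInspection, Describe, NonCapturing, Capturing) are made up for illustration and are not part of SparkCLR; the exact generated type names depend on your compiler version.

```csharp
using System;
using System.Reflection;

public static class ClosureInspection
{
    // True when the type carries the [Serializable] flag in its metadata.
    public static bool IsMarkedSerializable(Type type)
    {
        return (type.Attributes & TypeAttributes.Serializable) != 0;
    }

    // Reports the type a delegate is bound to. Falls back to the declaring
    // type when the compiler emitted the lambda as a static method.
    public static string Describe(Delegate d)
    {
        Type target = d.Target != null ? d.Target.GetType() : d.Method.DeclaringType;
        return target.Name + ": serializable = " + IsMarkedSerializable(target);
    }

    // Fully closed lambda: uses nothing but its own parameter.
    public static Func<int, int> NonCapturing()
    {
        return x => x + 1;
    }

    // Capturing lambda: 'offset' is hoisted into a compiler-generated closure class.
    public static Func<int, int> Capturing()
    {
        int offset = 10;
        return x => x + offset;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(NonCapturing()));
        Console.WriteLine(Describe(Capturing()));
    }
}
```

With the Roslyn compilers I have tried, the capturing lambda should report a generated closure class that is not serializable, which is exactly the case that bites you when the delegate has to be shipped to a worker.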

I'm sure there are ways around this to elegantly turn any anonymous function into a method on a serializable class, but they have thus far eluded me. The bottom line is that you need to make a serializable helper class and pass a method from that class directly to Map, etc., without using any lambda expressions that are not fully closed.

In the SparkCLR samples there is a BroadcastHelper class that can help you transfer data as a broadcast variable. You can use the same pattern to send any sort of data to the workers: first initialize a new object with the data you want to use in the delegate, then pass one of its methods as the delegate:
    [Serializable]
    internal class BroadcastHelper<T, U>
    {
        private readonly Broadcast<T> broadcastVar;

        internal BroadcastHelper(Broadcast<T> broadcastVar)
        {
            this.broadcastVar = broadcastVar;
        }

        internal T Execute(U i)
        {
            return broadcastVar.Value;
        }
    }

now, this can work:
    string parallel = "test serialization string";
    var broadcast = sc.Broadcast(parallel);
    Console.WriteLine(test.Map(new BroadcastHelper<string, int>(broadcast).Execute).First());

whereas
    string parallel = "test serialization string";
    var broadcast = sc.Broadcast(parallel);
    Console.WriteLine(test.Map(x => broadcast.Value).First());
== bad day
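The same helper pattern works for plain data, not just broadcast variables. Below is a rough sketch under my own assumptions: OffsetHelper and HelperDemo are hypothetical names, and LINQ's Select stands in for the RDD's Map so the snippet runs without a Spark cluster. It only shows that the delegate's target is now a type you control and have marked [Serializable]; it does not exercise SparkCLR's own serializer.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper: carries the captured value as a field instead of
// relying on a compiler-generated closure class.
[Serializable]
public class OffsetHelper
{
    private readonly int offset;

    public OffsetHelper(int offset)
    {
        this.offset = offset;
    }

    public int Execute(int x)
    {
        return x + offset;
    }
}

public static class HelperDemo
{
    public static void Main()
    {
        // Method group conversion: the delegate's Target is the OffsetHelper instance.
        Func<int, int> addTen = new OffsetHelper(10).Execute;

        bool serializable =
            (addTen.Target.GetType().Attributes & TypeAttributes.Serializable) != 0;
        Console.WriteLine(serializable); // True

        // Select is a local stand-in for Map here.
        Console.WriteLine(string.Join(",", new[] { 1, 2, 3 }.Select(addTen))); // 11,12,13
    }
}
```

The cost is one small class per piece of captured state, which is the boilerplate I was hoping to avoid, but it keeps everything the delegate touches inside a type that can be serialized.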

I would love to chat with anyone out there who has the experience to make these kinds of lambda expressions work out of the box, without writing helper classes.

from mobius.
