Comments (2)

acanalesg commented on July 29, 2024

Basically, what I did to solve my problem was create a new class that overrides __call__ and adds a call to a cleanup method:

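from copy import copy

from dumbo.lib import JoinCombiner  # assumed import path


class OuterJoinReducer(JoinCombiner):
    def __call__(self, key, values):
        # Reset per-key state on every new primary, and on a secondary whose
        # body has no matching primary. Resetting unconditionally would also
        # wipe the data that primary() just stored for the same body.
        if key.isprimary or key.body != self._key:
            self.cleanup()
        if key.isprimary:
            self._key = key.body
            output = self.primary(key.body, values)
            if output:
                for k, v in output:
                    jk = copy(key)
                    jk.body = k
                    yield jk, v
        elif not self.secondary_blocked(key.body):
            for k, v in self.secondary(key.body, values):
                jk = copy(key)
                jk.body = k
                yield jk, v

    def cleanup(self):
        # Hook for subclasses; does nothing by default.
        pass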

And now I can use it to also dump records in the secondary that don't have a primary:

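class myJoinReducer(OuterJoinReducer):
    def cleanup(self):
        # Default used when a secondary record has no matching primary.
        self.primary_data = "UNKNOWN"

    def primary(self, key, values):
        # Remember the first primary value; nothing is emitted here.
        self.primary_data = next(values)

    def secondary(self, key, values):
        for v in values:
            # v is assumed to be a tuple, so the lookup value is prepended.
            yield key, ((self.primary_data,) + v)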

In my example I'm not dumping primaries with a secondary, but that would also be quite straightforward if needed. As you can imagine, my primary works like a lookup table that I use to translate something in the secondary. I could also cache the primary, but the volume is high and the performance is not good.
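For comparison, the "cache the primary" alternative mentioned above could look roughly like the sketch below. It is plain Python with no dumbo involved; build_lookup and translate are hypothetical names, and the records are made-up sample data. It assumes the whole primary fits in memory, which is exactly what breaks down at high volume.

```python
def build_lookup(primary_records):
    """Load (key, value) primary records into an in-memory dict."""
    return dict(primary_records)


def translate(secondary_records, lookup, default="UNKNOWN"):
    """Prepend the looked-up primary value to each secondary record."""
    for key, value in secondary_records:
        # Secondary keys missing from the lookup fall back to the default.
        yield key, (lookup.get(key, default),) + value


lookup = build_lookup([("a", "Alice"), ("c", "Carol")])
joined = list(translate([("a", (1,)), ("b", (3,))], lookup))
```

Here joined keeps the record for "b" even though it has no primary, which is the same outer-join behaviour as the reducer above, just paid for with memory instead of a sort.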

It might be good to have this kind of option.

acanalesg commented on July 29, 2024

I've found an easier way to do it: override secondary_blocked() so that it always returns False, and reset the cached primary data to None there. Something like:

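from dumbo.lib import JoinReducer  # assumed import path


class joinReducer(JoinReducer):
    def primary(self, key, values):
        # Cache the primary value for the secondary records that follow.
        self.primary_data = next(values)

    def secondary(self, key, values):
        for v in values:
            yield self.primary_data, v

    def secondary_blocked(self, b):
        # Never block; clear stale primary data when no primary matched b.
        if self._key != b:
            self.primary_data = None
        return False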
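
To see why resetting inside secondary_blocked() works, here is a small stand-alone simulation that runs without dumbo or Hadoop. FakeJoinKey, MiniJoinReducer and run are hypothetical stand-ins that only loosely mirror the __call__ dispatch shown in the first comment; they are not dumbo's actual implementation, and the records are made-up sample data.

```python
from itertools import groupby


class FakeJoinKey(object):
    """Hypothetical stand-in for a join key: a body plus a primary flag."""
    def __init__(self, body, isprimary):
        self.body = body
        self.isprimary = isprimary


class MiniJoinReducer(object):
    """Hypothetical base class: primaries are stored, secondaries are emitted."""
    def __init__(self):
        self._key = None

    def __call__(self, key, values):
        if key.isprimary:
            self._key = key.body
            self.primary(key.body, values)
        elif not self.secondary_blocked(key.body):
            for k, v in self.secondary(key.body, values):
                yield k, v


class OuterJoin(MiniJoinReducer):
    def __init__(self):
        MiniJoinReducer.__init__(self)
        self.primary_data = None

    def primary(self, key, values):
        self.primary_data = next(values)

    def secondary(self, key, values):
        for v in values:
            yield self.primary_data, v

    def secondary_blocked(self, b):
        # The last primary seen belongs to another body: drop its data.
        if self._key != b:
            self.primary_data = None
        return False


def run(reducer, records):
    """Feed (body, isprimary, value) records, primary first within each body."""
    ordered = sorted(records, key=lambda r: (r[0], not r[1]))
    out = []
    for (body, isprimary), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        values = (r[2] for r in group)
        out.extend(reducer(FakeJoinKey(body, isprimary), values))
    return out


records = [
    ("a", True, "Alice"),
    ("a", False, 1),
    ("a", False, 2),
    ("b", False, 3),      # no primary for "b"
    ("c", True, "Carol"),  # no secondary for "c"
]
result = run(OuterJoin(), records)
```

The secondary record for "b" comes out paired with None instead of inheriting "Alice" from the previous key, and the primary for "c" produces no output because nothing is emitted from primary().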
