Giter VIP home page Giter VIP logo

Comments (5)

rickfish avatar rickfish commented on August 17, 2024

I was able to dig a little deeper and found that the reason these SCHEDULED task are not popped is because the RedisMock class's zAdd method for these tasks for the HTTP UNACK queue returns 0. That causes the task to not be popped in the _pop method of RedisDynoQueue. Can anyone explain what the unack queue is and how it is used? Is it used to see if the task was ack'ed?

For some tasks, returning 0 from the zAdd method does not cause issues as I guess further calls return a non-zero. But for these tasks a 0 is always returned.

from dyno-queues.

rickfish avatar rickfish commented on August 17, 2024

More info. Seems to be a bug with RedisSortedSetCache. It is trying to maintain two maps, the cache map and the scores map. The scores map value is a Map and the cache map value is a Set. When tasks are added to the UNACK.HTTP cache after they have already been added before, a duplicate entry is added to the cache map's Set but the scores entry is just overwritten (since the put does not allow dups). Then when the entry is removed, only one is removed from the cache, leaving one (or more) and the scores entry is deleted entirely.

The Comparator of the cache map does comparisons using the scores map, thinking that there are equivalent entries in both maps. If it finds two scores that are null it will return 0 (equal).

Since the SortedSet contains() method uses the comparator, even though the set does not contain the task, it thinks it does because of the null-scores comparison.

Bottom line on this is that this causes SCHEDULED tasks to be stranded forever and never popped. This is a big problem.

NOTE THAT this bug only happens when more than one HTTP system task is SCHEDULED in parallel and only sporadically. I think it is a race condition but I am not sure yet as this is very difficult to debug. I am also not sure if it only happens with HTTP tasks.

Here’s my debugging history:

In the Redis persistence implementation of conductor, the DynoQueueDAO uses RedisDynoQueue as its queue impl. RedisDynoQueue uses JedisMock as its JedisCommands impl. JedisMock uses RedisMock for its zadd and zrem impl if the params contain “nx”, which they do.

RedisMock uses the RedisSortedSetCache as its ‘zsetCache’. That class has two maps – cache and scores. The cache has a Comparator that doesn’t just compare the task ids, it also compares the scores in the scores map for the two task ids of the comparison.

It appears as though the cache entry that is causing the problem is the “conductor_queues.test.UNACK.HTTP.c” cache.

When the pop() method of DynoQueueDAO is called, it calls the pop() method of the “conductor_queues.test.UNACK.HTTP.c” RedisDynoQueue which calls its internal _pop() method. In its _pop() method, it does this for each task in the UNACK.HTTP queue:

                 long added = quorumConn.zadd(unackQueueName, unackScore, msgId, zParams);
                if(added == 0){
                       monitor.misses.increment();
                       continue;
                }

So if ‘added=0’, that task will not be added to the tasks to pop. In the zadd method of RedisMock, the contains() method of the cache map is used to see if the task exists in the cache. If it does, it returns 0. The contains() method uses the Comparator specified by RedisSortedSet.set() method which doesn’t just check the task id of the others in the set, it also will return a 0 (equals) if both scores for those tasks are null, which they are at that point. So it returns 0 because both scores are null and it is not popped.

Note that this only happens the first time zadd is called. After the contains() method is called to determine if the task is in the queue, it then puts it in the queue with the zsetCache.set() call so from then on it does actually exist and the contains() method will then always return 0. The task stays in there forever and never gets popped.

I am not sure the best way to fix this yet. I am looking for advice.

from dyno-queues.

ipapapa avatar ipapapa commented on August 17, 2024

If this is a Dyno Queues issue, I suggest you open an issue on https://github.com/netflix/dyno-queues so that the proper team can handle it.

from dyno-queues.

vwyuheng avatar vwyuheng commented on August 17, 2024

hi @rickfish
Is this problem solved?

I also encountered the same problem

conductor version 2.29.0
redis 5.0.5

If you solved it,
I want to know how to solve it, what should I do?

from dyno-queues.

rickfish avatar rickfish commented on August 17, 2024

Not sure if it was solved. Since we are now using Postgres as the db it only affects us if running on somebody's laptop using db=memory. In that case, it would just be a developer doing a test of his/her changes so this kind of problem would not be noticed. In any case, I think it is a DynoQueue issue with RedisSortedSetCache and would have to be resolved in that OpenSource project.

Sorry I could not help more.

from dyno-queues.

Related Issues (18)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.