This follows parse-community/Parse-SDK-Android#646, where we found a possible bug in the executor; I would also like to propose some changes.
## Bug?
The code comments state:

> Creates a proper Cached Thread Pool. Tasks will reuse cached threads if available or create new threads until the core pool is full. tasks will then be queued. If an task cannot be queued, a new thread will be created unless this would exceed max pool size, then the task will be rejected. Threads will time out after 1 second.
This is strictly true in theory, but not in practice with the blocking queue being used (a `LinkedBlockingQueue` with unbounded capacity). If I understand correctly, with that queue tasks can *always* be queued, so the maximum number of threads you will ever have is the core pool size; `maxPoolSize` has no effect. So this
```java
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        MAX_POOL_SIZE,
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());
```
should become
```java
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        MAX_POOL_SIZE,
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(SOME_NUMBER));
```
The official `ThreadPoolExecutor` docs also say this about queues:

> Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution.
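This behavior is easy to verify. Below is a minimal sketch (the pool sizes are arbitrary, chosen just for the demo, not the ones Bolts derives from the CPU count): we saturate the core threads and observe that the pool never grows past `corePoolSize`, even though `maxPoolSize` is 8:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2,                  // corePoolSize
                8,                  // maxPoolSize (will never be reached)
                1, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>()); // unbounded

        final CountDownLatch done = new CountDownLatch(1);
        for (int i = 0; i < 10; i++) {
            executor.execute(() -> {
                try { done.await(); } catch (InterruptedException ignored) {}
            });
        }

        // Both core threads are busy and 8 tasks are queued, yet no extra
        // threads were created: the pool never grows past corePoolSize.
        System.out.println("pool size: " + executor.getPoolSize());    // pool size: 2
        System.out.println("queued: " + executor.getQueue().size());   // queued: 8

        done.countDown();
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```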
## Queue strategy
Using the terminology from the Oracle docs, the current strategy (despite what the comments say) is the *unbounded queue*, which fits the case of tasks that are completely independent of each other. You will agree this is not the Bolts case. Can we move to the first strategy?
I can read in the comments that the Bolts executor was designed to emulate the `AsyncTask` executor. But `AsyncTask`s are independent by design and can afford a big queue. Bolts `Task`s are dependent, chained, and nested, and the docs suggest a different strategy for this design. We are experiencing lockups in the Parse SDK, described in the linked issue (there's a huge comment explaining the internals). We can block `Task.BACKGROUND_EXECUTOR` forever very easily, in situations like the following, when running the same action concurrently:
```
BACKGROUND_EXECUTOR pool (max 7 threads for 6 processors)
|- background-executor-thread-1 (needs another background thread to complete)
|- background-executor-thread-2 (needs another background thread to complete)
|- background-executor-thread-3 (needs another background thread to complete)
|- background-executor-thread-4 (needs another background thread to complete)
|- background-executor-thread-5 (needs another background thread to complete)
|- background-executor-thread-6 (needs another background thread to complete)
|- background-executor-thread-7 (needs another background thread to complete)
```
With 2 processors, it only takes 3 concurrent actions to hang the executor. I don't think this is fixable from outside Bolts, because:

- `Task`s promote chaining, nesting, and dependencies, and that's how you build upon them.
- It's impossible to get rid of `BACKGROUND_EXECUTOR` and use another executor, because the background executor is the fallback executor of `ImmediateExecutor`. You can pass a custom executor in every task call, but that's bad performance-wise: you lose the simplicity of `ImmediateExecutor` and don't reuse threads.
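The lockup can be reproduced with a minimal sketch (pool sizes are made up to model a small device, not the actual Bolts configuration): each outer task submits a nested task to the same unbounded-queue executor and waits for it, so every core thread ends up parked while the nested tasks sit in the queue forever:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class NestedDependencyLockup {
    public static void main(String[] args) throws Exception {
        // 2 core threads with an unbounded queue, modeling a small device.
        final ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 8, 1, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());

        final CountDownLatch allDone = new CountDownLatch(2);
        for (int i = 0; i < 2; i++) {
            executor.execute(() -> {
                // Each "action" needs a nested task on the same executor
                // to complete, just like chained/nested Bolts tasks.
                Future<?> nested = executor.submit(() -> {});
                try {
                    nested.get();        // parks a core thread forever:
                    allDone.countDown(); // the nested task never runs
                } catch (Exception ignored) {}
            });
        }

        // Both core threads wait in nested.get(), the nested tasks sit in
        // the queue, and no extra thread is ever created.
        boolean finished = allDone.await(2, TimeUnit.SECONDS);
        System.out.println("completed: " + finished); // completed: false
        executor.shutdownNow();
    }
}
```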
I propose moving the queuing strategy towards the *direct handoffs* strategy:

> Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies.
## Proposals
```java
// Direct handoff with a limited max pool size.
// This fits the Bolts design better.
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        BIG_NUMBER, // 64?
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>());
```
or
```java
// Bounded queue with a small capacity.
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        BIG_NUMBER, // 32?
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(VERY_SMALL_NUMBER));
```
Using a big queue would have no effect on lockups. If there are 7 core threads each needing another 7 threads to complete (as in my example), and the queue can hold 128 commands, the problem isn't solved: we would have to wait for 128 requests to pile up and hang before the first one is resolved. Ideally, `VERY_SMALL_NUMBER < CORE_POOL_SIZE`, to ensure at least one request can be fulfilled.
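For comparison, here is a sketch of the same nested-dependency workload from my earlier example running on a direct-handoff pool (again with made-up sizes, not a final proposal): since a `SynchronousQueue` cannot hold a task, each nested submission fails to queue and forces a new thread, so nothing deadlocks:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DirectHandoffDemo {
    public static void main(String[] args) throws Exception {
        // Direct handoff: if no thread is free, a new one is created
        // (up to maxPoolSize) instead of queueing the task.
        final ThreadPoolExecutor executor = new ThreadPoolExecutor(
                2, 64, 1, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());

        final CountDownLatch allDone = new CountDownLatch(2);
        for (int i = 0; i < 2; i++) {
            executor.execute(() -> {
                // Same nested dependency that deadlocked the unbounded queue.
                Future<?> nested = executor.submit(() -> {});
                try {
                    nested.get(); // a fresh thread runs the nested task
                    allDone.countDown();
                } catch (Exception ignored) {}
            });
        }

        boolean finished = allDone.await(5, TimeUnit.SECONDS);
        System.out.println("completed: " + finished); // completed: true
        executor.shutdown();
    }
}
```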