Comments (13)
+1 for this feature request.
This would be amazing to have. I was trying to build a parallelizable implementation of the batch processor because the synchronous version is too slow for my needs, but managing the work we all get for free via Powertools (removing successes from the SQS queue, removing retryable failures after pushing them to the DLQ, handling the number of retries before failure, etc.) is a lot of work and non-trivial.
Having async processing support officially would be amazing.
from powertools-lambda-java.
I think we should prioritize this for v2, @jeromevdl. What do you think?
Initial thoughts -
- Definitely a separate interface for parallel vs. sync, as you say; the user needs to opt into this. It's easy to write Lambda code that assumes single-threaded execution, as that's the common case.
- I don't think we should create new thread pools to do this; that's quite heavyweight, and I would guess most work will be parking on IO and resuming, which isn't going to be sped up by throwing threads at it.
- We'll need to revisit places where we use ThreadLocal to stash things; I think at least logging does this.
It's a cool idea though! Especially where we are fanning out to do downstream IO, it should make a substantial difference to runtime and be a nice cost optimization for users.
I don't think we should create new thread pools to do this; that's quite heavyweight, and most stuff I would guess will be parking on IO and resuming, which isn't going to be sped up by throwing threads at it
I don't understand... If we use CompletableFuture, we need an executor coming from a thread pool. It won't be visible to the user; it will be internal code in PT.
We'll need to revisit places we're using thread local to stash things - I think at least logging does this
Not logging, but the serialization module with JsonConfig and the object mapper (you'll need to review that one). We probably need to check idempotency too, as batch and idempotency can be used together, and I'm not sure how thread-safe idempotency is...
I just wanted to chime in and add my support for this. We use Kotlin, and having some kind of async interface (which I believe exists in the Python library?) would be really nice performance-wise, especially when a Lambda pulls messages from a queue just to make a DDB call or something like that.
Hi @itsmichaelwang, thanks for your comment. I'm not super familiar with Kotlin, but if you could share the kind of signature you expect, it would help. We probably won't expose a public async interface (with Java Futures) and will handle it internally instead, but happy to discuss this...
I don't understand... If we use CompletableFuture, we need the executor coming from the thread pool. It's not visible from the user, it will be internal code in PT.
You need a thread pool, but I don't think you should have to create a new one; rather, use the thread pools the runtime provides and manages. It looks like we should use ForkJoinPool.commonPool(): it'll be there already, we don't need to create more threads, we don't need to handle its lifecycle, and it is explicitly meant for this sort of processing. I think we could even use something like RecursiveTask and skip thinking about the future-y nature of it at all.
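ForkJoinPool.commonPool() is a shared, JVM-wide pool whose worker threads are created on demand and never need to be shut down by the caller. A minimal sketch of the RecursiveTask idea, assuming a batch reduced to a list of message IDs and a hypothetical `Predicate<String>` handler that returns true on success; the `FailedIdsTask` name and the `THRESHOLD` value are illustrative, not Powertools code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Predicate;

// Hypothetical sketch: recursively splits a batch of message IDs, processes small
// slices directly, and returns the IDs whose processing failed.
class FailedIdsTask extends RecursiveTask<List<String>> {
    private static final int THRESHOLD = 2; // illustrative slice size, not tuned
    private final List<String> ids;
    private final Predicate<String> handler; // returns true on success (hypothetical)

    FailedIdsTask(List<String> ids, Predicate<String> handler) {
        this.ids = ids;
        this.handler = handler;
    }

    @Override
    protected List<String> compute() {
        if (ids.size() <= THRESHOLD) {
            List<String> failed = new ArrayList<>();
            for (String id : ids) {
                if (!handler.test(id)) {
                    failed.add(id);
                }
            }
            return failed;
        }
        int mid = ids.size() / 2;
        FailedIdsTask left = new FailedIdsTask(ids.subList(0, mid), handler);
        FailedIdsTask right = new FailedIdsTask(ids.subList(mid, ids.size()), handler);
        left.fork();                                // schedule the left half on the pool
        List<String> rightFailed = right.compute(); // work on the right half in this thread
        List<String> result = new ArrayList<>(left.join());
        result.addAll(rightFailed);                 // keep the original message order
        return result;
    }

    // Runs the task on the shared common pool; no pool is created or shut down here.
    static List<String> failedIds(List<String> ids, Predicate<String> handler) {
        return ForkJoinPool.commonPool().invoke(new FailedIdsTask(ids, handler));
    }
}
```

The returned IDs come back in batch order, which maps naturally onto the `batchItemFailures` list of an SQS partial batch response.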
Not logging
MDC uses thread local - isn't this a problem?
Serialization, idemp
CompletableFuture.supplyAsync needs an executor. I'm not an expert with multithreading APIs and I'm happy if ForkJoinPool.commonPool() works, but if it gives us the same number of threads as the parallel() method (in Streams), then let's just use the parallel method... Some reading: see this, this, this, this. We certainly need to test the different approaches and measure which provides the best value.
From the last article:
As you can see, CompletableFutures provide more control over the size of the thread pool and should be used if your tasks involve I/O. However, if you're doing CPU-intensive operations, there's no point in having more threads than processors, so go for a parallel stream, as it is easier to use.
We don't really know what users will do (CPU-intensive or not). Maybe parallel() is already a good improvement, or should we give users the option (a boolean set to true if their work is CPU-intensive)?
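For reference: the one-argument `CompletableFuture.supplyAsync(Supplier)` runs on `ForkJoinPool.commonPool()` (or on a new thread per task if the common pool's parallelism is below 2), and parallel streams use that same common pool, whose default parallelism is the number of available processors minus one. The real difference is the two-argument `supplyAsync(Supplier, Executor)`, which lets you size a pool for I/O. A minimal sketch of the opt-in flag floated above, assuming a hypothetical `cpuIntensive` boolean and `ioThreads` pool size (neither is a Powertools option):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class ParallelModeSketch {
    // Hypothetical opt-in switch between the two strategies discussed above.
    static <T, R> List<R> processAll(List<T> records, Function<T, R> handler,
                                     boolean cpuIntensive, int ioThreads) {
        if (cpuIntensive) {
            // Runs on ForkJoinPool.commonPool() plus the calling thread, so concurrency
            // is bounded by roughly the number of available processors.
            return records.parallelStream().map(handler).collect(Collectors.toList());
        }
        // I/O-bound work: a dedicated pool larger than the processor count lets more
        // records wait on the network at the same time.
        ExecutorService executor = Executors.newFixedThreadPool(ioThreads);
        try {
            // Start every future before joining any of them; joining inside the same
            // lazy sequential pipeline would process the records one at a time.
            List<CompletableFuture<R>> futures = records.stream()
                    .map(r -> CompletableFuture.supplyAsync(() -> handler.apply(r), executor))
                    .collect(Collectors.toList());
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            executor.shutdown(); // this pool is owned by the sketch, so it must be released
        }
    }
}
```

Creating a pool per call is exactly the heavyweight cost raised earlier in the thread; a real handler would more likely create such a pool once per execution environment and reuse it across invocations.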
MDC uses thread local - isn't this a problem?
Yes, you're right, MDC uses ThreadLocal. The way we handle Powertools fields today is based on MDC, so yes, it could be a problem... We would probably need to pass the MDC context map to the worker threads so that each one can fill its own copy...
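A minimal sketch of passing the context map to worker threads. It assumes a plain `ThreadLocal<Map<String, String>>` as a stand-in for MDC so the example has no dependencies; with SLF4J the same capture/restore pattern would use `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`. The `ContextPropagation` class and `withContext` helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Stand-in for a ThreadLocal-backed logging context such as SLF4J's MDC.
class ContextPropagation {
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Captures the caller's context when the task is created and installs a copy on
    // whichever worker thread runs it, restoring that thread's previous context after.
    static <T> Supplier<T> withContext(Supplier<T> task) {
        Map<String, String> captured = new HashMap<>(CONTEXT.get());
        return () -> {
            Map<String, String> previous = CONTEXT.get();
            CONTEXT.set(new HashMap<>(captured));
            try {
                return task.get();
            } finally {
                CONTEXT.set(previous); // pooled threads are reused, so don't leak context
            }
        };
    }
}
```

Usage would be along the lines of `CompletableFuture.supplyAsync(ContextPropagation.withContext(() -> ...))`. Restoring the previous map matters because common-pool threads outlive a single invocation, and a parallel stream may also run tasks on the calling thread itself.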
We should first find the best way to implement parallelism (Stream.parallel(), CompletableFuture, ...) and then look at the impact on other modules, but we can already list them:
- logging (MDC)
- idempotency
- serialization (JsonConfig)
- large messages?
- other?
It's actually much simpler with parallel:
List<SQSBatchResponse.BatchItemFailure> batchItemFailures = event.getRecords()
.parallelStream()
.map(sqsMessage -> processTheMessageAndReturnOptionalOfBatchItemFailure(sqsMessage, context))
.filter(Optional::isPresent)
.map(Optional::get)
.collect(Collectors.toList());
Note that in both cases I don't handle the FIFO failWholeBatch... I guess we should not process messages from a FIFO queue in parallel ;)
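A runnable sketch of both cases, assuming messages are reduced to their IDs and a hypothetical `Predicate<String>` handler (returning true on success) replaces `processTheMessageAndReturnOptionalOfBatchItemFailure`. For the FIFO case it processes sequentially and, after the first failure, reports that message and every later one as failed without processing them, so redelivery keeps the original order. This is one possible approach, assumed here rather than taken from Powertools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class BatchSketch {
    // Standard queue: messages are independent, so they can be processed in parallel.
    // filter/collect on an ordered source keeps the failures in batch order.
    static List<String> failedIdsParallel(List<String> ids, Predicate<String> handler) {
        return ids.parallelStream()
                .filter(id -> !handler.test(id))
                .collect(Collectors.toList());
    }

    // FIFO queue: process strictly in order. The first failed message and every message
    // after it are reported as failures (the later ones are not processed at all), so
    // SQS redelivers them in their original order.
    static List<String> failedIdsFifo(List<String> ids, Predicate<String> handler) {
        List<String> failures = new ArrayList<>();
        boolean failed = false;
        for (String id : ids) {
            if (!failed && handler.test(id)) {
                continue; // processed successfully
            }
            failed = true;
            failures.add(id);
        }
        return failures;
    }
}
```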
Parallel streams! That looks exactly like what we want. I think this comment you made is why: we pick a level of abstraction where we let Java handle the actual implementation of the parallelization:
We don't really know what users will do (CPU-intensive or not).
One question nags a bit: do we want the user to be able to return a future from their process function? I haven't thought hard about this.
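If the process function were allowed to return a future, the batch handler could invoke it for every message first and only then wait, so users with their own non-blocking clients (for example the AWS SDK v2 async clients, which return `CompletableFuture`) would not tie up a thread per message. A minimal sketch, assuming a hypothetical `Function<String, CompletableFuture<Void>>` handler keyed by message ID; this is not an existing Powertools signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class AsyncHandlerSketch {
    // Hypothetical signature: the user's handler starts work for one message and
    // returns a future that completes (or fails) when that work is done.
    static List<String> failedIds(List<String> ids,
                                  Function<String, CompletableFuture<Void>> handler) {
        List<CompletableFuture<String>> outcomes = new ArrayList<>();
        for (String id : ids) {
            CompletableFuture<Void> future;
            try {
                future = handler.apply(id);
            } catch (RuntimeException e) {
                // A handler that throws before returning a future counts as a failure.
                future = new CompletableFuture<>();
                future.completeExceptionally(e);
            }
            // Map each outcome to the message ID on failure, or null on success.
            outcomes.add(future.handle((ignored, error) -> error == null ? null : id));
        }
        // Every handler has been invoked at this point; now wait for each outcome.
        List<String> failed = new ArrayList<>();
        for (CompletableFuture<String> outcome : outcomes) {
            String failedId = outcome.join();
            if (failedId != null) {
                failed.add(failedId);
            }
        }
        return failed;
    }
}
```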
I guess we should not process in parallel messages in a FIFO queue ;)
Good catch!
When is v2 scheduled, if this is in scope for v2?
When is v2 set for if this is in scope for v2?
Hey @kyuseoahn , we're targeting first half 2024 for this at the moment!