Comments (20)
Code is there now.
from dataloader.
yes! that version was derived by @bbakerman from my vert.x implementation and has all vert.x dependencies removed :)
Hi @xak2000 did you go any further with this?
For Java you can take a look at Vert.x from Eclipse at http://vertx.io/
It uses event loops, is light-weight, easy to use once you get familiar with the asynchronous programming model, and also it is polyglot, so you can mix-and-match code from a bunch of JVM-based languages, including Javascript, in a single application.
Hi, @aschrijver. Thanks for your involvement!
Yes, I know about Vert.x, but Vert.x is a "container" for your applications (verticles). It is a very good project, no doubt. But I wanted to implement a pure Java solution, without dependencies. It is straightforward to implement DataLoader in Vert.x (thanks to the async nature of Vert.x and its event loop), but then it would be usable only inside a Vert.x application.
As for a pure Java implementation: right now I don't have time for further investigation, but I think it is simply undoable, sadly. :( Because there is no event loop, we can't know when to run the batch load function for all scheduled loaders. There are two options as I see it:

- Make it part of the contract to call some `dataloader.runLoaders()` function from user code at a point where all loaders are already scheduled (for example: just before returning the response to the browser).
- Run all batch loaders automatically 10ms after the last `loader.load` call. I.e. a `loader.load` call schedules the batch load function to run 10ms from that call. If another `loader.load` call arrives within that 10ms window, the scheduled batch load function is cancelled and rescheduled to run 10ms from now.

This way we can emulate the current DataLoader functionality, but at the cost of response time increased by 10ms.
I don't really like either of these solutions, because they lack the gracefulness of the original DataLoader idea: non-intrusive, automatic batch loading.
Maybe combining the two (schedule the batch function to run automatically, but allow running it manually when the user knows the right moment) could give a more or less acceptable result...
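The second option (debounced auto-dispatch) can be sketched in plain Java with a `ScheduledExecutorService`. This is a minimal illustration of the idea, with a hypothetical `DebouncingLoader` API, not an actual library class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Function;

/**
 * Sketch of the debounce option above (hypothetical API): every load()
 * call (re)schedules the batch function to run after a quiet period, so
 * a burst of loads collapses into one batch at the cost of added latency.
 */
class DebouncingLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final long delayMs;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // don't keep the JVM alive for the timer
                return t;
            });
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();
    private ScheduledFuture<?> pending; // currently scheduled dispatch, if any

    DebouncingLoader(Function<List<K>, List<V>> batchFn, long delayMs) {
        this.batchFn = batchFn;
        this.delayMs = delayMs;
    }

    synchronized CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> f = new CompletableFuture<>();
        futures.add(f);
        if (pending != null) {
            pending.cancel(false); // debounce: push the dispatch back
        }
        pending = scheduler.schedule(this::dispatch, delayMs, TimeUnit.MILLISECONDS);
        return f;
    }

    synchronized void dispatch() {
        List<K> batch = new ArrayList<>(keys);
        List<CompletableFuture<V>> batchFutures = new ArrayList<>(futures);
        keys.clear();
        futures.clear();
        pending = null;
        if (batch.isEmpty()) return;
        List<V> values = batchFn.apply(batch); // one call for the whole batch
        for (int i = 0; i < values.size(); i++) {
            batchFutures.get(i).complete(values.get(i));
        }
    }
}
```

Each `load` pushes the pending dispatch back by `delayMs`, so loads made in quick succession end up in a single batch; the trade-off is exactly the added response time the comment describes.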
Vert.x does not have to be the container for your entire app. You can also run it embedded without exposing it to clients. You only ship the transitive dependency on vertx-core in that case.
Yes, you can run Vert.x embedded, but it is still a container (embedded into your application, but a container nonetheless), and a DataLoader implementation based on the Vert.x event loop would be usable only from code run inside this embedded container, not from the rest of the application.
So embedded or not, it doesn't matter. You still can't write a pure-Java solution that can be used in any Java app as a library (rather than as a framework component).
Gave some advice on #30 that's relevant here
Hi @xak2000, @antmdvs,
I finally got some time to look more closely at dataloader, and I now see your issue. Vert.x also does not address this problem (though there may be ways to tweak the `EventBus`).
But your options above, while less elegant than auto-batching, still make it a useful utility, I think. Additional options could be a maximum batch size after which everything is dispatched automatically, regular intervals (in Vert.x by setting a timer), or a custom strategy the implementer plugs in.
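The maximum-batch-size option mentioned here can be sketched similarly, again with a hypothetical API in plain Java rather than Vert.x:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/**
 * Sketch (hypothetical API) of the max-batch-size strategy above: load
 * requests are queued, and the batch is dispatched automatically once it
 * reaches a maximum size; a partial batch waits for an explicit dispatch().
 */
class MaxSizeLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final int maxBatchSize;
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();

    MaxSizeLoader(Function<List<K>, List<V>> batchFn, int maxBatchSize) {
        this.batchFn = batchFn;
        this.maxBatchSize = maxBatchSize;
    }

    synchronized CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> f = new CompletableFuture<>();
        futures.add(f);
        if (keys.size() >= maxBatchSize) {
            dispatch(); // auto-dispatch when the batch is full
        }
        return f;
    }

    synchronized void dispatch() {
        if (keys.isEmpty()) return;
        List<V> values = batchFn.apply(new ArrayList<>(keys));
        for (int i = 0; i < values.size(); i++) {
            futures.get(i).complete(values.get(i));
        }
        keys.clear();
        futures.clear();
    }
}
```

This keeps latency bounded for large bursts while still relying on a final manual `dispatch()` to flush any partial batch.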
I am writing a remote service proxy for a GraphQL schema, and intend to write a `vertx-dataloader` that will batch the individual data-fetching requests involved in an incoming query (on the client) and send a single event bus message (JSON) to the backend GraphQL service implementation.
The data loader would be invoked after the query has been processed on the client proxy side, by explicitly calling a `dataloader.dispatch()`, which will hydrate the various data fetcher futures with values.
The implementation will first use `CompositeFuture` (equivalent to `Promise.all`), then later probably Vert.x's implementation of `CompletableFuture`.
In many use cases the dispatch call can be triggered automatically, e.g. in Vert.x by setting it on the `endHandler` of an HTTP request.
Another interesting use I am thinking of is to implement the data loader with an asynchronous, cluster-wide map implementation (an `AsyncMap` in Vert.x), so I get caching across load-balanced nodes.
BTW, what is funny to mention is that while the tick concept in the NodeJS data loader does not impact its asynchronous behaviour (correct?), in Vert.x it limits it slightly by being more of a delayed-execution thing.
I would caution against using the delay to pick up the requests in a batch. I've seen that approach used before and it can introduce real latency into your system.
For JS environments that don't support the Promise queue, this timer approach is used, but the delay used is "0ms" which in the case of JS schedules for the immediate next tick of the event loop.
Thanks, this is something I will mention in the docs, just like you did with HTTP request scope, etc.
In general one should take care not to make blocking calls while there is stuff in the queue, and not to put too much logic between batches :)
I will have code up in a couple of hours at https://github.com/engagingspaces/vertx-dataloader
BTW, many bunches of thanks for creating the test coverage. It is boring to translate to Java, but it helps a great deal. I will mention you in the acknowledgements section of the README, if you don't mind.
@leebyron I documented the change in the Java implementation very clearly (I hope) in vertx-dataloader/manual-dispatching
I think the change doesn't diminish the power of the utility in any way. On the contrary, it gives additional, fine-grained control over the batching and dispatching logic.
Hi @aschrijver! Very glad to see you wrote some code! I opened this task, but I'm very busy now and can't give it much effort.
One question from a quick glance at the code: does it support multi-step loading?
What I mean:

- queue loading of some `user`s by their `id`s.
- only after the promises of each `user` load are resolved can we queue loading of the `post`s of each loaded `user`.
The problem is that `dispatch` will be called before any user promise is resolved, so before any post loading has been queued. This is by design. If we don't call `dispatch`, the user promises will never be resolved. But! After the batch loading of users completes (and `dispatch` exits), the resolved user promises will queue the `post` loading tasks. But who will make the second `dispatch` call after that?
Ok, I think the example above is slightly incorrect. `Post`s should be loaded from another instance of `DataLoader`. But imagine these are not posts but friends of each user (so they are instances of the `user` model).
And even if we use a second `DataLoader` for posts, how can we know the order of `dispatch` calls we need to follow to correctly enqueue and dequeue all the load calls?
Maybe I'm misunderstanding something, but when I considered the possibility of implementing `DataLoader` in Java, I thought about all these problems and couldn't find any reasonable solution without some event-loop-like system, where the container (Node.js for example) dequeues and dispatches all queued jobs in the right order, and any number of times, for you.
Edit
Ok, I think we can load posts or friends in the first iteration using something like `loadFriendsByUserId(userId)`. But if we then need to load the friends of each friend (for example), the problem persists.
Something like this in GraphQL, if I wrote it correctly:
```graphql
users {
  friends {
    friends
  }
}
```
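The chained-loading problem described above can be demonstrated with a minimal manual-dispatch loader (a hypothetical sketch, not any actual library API). Dependent loads are only enqueued once the first batch completes, so a single `dispatch()` is not enough:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Minimal manual-dispatch loader (hypothetical sketch) used to illustrate
// the problem: loads queued by completion callbacks are only enqueued
// AFTER the first dispatch, so they need a second dispatch to resolve.
class SimpleLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();

    SimpleLoader(Function<List<K>, List<V>> batchFn) {
        this.batchFn = batchFn;
    }

    synchronized CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> f = new CompletableFuture<>();
        futures.add(f);
        return f;
    }

    /** Runs the batch function for everything queued; returns false if the queue was empty. */
    synchronized boolean dispatch() {
        if (keys.isEmpty()) return false;
        List<K> batch = new ArrayList<>(keys);
        List<CompletableFuture<V>> batchFutures = new ArrayList<>(futures);
        keys.clear(); // clear first: completion callbacks may queue new loads
        futures.clear();
        List<V> values = batchFn.apply(batch);
        for (int i = 0; i < values.size(); i++) {
            batchFutures.get(i).complete(values.get(i));
        }
        return true;
    }
}
```

With this sketch, `load("alice").thenCompose(friends -> load(friends.get(0)))` is still pending after the first `dispatch()`, because the second-level load is only queued when the first level completes. The caller has to call `dispatch()` again for each newly queued level, e.g. by looping `while (loader.dispatch()) {}`, which is exactly the "who makes the second dispatch call" question raised above.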
Hmm, yours is an interesting case to investigate further.
In general, with manual dispatching being the exception, the implementation should be exactly identical to its counterpart in NodeJS. I still haven't ported all the tests, so you have to take my word for it here.
Just like manual dispatching makes it necessary to invoke `dispatch()` at an appropriate location, you are also responsible for creating load requests at the appropriate moments in the lifecycle of a batch operation.
In the future I will add features that flush the queue after a `dispatchTimeout`, or at regular timer intervals, depending on the additional options you provide.
When you call `dispatch()`, the individual `Future`s will already start completing/resolving. The fetch immediately returns a `CompositeFuture` of the aggregated result (similar to `Promise.all`). The composite is not complete yet, however. You will have to set a handler on it and check for success or failure before touching the list of constituent results (in Vert.x they are still `Future`s, with the value `<V>` in `future.result()` if `future.succeeded()`).
On your last point: I documented the benefits you still have even with manual fetching (and some additional ones) in the differences-to-reference-implementation section as well.
If you have the time later it would be nice if you created a PoC project or a Gist to test-drive your use case!
Forgot to mention that the preparation stage of the data loader lifecycle (i.e. adding load requests to the batch queue, from loader instantiation to just before dispatch) is usually very fast as long as no code blocks the event loop!
In Vert.x every invocation is asynchronous by default, and if it is not, it should be assigned to a worker verticle that runs on a thread pool. Regular verticles always run on the same thread, so you can regard them as single-threaded and reap the rewards of that simplicity. But you probably know this, as it's the main selling point of Vert.x (besides being clustered, polyglot, modular, and... much more goodness 😄).
I have created engagingspaces/vertx-dataloader#3 on the Vert.x DataLoader project for this use case.
I added a nice diagram to README.md that shows the concepts and the direction the Vert.x DataLoader is heading in (not quite there yet).
Closing this issue since there's now a project and discussion to take this further.
For anyone interested: a pure Java 8 implementation exists now.
It still requires manual dispatching, for the obvious reasons discussed in this topic. But graphql-java supports it natively through `DataLoaderDispatcherInstrumentation`.