Comments (20)
Code is there now.
from dataloader.
yes! that version was derived by @bbakerman from my vert.x implementation and has all vert.x dependencies removed :)
Hi @xak2000 did you go any further with this?
For Java you can take a look at Vert.x from Eclipse at http://vertx.io/
It uses event loops, is light-weight, easy to use once you get familiar with the asynchronous programming model, and also it is polyglot, so you can mix-and-match code from a bunch of JVM-based languages, including Javascript, in a single application.
Hi, @aschrijver. Thanks for your involvement!
Yes, I know about Vert.x, but Vert.x is a "container" for your applications (verticles). It is a very good project, no doubt. But I wanted to implement a pure Java solution, without dependencies. It is straightforward to implement DataLoader in Vert.x (thanks to the async nature of Vert.x and its event loop), but then it would be usable only inside a Vert.x application.
As for a pure Java implementation: right now I don't have time for further investigation, but I think it is simply undoable, sadly. :( Because there is no event loop, we can't know when to run the batch load function for all scheduled loaders. There are two options as I see it:

- Make it part of the contract to call some `dataloader.runLoaders()` function from user code at a point where all loaders are already scheduled (for example: just before returning the response to the browser).
- Run all batch loaders automatically 10ms after the last `loader.load` call. I.e. a `loader.load` call schedules the batch load function to run 10ms from that call. If another `loader.load` call arrives within that 10ms window, the scheduled batch load function is cancelled and rescheduled to run 10ms from now.

This way we can emulate the current DataLoader functionality, but at the cost of response time increased by 10ms.
I don't really like either of these solutions, because they lack the gracefulness of the original DataLoader idea: non-intrusive, automatic batch loading.
Maybe combining the two (schedule the batch function to run automatically, but allow running it manually when the user knows the right moment) could give a more or less acceptable result...
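The second option (debounced auto-dispatch) can be sketched in plain Java with a `ScheduledExecutorService`. This is a minimal illustration of the idea, with a hypothetical `DebouncingLoader` API, not an actual library class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Function;

/**
 * Sketch of the debounce option above (hypothetical API): every load()
 * call (re)schedules the batch function to run after a quiet period, so
 * a burst of loads collapses into one batch at the cost of added latency.
 */
class DebouncingLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final long delayMs;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // don't keep the JVM alive for the timer
                return t;
            });
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();
    private ScheduledFuture<?> pending; // currently scheduled dispatch, if any

    DebouncingLoader(Function<List<K>, List<V>> batchFn, long delayMs) {
        this.batchFn = batchFn;
        this.delayMs = delayMs;
    }

    synchronized CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> f = new CompletableFuture<>();
        futures.add(f);
        if (pending != null) {
            pending.cancel(false); // debounce: push the dispatch back
        }
        pending = scheduler.schedule(this::dispatch, delayMs, TimeUnit.MILLISECONDS);
        return f;
    }

    synchronized void dispatch() {
        List<K> batch = new ArrayList<>(keys);
        List<CompletableFuture<V>> batchFutures = new ArrayList<>(futures);
        keys.clear();
        futures.clear();
        pending = null;
        if (batch.isEmpty()) return;
        List<V> values = batchFn.apply(batch); // one call for the whole batch
        for (int i = 0; i < values.size(); i++) {
            batchFutures.get(i).complete(values.get(i));
        }
    }
}
```

Each `load` pushes the pending dispatch back by `delayMs`, so loads made in quick succession end up in a single batch; the trade-off is exactly the added response time the comment describes.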
Vert.x does not have to be the container for your entire app. You can also run it embedded without exposing it to clients. You only ship the transitive dependency on vertx-core in that case.
Yes, you can run Vert.x embedded, but it is still a container (embedded into your application, but a container nonetheless), and a DataLoader implementation based on the Vert.x event loop would be usable only from code run inside this embedded container, not from the rest of the application.
So embedded or not, it doesn't matter. You still can't write a pure-Java solution that can be used in any Java app as a library (rather than as a framework component).
Gave some advice on #30 that's relevant here
Hi @xak2000, @antmdvs,
I finally got some time to look more closely at dataloader, and I now see your issue. Vert.x also does not address this problem (though there may be ways to tweak the `EventBus`).
But your options above, while less elegant than auto-batching, still make it a useful utility, I think. Additional options could be a maximum batch size after which everything is dispatched automatically, regular intervals (in Vert.x by setting a timer), or a custom strategy the implementer plugs in.
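The maximum-batch-size option mentioned here can be sketched similarly, again with a hypothetical API in plain Java rather than Vert.x:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/**
 * Sketch (hypothetical API) of the max-batch-size strategy above: load
 * requests are queued, and the batch is dispatched automatically once it
 * reaches a maximum size; a partial batch waits for an explicit dispatch().
 */
class MaxSizeLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final int maxBatchSize;
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();

    MaxSizeLoader(Function<List<K>, List<V>> batchFn, int maxBatchSize) {
        this.batchFn = batchFn;
        this.maxBatchSize = maxBatchSize;
    }

    synchronized CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> f = new CompletableFuture<>();
        futures.add(f);
        if (keys.size() >= maxBatchSize) {
            dispatch(); // auto-dispatch when the batch is full
        }
        return f;
    }

    synchronized void dispatch() {
        if (keys.isEmpty()) return;
        List<V> values = batchFn.apply(new ArrayList<>(keys));
        for (int i = 0; i < values.size(); i++) {
            futures.get(i).complete(values.get(i));
        }
        keys.clear();
        futures.clear();
    }
}
```

This keeps latency bounded for large bursts while still relying on a final manual `dispatch()` to flush any partial batch.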
I am writing a remote service proxy for a GraphQL schema, and intend to write a `vertx-dataloader` that will batch the individual data-fetching requests involved in an incoming query (on the client) and send a single event bus message (JSON) to the backend GraphQL service implementation.
The data loader would be invoked after the query has been processed on the client proxy side, by explicitly calling a `dataloader.dispatch()`, which will hydrate the various data fetcher futures with values.
The implementation will first use `CompositeFuture` (equivalent to `Promise.all`), then later probably Vert.x's implementation of `CompletableFuture`.
In many use cases the dispatch call can be triggered automatically, e.g. in Vert.x by setting it on the `endHandler` of an HTTP request.
Another interesting use I am thinking of is to implement the data loader with an asynchronous, cluster-wide map implementation (an `AsyncMap` in Vert.x), so I get caching across load-balanced nodes.
BTW, what is funny to mention is that while the tick concept in the NodeJS data loader does not impact its asynchronous behaviour (correct?), in Vert.x it limits it slightly by being more of a delayed-execution thing.
I would caution against using the delay to pick up the requests in a batch. I've seen that approach used before and it can introduce real latency into your system.
For JS environments that don't support the Promise queue, this timer approach is used, but the delay used is "0ms" which in the case of JS schedules for the immediate next tick of the event loop.
Thanks, this is something I will mention in the docs, just like you did with HTTP request scope, etc.
In general one should take care not to make blocking calls while there is stuff in the queue, and not to put too much logic between batches :)
I will have code up in a couple of hours at https://github.com/engagingspaces/vertx-dataloader
BTW, many bunches of thanks for creating the test coverage. It is boring to translate to Java, but it helps a great deal. I will mention you in the acknowledgements section of the README, if you don't mind.
@leebyron I documented the change in the Java implementation very clearly (I hope) in vertx-dataloader/manual-dispatching
I think the change doesn't diminish the power of the utility in any way. On the contrary, it gives additional, fine-grained control over the batching and dispatching logic.
Hi @aschrijver! Very glad to see you wrote some code! I opened this task, but I'm very busy now and can't give it much effort.
One question from a quick glance at the code: does it support multi-step loading?
What I mean:

- queue loading of some `user`s by their `id`s.
- only after the promises of each `user` load are resolved can we queue loading of the `post`s of each loaded `user`.
The problem is that `dispatch` will be called before any user promise is resolved, so before any post loading has been queued. This is by design. If we don't call `dispatch`, the user promises will never be resolved. But! After the batch loading of users completes (and `dispatch` exits), the resolved user promises will queue the `post` loading tasks. But who will make the second `dispatch` call after that?
Ok, I think the example above is slightly incorrect. `Post`s should be loaded from another instance of `DataLoader`. But imagine these are not posts but friends of each user (so they are instances of the `user` model).
And even if we use a second `DataLoader` for posts, how can we know the order of `dispatch` calls we need to follow to correctly enqueue and dequeue all the load calls?
Maybe I'm misunderstanding something, but when I considered the possibility of implementing `DataLoader` in Java, I thought about all these problems and couldn't find any reasonable solution without some event-loop-like system, where the container (Node.js for example) dequeues and dispatches all queued jobs in the right order, and any number of times, for you.
Edit
Ok, I think we can load posts or friends in the first iteration using something like `loadFriendsByUserId(userId)`. But if we then need to load the friends of each friend (for example), the problem persists.
Something like this in GraphQL, if I wrote it correctly:
```graphql
users {
  friends {
    friends
  }
}
```
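The chained-loading problem described above can be demonstrated with a minimal manual-dispatch loader (a hypothetical sketch, not any actual library API). Dependent loads are only enqueued once the first batch completes, so a single `dispatch()` is not enough:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Minimal manual-dispatch loader (hypothetical sketch) used to illustrate
// the problem: loads queued by completion callbacks are only enqueued
// AFTER the first dispatch, so they need a second dispatch to resolve.
class SimpleLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final List<K> keys = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();

    SimpleLoader(Function<List<K>, List<V>> batchFn) {
        this.batchFn = batchFn;
    }

    synchronized CompletableFuture<V> load(K key) {
        keys.add(key);
        CompletableFuture<V> f = new CompletableFuture<>();
        futures.add(f);
        return f;
    }

    /** Runs the batch function for everything queued; returns false if the queue was empty. */
    synchronized boolean dispatch() {
        if (keys.isEmpty()) return false;
        List<K> batch = new ArrayList<>(keys);
        List<CompletableFuture<V>> batchFutures = new ArrayList<>(futures);
        keys.clear(); // clear first: completion callbacks may queue new loads
        futures.clear();
        List<V> values = batchFn.apply(batch);
        for (int i = 0; i < values.size(); i++) {
            batchFutures.get(i).complete(values.get(i));
        }
        return true;
    }
}
```

With this sketch, `load("alice").thenCompose(friends -> load(friends.get(0)))` is still pending after the first `dispatch()`, because the second-level load is only queued when the first level completes. The caller has to call `dispatch()` again for each newly queued level, e.g. by looping `while (loader.dispatch()) {}`, which is exactly the "who makes the second dispatch call" question raised above.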
Hmm, yours is an interesting case to investigate further.
In general, with manual dispatching being the exception, the implementation should be exactly identical to its counterpart in NodeJS. I still haven't ported all the tests, so you have to take my word for it here.
Just like manual dispatching makes it necessary to invoke `dispatch()` at an appropriate location, you are also responsible for creating load requests at the appropriate moments in the lifecycle of a batch operation.
In the future I will add features that flush the queue after a `dispatchTimeout`, or at regular timer intervals, depending on the additional options you provide.
When you call `dispatch()`, the individual `Future`s will already start completing/resolving. The fetch immediately returns a `CompositeFuture` of the aggregated result (similar to `Promise.all`). The composite is not complete yet, however. You will have to set a handler on it and check for success or failure before touching the list of constituent results (in Vert.x they are still `Future`s, with the value `<V>` in `future.result()` if `future.succeeded()`).
On your last point: I documented the benefits you still have even with manual fetching (and some additional ones) in the differences-to-reference-implementation section as well.
If you have the time later it would be nice if you created a PoC project or a Gist to test-drive your use case!
Forgot to mention that the preparation stage of the data loader lifecycle (i.e. adding load requests to the batch queue, from loader instantiation to just before dispatch) is usually very fast as long as no code blocks the event loop!
In Vert.x every invocation is asynchronous by default, and if it is not, it should be assigned to a worker verticle that runs on a thread pool. Regular verticles always run on the same thread, so you can regard them as single-threaded and reap the rewards of that simplicity. But you probably know this, as it's the main selling point of Vert.x (besides being clustered, polyglot, modular, and... much more goodness 😄).
I have created engagingspaces/vertx-dataloader#3 on the Vert.x DataLoader project for this use case.
I added a nice diagram to README.md that shows the concepts and the direction the Vert.x DataLoader is heading in (not quite there yet).
Closing this issue since there's now a project and discussion to take this further.
For anyone interested: a pure Java 8 implementation exists now.
It still requires manual dispatching, for the obvious reasons discussed in this topic. But graphql-java supports it natively through `DataLoaderDispatcherInstrumentation`.