
java-dataloader's Introduction

GraphQL Java

Discuss and ask questions in our Discussions: https://github.com/graphql-java/graphql-java/discussions

This is a GraphQL Java implementation.

Latest build in Maven central: https://repo1.maven.org/maven2/com/graphql-java/graphql-java/


Documentation

The GraphQL Java book, from the maintainers: GraphQL with Java and Spring

See our tutorial for beginners: Getting started with GraphQL Java and Spring Boot

For further details, please see the documentation: https://www.graphql-java.com/documentation/getting-started

If you're looking to learn more, we (the maintainers) have written a book! GraphQL with Java and Spring includes everything you need to know to build a production ready GraphQL service. The book is available on Leanpub and Amazon.

Please take a look at our list of releases if you want to learn more about new releases and the changelog.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By contributing to this project (commenting or opening PR/Issues etc) you are agreeing to follow this conduct, so please take the time to read it.

License

Copyright (c) 2015, Andreas Marek and Contributors

Supported by

YourKit

YourKit supports this project by providing the YourKit Java Profiler.

java-dataloader's People

Contributors

alexandrecarlton, aschrijver, bbakerman, berngp, bmsantos, dfa1, dondonz, dugenkui03, eladchen, kelseyfrancis, kilink, lana11s, lkorth, prokop7, qzchenwl, robinbraemer, zoidyzoidzoid



java-dataloader's Issues

DataLoader 3.x will not be upgraded in graphql-java 17.0

The latest version of java DataLoader is 3.1

The 3.x version adds a new and long desired capability - a proper ValueCache

The previous CacheMap (which came from the original Vertx port) was not a value cache at all - it was a cache of promises to values.

As such it can't be serialised into a network-shared cache such as Redis or Memcached.

The 3.x code base set out to fix that, and did.

However, we missed an important problem. Introducing an async get to the ValueCache broke the previous guarantee
that once dataLoader.load(...) was called, the request had been placed into the batch queue and hence could be optimally batched.

This led to #90 - aka the dreaded "when is it the right time to batch" problem.

There is a PR up to fix this, and it will, BUT it may also introduce other problems, such as premature batching.

As such we have decided to NOT upgrade graphql-java 17.0 to DataLoader 3.x, and we need to reconsider how we improve this situation.

The 3.x version right now is not optimal - it will work (especially if the new ScheduledDataLoaderRegistry is used), but on reflection this approach needs revision.

Stay tuned...

How to let BatchLoader know the columns of current DataFetchingEnvironment?

I've created a demo project for graphql-java in Kotlin: https://github.com/babyfish-ct/grahpql-java-example. But I've hit a problem and I don't know how to solve it better.

For the columns that are neither primary keys nor foreign keys of any table, this demo only selects the columns that are required by the GraphQL query. That means the BatchLoader needs to know the list of selected column names of the current DataFetchingEnvironment.

Unlike the global context, the DataFetchingEnvironment objects for different association data loads are not the same, so it's impossible to use BatchLoaderWithContext.

So I created the class

data class LoaderKey<K>(
    val value: K,
    val propNames: Collection<String>
)

in https://github.com/babyfish-ct/grahpql-java-example/blob/master/src/main/kotlin/org/frchen/graphql/example/loader/Common.kt. The first property 'value' is the data key of the DataLoader; the second property 'propNames' is the collection "env.selectionSet.get().keys" (env is the DataFetchingEnvironment object), which specifies the list of column names to select for the current association data load. Please see the functions "loadOptionalReferenceAsync", "loadRequiredReferenceAsync" and "loadListAsync" in the same file.

This solution works but it is clumsy: different LoaderKey objects often share the same propNames, which is wasteful.

How can I do it better?

DataLoader hangs on several load() with the same key

If caching is disabled (i.e. DataLoaderOptions.newOptions().setCachingEnabled(false)) and you issue several load() calls with the same key, the loader will eventually hang once dispatch() is called.

This is most likely due to the field loaderQueue (defined in the DataLoader class), which is a Map. On duplicate keys, the previous value (a CompletableFuture) is lost, and that future will never complete.

If caching is enabled, the same problem can occur due to a race condition. If two threads load() the same key at the same time and the key is not yet in the futureCache, both threads will add it to the loaderQueue map, and one of the futures will be lost.
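The mechanism described above can be demonstrated with the JDK alone. The sketch below uses hypothetical names (it is not the library's actual internals) to show how a Map-backed pending queue silently drops one of two futures registered under the same key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Sketch of the failure mode: if the pending queue is a Map, a second load()
// with the same key replaces the first CompletableFuture, and the replaced
// future is never completed by dispatch() - so its caller hangs.
public class DuplicateKeyDemo {
    public static void main(String[] args) {
        Map<String, CompletableFuture<String>> loaderQueue = new HashMap<>();

        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();

        loaderQueue.put("keyX", first);
        loaderQueue.put("keyX", second); // silently evicts 'first'

        // dispatch() only sees the map's current values
        loaderQueue.values().forEach(f -> f.complete("value"));

        System.out.println("second done: " + second.isDone()); // true
        System.out.println("first done:  " + first.isDone());  // false -> caller hangs
    }
}
```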

DataLoader support for get-if-present

I'm wondering if you have any thoughts on adding a method to DataLoader to support get-if-present semantics (could be called getIfPresent, getIfCached, getCachedValue, etc.)

I have a use-case where there's an API that supports fetching either a partial or complete object. In my service, I have separate DataLoaders for these types of fetches. As you might expect, when I need to fetch the partial object I use the partial DataLoader, and when I need to fetch the full object I use the other DataLoader. However, if I've already fetched the full object then theoretically I can use that cached result to satisfy any subsequent partial fetches that come in. To support this, I want the partial DataLoader to first do a get-if-present on the complete DataLoader (without triggering a load)

Let me know if this use-case makes sense and if you would consider a PR that implements it

ValueCache: batch function isn't always triggered on cache miss

Describe the bug
I tried to add a Redis cache by leveraging the new ValueCache feature. On cache misses, my get method would always return an exceptionally complete future. I noticed that everything is working as expected when batch loading is disabled. However, if I enabled batching, the batch function wouldn't always be triggered to fetch the data from the backend sources.

I see that in DataLoaderHelper, we would add the futures to a loader queue on cache misses when batching is enabled. Is it possible that the dispatch has already been called before the future is added to the queue? We don't manually make a dispatch call on the data loader. We just define a DataLoaderRegistry and pass it to the graphql-java engine in our codes. The batch mode is working fine if we only set the cache map and max batch size in the data loader options, but not if we also set a new value cache.

To Reproduce
Version: 3.0.1

I could reproduce the issue if I added a 3-second delay before calling future.completeExceptionally(exception), e.g.:

public CompletableFuture<V> get(K key) {
   ...
   CompletableFuture<V> future = new CompletableFuture<>();
   redisGetFuture.onComplete((value, exception) -> {
        delay();
        if (exception == null) {
            if (value == null) {
                future.completeExceptionally(new RuntimeException("null value"));
            } else {
                future.complete(value);
            }
        } else {
            future.completeExceptionally(exception);
        }
   });
   return future;
}

private void delay() {
    try {
        TimeUnit.SECONDS.sleep(3);
    } catch (Exception e) {
        // ignore and continue
    }
}

keyContext merge function under batching

I'm very excited by your work and wonder if I could get your approval on adding support for keyContext merging when batching the same key with different contexts. I'm using the keyContexts of BatchLoaderEnvironment to ship the selection fields for a key. Some queries may have the same key with different selection fields, and the batching mechanism in DataLoaderHelper's mkKeyContextMap method overrides the context with the latest context encountered for that key. I'd be glad if mkKeyContextMap could be handed a function to handle a merge in such cases.
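The requested merge can be sketched with the JDK's Collectors.toMap, which already accepts exactly this kind of merge function. The class and the field-set contexts below are hypothetical; a real fix would thread such a function into mkKeyContextMap:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the merge the issue asks for: when the same key appears with
// different selection-field contexts, union the field sets instead of
// keeping only the last one encountered.
public class KeyContextMergeDemo {
    public static Map<String, Set<String>> mergeContexts(
            List<Map.Entry<String, Set<String>>> keyContexts) {
        return keyContexts.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (a, b) -> { // merge function: union the selected fields
                    Set<String> merged = new HashSet<>(a);
                    merged.addAll(b);
                    return merged;
                }));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Set<String>>> ctxs = List.of(
                Map.entry("user:1", Set.of("name")),
                Map.entry("user:1", Set.of("email")));
        System.out.println(mergeContexts(ctxs));
    }
}
```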

Provide an option to turn off caching of exceptions

I am using GraphQL Java 8.0, which uses DataLoader 2.0.2. I have experienced many times that the dataloader caches the exceptions returned from the batch load function. After reading the code I came across this piece in DataLoader.java:

return batchLoad
                .toCompletableFuture()
                .thenApply(values -> {
                    assertState(keys.size() == values.size(), "The size of the promised values MUST be the same size as the key list");

                    for (int idx = 0; idx < queuedFutures.size(); idx++) {
                        Object value = values.get(idx);
                        CompletableFuture<V> future = queuedFutures.get(idx);
                        if (value instanceof Throwable) {
                            stats.incrementLoadErrorCount();
                            future.completeExceptionally((Throwable) value);
                            // we don't clear the cached view of this entry to avoid
                            // frequently loading the same error
                        } else if (value instanceof Try) {

It says that the dataloader caches the error response from the batch loader. I want to know why this design was chosen.
I have a long-duration cache for some fields, and if their datasource fails, it will return error responses to my GraphQL server. Even if the datasource recovers shortly afterwards, I would still return the stale responses (errors) to my clients because of this behaviour in the dataloader.
Also, if frequently loading the same error is a concern, users can add circuit breakers or similar to stop calling the backend.
I am not sure if this is fixed in newer versions of the dataloader.
If there is a way to not store the exceptions, please let me know.

I think at the time it was decided that if a value was poisoned then we shouldn't keep asking for it. This works well for short-lived per-request caches.
However, as you say, if you have long-lived caches then this works against you.
Can you please raise an issue on the data loader project for this.
In the meantime you can work around it by:

  1. clearing the cache for exceptional values
  2. implementing a custom cache function that simply ignores exceptions
     See org.dataloader.DataLoaderOptions#setCacheMap
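Workaround 2 can be sketched with the JDK alone. The class below is hypothetical; a real implementation would put the same eviction logic behind the org.dataloader.CacheMap interface and register it via DataLoaderOptions#setCacheMap:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// JDK-only sketch of workaround 2: a promise cache that evicts entries whose
// future completes exceptionally, so the next load() retries the backend
// instead of serving a cached error.
public class EvictOnErrorCache<K, V> {
    private final Map<K, CompletableFuture<V>> delegate = new ConcurrentHashMap<>();

    public void set(K key, CompletableFuture<V> future) {
        delegate.put(key, future);
        // once the future fails, drop it so errors are not served from cache
        future.whenComplete((value, error) -> {
            if (error != null) {
                delegate.remove(key, future);
            }
        });
    }

    public CompletableFuture<V> get(K key) {
        return delegate.get(key);
    }

    public boolean containsKey(K key) {
        return delegate.containsKey(key);
    }
}
```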

How to control the dataloader scope

In your readme.md, it says:

If you are serving web requests then the data can be specific to the user requesting it. If you have user specific data then you will not want to cache data meant for user A to then later give it user B in a subsequent request.

The scope of your DataLoader instances is important. You might want to create them per web request to ensure data is only cached within that web request and no more.

If your data can be shared across web requests then you might want to scope your data loaders so they survive longer than the web request say.

And DataLoader.java does not support extension; see #37.

So, given that a request stays on the same thread or carries the same tracer, can you provide some guidance on scoping a dataloader to a request? A best practice, for example.

Per Request loaders with graphql

Hi

I'm struggling to follow the documentation as to how I can implement per request data loaders.
The docs at http://graphql-java.readthedocs.io/en/v7/batching.html suggest creating a single schema instance up front, as this is a somewhat expensive operation, and then creating dataloaders per request.

Given that runtime wiring must be provided at time of schema creation, and in the runtime wiring data fetchers reference the data loaders (as per the docs), I'm not sure I can then create these data loaders later at the time of an individual request.

The only solution I could think of is to include the DataLoaderRegistry in my context object, which is then available at time of DataFetcher execution, in order to get a reference for the correct data loader instance.

I'm not sure if I've explained this well, but I'm surprised I haven't been able to find an example as this seems to be something that would be a common issue.

Regards
Chris

Support for partitioning keys

Sorry if this is already supported or has already been discussed, I wasn't able to find anything in the docs or in another GitHub issue. We have some APIs that support batching, but only if all of the inputs are uniform in some way. For example, you could have an image resizing API that supports batch resizing, but only if the desired dimensions are the same. We want to use java-dataloader for the batching and caching support, but handling these sorts of constraints ends up being a bit of a hassle. We get a batch of keys that might have mixed dimensions, so inside our load function we need to partition the keys, make multiple HTTP requests, and handle combining the futures and merging the data. Here is a simplified example of what this might involve:

public class ImageResizeDataLoader implements MappedBatchLoader<ImageResizeRequest, ImageResizeResponse> {

  @Override
  public CompletionStage<Map<ImageResizeRequest, ImageResizeResponse>> load(Set<ImageResizeRequest> keys) {
    Multimap<ResizeDimensions, ImageResizeRequest> dimensionsToKeys = keys
        .stream()
        .collect(Multimaps.toMultimap(key -> key.resizeDimensions, Function.identity(), HashMultimap::create));

    Map<ImageResizeRequest, ImageResizeResponse> results = new ConcurrentHashMap<>();
    List<CompletableFuture<?>> futures = new ArrayList<>();

    dimensionsToKeys.asMap().forEach((dimensions, batch) -> {
      futures.add(doLoad(dimensions, batch).thenAccept(results::putAll));
    });

    return CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).thenApply(ignored -> results);
  }

  private CompletableFuture<Map<ImageResizeRequest, ImageResizeResponse>> doLoad(
      ResizeDimensions dimensions,
      Collection<ImageResizeRequest> batch
  ) {
    HttpRequest request = HttpRequest
        .newBuilder()
        .setMethod(StandardHttpMethod.POST)
        .setUri("https://myapi.com/resize")
        .addQueryParam("width", dimensions.width)
        .addQueryParam("height", dimensions.height)
        .setBody(batch.stream().map(i -> i.imageUrl).collect(Collectors.toSet()))
        .build();

    // make request, handle response
    return null;
  }

  public static class ResizeDimensions {
    public int width;
    public int height;

    // equals/hashCode
  }

  public static class ImageResizeRequest {
    public String imageUrl;
    public ResizeDimensions resizeDimensions;

    // equals/hashCode
  }

  public static class ImageResizeResponse {
    public String resizedUrl;
  }
}

Is there a simpler way to handle this that I'm missing?

best practice to load many relations ids from database

I try to implement a solution that uses graphql-java and dataloader over a database.

I would like to know the best pratice to load a many relations when the nested ids required a database query.

In the documentation example, imagine that we replace starWarsCharacter.getFriendsIds() with a database query getFriendIdsFromDB(starWarsCharacter).
The friendsDataFetcher would then look like:

DataFetcher friendsDataFetcher = new DataFetcher() {
    @Override
    public Object get(DataFetchingEnvironment environment) {
        StarWarsCharacter starWarsCharacter = environment.getSource();
        List<String> friendIds = getFriendIdsFromDB(starWarsCharacter);
        return characterDataLoader.loadMany(friendIds);
    }
};

In this case we again get an N+1 fetch problem: for each Character we have to query for the friend ids.

How to efficiently manage that case with the dataloader?
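One common approach is to batch the friend-id lookup itself with a second (mapped) batch loader, so one query serves the whole level. A JDK-only sketch, with an in-memory map standing in for the database and all names hypothetical:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of one way out of the N+1: instead of querying friend ids per
// character, batch the id lookup itself so one query serves the whole level.
// 'friendTable' stands in for the database; the real query would be e.g.
// SELECT character_id, friend_id FROM friends WHERE character_id IN (...).
public class FriendIdBatchDemo {
    static final Map<String, List<String>> friendTable = Map.of(
            "luke", List.of("han", "leia"),
            "han", List.of("luke", "chewie"));

    // batch version of getFriendIdsFromDB: one call for many characters
    public static Map<String, List<String>> getFriendIdsBatch(Collection<String> ids) {
        return ids.stream().collect(Collectors.toMap(
                id -> id,
                id -> friendTable.getOrDefault(id, List.of())));
    }

    public static void main(String[] args) {
        // A MappedBatchLoader wrapping this function could be registered as a
        // second data loader, and the data fetcher would then chain:
        // friendIdLoader.load(id).thenCompose(characterDataLoader::loadMany)
        System.out.println(getFriendIdsBatch(List.of("luke", "han")));
    }
}
```

Note that chaining two loaders with thenCompose can itself interact badly with dispatch timing, as discussed in the "composing dataloaders" issue below, so the chaining shown in the comment is only an outline.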

Is the data loader's per-key context allowed to be null?

Based on looking at the code and testing it, it appears that this data fetcher will pass through the key context to the data loader, even if it's null:

    TypeRuntimeWiring.newTypeWiring("Query")
        .dataFetcher(
            "myField",
            environment -> {
              String myId = environment.getArgument("myId");
              
              // it's OK if this is null, because the keyContext is allowed to be null?
              Integer myKeyContext = getNullableKeyContext(...);

              return environment.getDataLoader("myLoader").load(myId, myKeyContext);
            })
        .build();

Can you confirm that, though? Nothing I found in the Javadocs or https://www.graphql-java.com/documentation/v16/batching#passing-context-to-your-data-loader says whether the per-key context is allowed to be null.

Make the DataLoaderRegistry use a supplier pattern

See graphql-java/graphql-java#1902

benmccann commented 4 days ago

The DataLoaderRegistry should be request-scoped. That means that users are creating every DataLoader for every request even if none or only one or two are used.

Right now we would do something like:

dataLoaderRegistry.register("user", DataLoader.newDataLoader(userBatchLoader));

We could probably fix this by making DataLoaderRegistry a singleton and instead registering a function that would provide a DataLoader:

dataLoaderRegistry.register("user", () -> DataLoader.newDataLoader(userBatchLoaderProvider.get()));

Allow default value in DataLoaderOptions when a value is missing in a mapped batch loader

Taken from graphql-java/graphql-java#1403

MappedBatchLoader is a great addition to the data loading architecture. Sadly, we can't use it in many cases. Frequently we data load lists like so:

type Parent {
  id: String!
  children: [Child!]!  # lazy-loaded via data loader, can not be null
}

When data loading children, the return type of the mapped batch loading function is Map<String, List>. When a key is not present in the map, DataLoaderHelper::invokeMapBatchLoader substitutes null. However, in the case outlined above it should instead substitute an empty list.

I propose adding an additional data loader option that would let us set this default value to an arbitrary value. Another solution that might suffice would be the helper first testing whether the loadResult is a Collection and if so, substituting an empty collection by default.
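The proposed substitution is small; here is a JDK-only sketch (assembleResults and the defaultValue parameter are hypothetical, approximating what the new DataLoaderOptions knob would do when the mapped loader's result map lacks a key):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the proposed behavior: when a mapped batch loader's result map
// lacks a key, substitute a configurable default (here an empty list)
// rather than null.
public class MappedDefaultDemo {
    public static <K, V> List<V> assembleResults(
            List<K> keys, Map<K, V> loaded, V defaultValue) {
        return keys.stream()
                .map(k -> loaded.getOrDefault(k, defaultValue))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> loaded = Map.of("p1", List.of("c1", "c2"));
        // p2 has no children: the caller gets [] instead of null
        System.out.println(assembleResults(
                List.of("p1", "p2"), loaded, List.<String>of()));
    }
}
```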

Null pointer in dispatchQueueBatch v2.1.1

We have gotten a couple exceptions running in production around the above line of code. The trace is:

java.util.concurrent.CompletionException: java.lang.NullPointerException
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:645)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.dataloader.DataLoaderHelper.lambda$dispatchQueueBatch$2(DataLoaderHelper.java:200)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException: null

We are going to take a look at this on priority at some point coming up here, but wanted to drop it here in case the solution is obvious in some way. Any insight is certainly welcome as well.

Outdated gradle version in the README

Hi,

It seems the docs reference an older version than the one published (I only checked Maven).

Am I missing something? If not let me know and I'll create a PR.

Question: Field selection and error handling in a batch loader?

I have two unrelated questions:

1. Error handling

Neither org.dataloader.BatchLoader::load nor org.dataloader.BatchLoaderWithContext::load allow one to throw an exception while attempting to fetch some data from some source, because the interface does not declare that an exception is thrown:

    CompletionStage<List<V>> load(List<K> keys);

What is the correct way to handle an error, when, for example, my repository blows up while trying to load a list of entities by the provided keys? Similarly, what if some external HTTP request to fetch a list of entities fails in an unrecoverable way? It seems wrong in these cases to return an empty list; I would prefer to throw an exception and cause the entire GraphQL call stack to unwind instead of returning partial data.
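On question 1: since the load() contract is asynchronous, an unrecoverable failure is usually signalled by completing the returned stage exceptionally rather than by throwing. Every future handed out for that batch then completes exceptionally and the error propagates to callers. A minimal JDK-only sketch (BatchFailureDemo and the error message are hypothetical):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;

// Sketch: an unrecoverable batch failure is reported by returning a stage
// that is already completed exceptionally, not by throwing from load().
public class BatchFailureDemo {
    public static CompletionStage<List<String>> load(List<String> keys) {
        // simulate an unrecoverable repository/HTTP failure for the batch
        return CompletableFuture.failedFuture(
                new RuntimeException("backend unavailable"));
    }

    public static void main(String[] args) {
        try {
            load(List.of("k1")).toCompletableFuture().join();
        } catch (CompletionException e) {
            System.out.println("propagated: " + e.getCause().getMessage());
        }
    }
}
```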

2. Field selection

The BatchLoaderEnvironment passed to a BatchLoaderWithContext::load call does not allow me to inspect the request for the selected fields that caused loading of some entities to occur. Is there a way to do this? I prefer to select only the columns absolutely necessary from the database/API/etc. to optimize performance.

I can imagine that it may be difficult to traverse all nodes on the query to aggregate the minimum acceptable selection of fields from a given type, but it doesn't seem impossible. What I'm hoping is possible is something like this:

Given a query like this:

query {
  foo {
    bar {
      fieldA
      fieldB
      fieldC
    }
  }
  bar {
    fieldA
    fieldD
  }
}

...I would then be able to select only fields A, B, C, and D, instead of doing a SELECT * (assuming a relational database is the backing store).

Thanks for your help!

Polyglot database support for relationships?

Admittedly, this isn't so much an issue as it is a request for guidance. It might turn into a feature request.

I'm working on a graphql schema that will source information from a Mongo database and a neo4j database. The documents I'm storing in Mongo are rather large (up to 2MB in some cases), and those documents are highly related. It was easiest to provide traversal between objects via a graph database, and we're not terribly concerned about most of the schema of the documents in Mongo, so the tech makes sense for our use case.

The dataloader library is a big help in optimizing our db queries to Mongo to load documents, but I'm currently at a loss as to how to use neo4j to query for which documents I need to load. The general order of operation looks something like this, for our data:

  1. Load first-level documents from Mongo.
  2. For a second-level portion of a query, collect the entire set of IDs from the results of the first level query, make a single query to neo4j for a list of all related IDs for the relationship type, and map those results to the source IDs (resulting roughly in a mapping of source ID to a list of related IDs).
  3. Load second-level documents from Mongo, with the IDs retrieved from the previous step.

The dataloader library makes the process easy if you can derive the IDs of related documents from a source object, but in my case I can't determine that without a neo4j query. The best I can do at the moment is run a neo4j query per source object in a datafetcher, which still leaves me with roughly an n+1 problem.

Any guidance on how to optimize for bulk loading in this scenario?

Add getIfCached(key) in DataLoader

It'd be great if the DataLoader had a getIfCached method returning the future (or the actual object) if it exists, without triggering a load. Alternatively, the DataLoader could have its member variables protected so that the class can be extended.

We have a quite complicated setup with lots of relationships between entities as well as partial loading of some entities. Occasionally we want to check if an item is in the DL cache and use it if it is (and has the data we need), otherwise go down a different path.

We also prime the cache of one data loader when loading a list of the same entities in a batch loader belonging to another data loader.

We try to avoid chaining dataloader calls in a thenCompose, because manual invocation of dispatch() causes double loads of the same entity from the same data loader, so we instead want to implement some slightly more intelligent batch loaders.

CacheMap<K, V> is the wrong signature

The custom cache map interface has the signature CacheMap<K, V>.

This might imply that it can be a cache of keys to values.

And hence that you can back it with a cache like Redis (which requires values to be serializable in some way).

This is not true. It's actually a cache of promises to values, aka CompletableFuture<V>.

Naively, people have implemented a Cache<String, Pojo> and then watched it blow up with class cast exceptions.

This should be fixed.

It also has a contract that is not clear:

if the dataloader "sets" a CF<V> into the cache, it MUST get back exactly the same CF<V> object on a future cached load("keyX")

The reason for this is that the batch loader causes each promise CF<V> to be "completed" given the "keyX". So if 2 calls hold different CF<V> objects, only 1 will be completed and hence the other will hang forever.

We need to make this clearer in the code and JavaDoc.

Here is an example outlining the problem if the cache does not give out the same CF<V>:

    CompletableFuture<V> codePathA() {
        return dataLoader.load("keyX");
    }

and then later but before dispatch

    CompletableFuture<V> codePathB() {
        return dataLoader.load("keyX");
    }

then later

   dataLoader.dispatch()

If the custom cache did NOT give back the exact same CompletableFuture<V> object, then code path B would never complete, because that CompletableFuture<V> would never be known to the batch loading code and hence never completed. And hence it would hang.
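Conversely, a cache that honors the contract can be sketched in a few lines of JDK code: computeIfAbsent guarantees that every caller sees the same promise, so completing it once completes everyone (class and names hypothetical):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a cache that honors the contract: computeIfAbsent guarantees
// that every load("keyX") before dispatch observes the *same*
// CompletableFuture, so completing it once completes all callers.
public class SameInstanceCacheDemo {
    static final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();

    static CompletableFuture<String> load(String key) {
        return cache.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    public static void main(String[] args) {
        CompletableFuture<String> pathA = load("keyX");
        CompletableFuture<String> pathB = load("keyX");
        System.out.println("same instance: " + (pathA == pathB)); // true

        // "dispatch" completes the single queued promise -> both callers complete
        cache.get("keyX").complete("value");
        System.out.println("path B done: " + pathB.isDone()); // true
    }
}
```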

Exception when using @RequestScope and async CompletableFuture

Hello,

when I use an asynchronous batch data loader together with a request-scoped DataLoader, the thing crashes when the batch data loader calls a backend service that needs a request-scoped JPA EntityManager.

This is because the asynchronous batch data loader starts a new thread, and the dependency injection framework (Ratpack/Guice in this case) does not find the request scope threadlocal any more.

How is this supposed to work?

Best...
Matthias

P.S. As a work-around, I removed the asynchronous parts and used CompletableFuture.completedFuture() in the batch loader, too.

composing dataloaders

Hi,

I have a datafetcher where I use 2 dataloaders in sequence: the first to translate from 1 ID to another, the second to fetch data corresponding to the second ID.

loader1.load(id1).thenCompose(id2 -> loader2.load(id2))

This hangs because dispatchAll() is not called again after loader1 completes.
I can work around that by adding that call inside the thenCompose() lambda but then it is called for every id2 which is ugly at the very least.

Is there a better way of doing this?

How do I store the cache to a remote server

I want to store the cache on a remote server, so I implemented CacheMap.
Then I got:
Serialized class java.util.concurrent.CompletableFuture must implement java.io.Serializable
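This error is inherent to the CacheMap design discussed elsewhere in this tracker: CacheMap stores CompletableFuture<V>, and CompletableFuture does not implement Serializable, so the promise can never be shipped to a remote store. Only the materialized value, once the future has completed, could be serialized. A small JDK-only demonstration:

```java
import java.io.Serializable;
import java.util.concurrent.CompletableFuture;

// CompletableFuture is not Serializable, so a promise cache cannot be sent
// to a remote server; only the completed value V could be stored there.
public class SerializableDemo {
    public static void main(String[] args) {
        System.out.println("future serializable: "
                + Serializable.class.isAssignableFrom(CompletableFuture.class)); // false

        CompletableFuture<String> future = CompletableFuture.completedFuture("pojo");
        // a remote store could only persist this completed value, not the future
        String value = future.join();
        System.out.println("storable value: " + value);
    }
}
```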

Make the DataLoader constructor protected to make it more customisable

I would like my DataLoader to load the key context in a type-safe manner, and in order to reuse the current infrastructure I chose to extend DataLoader:

public abstract class MyDataLoader<K, V, C> extends DataLoader<K, V> {

    public MyDataLoader() {
        super(newMappedBatchLoaderWithContext(), null);
    }

    public CompletableFuture<V> loadWithKeyContext(K key, C keyContext) {
        return super.load(key, keyContext);
    }

    public abstract MappedBatchLoaderWithContext<K, V> newMappedBatchLoaderWithContext();
}

But the constructor DataLoader(Object batchLoadFunction, DataLoaderOptions options) in DataLoader is now private. Would it be a good idea to make it protected, to give developers more options when they want to customise their DataLoader?

Provide a way to have batch loaders that return Map

Hello,

I have been using java-dataloader for some time now and I found it very useful. Here is one small feedback:

I find the current BatchLoader functional interface is often inconvenient:

CompletionStage<List<V>> load(List<K> keys)

The input is a list of keys and the output is a list that has to match 1 to 1 with the list of keys. In real world data fetching scenarios, it's rare to do a query that returns 1 to 1 matches. For example, let's assume that I want to load Users from a database, I could probably use a query that looks like this:

SELECT * FROM User WHERE id IN (keys)

These kinds of queries are very common when using batch loaders; however, they won't return a 1 to 1 match. If one of the users does not exist, the result will not contain it. This means that in order to use this query in a BatchLoader, I first have to create a map, and then return a List built by crossing that map with the original keys. For example:

List<User> users = database.query("SELECT * FROM User WHERE id IN keys");
Map<Integer, User> userByKey = users.stream().collect(toMap(User::getId, Function.identity()));
List<User> result = keys.stream().map(userByKey::get).collect(toList());
return result;

I think all this boilerplate code could be eliminated if an alternative BatchLoader were provided that returns a Map instead:

CompletionStage<Map<K, V>> load(List<K> keys)

What do you think?

the `synchronized` in DataLoader

I guess the reason for using synchronized is to ensure the consistency of operations on loaderQueue, StatisticsCollector and CacheMap.

Removing the synchronized blocks and making the concrete implementations of loaderQueue, StatisticsCollector and CacheMap responsible for their own consistency would be more flexible and could afford higher throughput.

Feature Request - Pass DataFetchingEnvironment to the BatchLoader.

The dataloader library seems to give me most of the functionality I'm looking for when trying to batch my DB queries for a layer of a graphql query. I am working with very large documents in the database I'm querying, and I'd rather optimize my queries to the database so that I only request the information from the database necessary to fulfill the graphql query.

I'm able to determine the required set of data based on the DataFetchingEnvironment that's passed to the DataFetcher, but I see no way to pass that information downstream to the BatchLoader. Having that object, or any other custom object containing the contextual data I need for that layer of the query, would enable the optimization I'm looking for.

Dataloader return as async Fetcher

My DataFetcher has some business logic, so it needs to be async.
How can I use a dataloader to return some base data?
E.g., BuildWiring is:

RuntimeWiring.newRuntimeWiring()
                .type(newTypeWiring("Query")
                        .dataFetcher("bookById", AsyncDataFetcher.async(graphQLDataFetchers.getBookByIdDataFetcher(), pool)))
                .type(newTypeWiring("Book")
                        .dataFetcher("author", async(graphQLDataFetchers.getAuthorDataFetcher(), pool)))
                .build();

Fetcher.java is:

...
 DataLoader<Object, Object> characters = dataFetchingEnvironment.getDataLoader("books");
CompletableFuture<Object> book = characters.load(bookId);
return book;

The root cause is that the result is CompletableFuture<CompletableFuture<Object>>, but graphql-java only unwraps one level of future.
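The nesting happens because the fetcher already returns the loader's CompletableFuture and the async wrapper wraps it again; either return the loader's future directly (without the async wrapper on that fetcher), or flatten the nested future with thenCompose. A minimal, self-contained illustration of the flattening (plain JDK, no library types):

```java
import java.util.concurrent.CompletableFuture;

public class FlattenExample {
    public static void main(String[] args) {
        // supplyAsync(() -> someFuture) yields a nested future, which is
        // the shape described above: CompletableFuture<CompletableFuture<T>>.
        CompletableFuture<CompletableFuture<String>> nested =
                CompletableFuture.supplyAsync(() ->
                        CompletableFuture.completedFuture("book-1"));

        // thenCompose flattens it back to a single CompletableFuture<T>.
        CompletableFuture<String> flat = nested.thenCompose(inner -> inner);
        System.out.println(flat.join()); // book-1
    }
}
```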

Add DataLoader constructor for MappedBatchLoader

Right now the only way to create a new DataLoader backed by MappedBatchLoader is through the static methods.

This prevents extending DataLoader when I want to use a Map and not a List.

Is it possible to add two more constructors (one with options and one without) for MappedBatchLoader?

Class Cast Exception when attempting to utilize Try<T>

Hi there,

We've run into an issue whereby we receive a ClassCastException when attempting to use any of the fluent methods of CompletableFuture.

For example:

// Given a data loader like so:
MyDataLoader<String, Try<MyType>> myDataLoader = ...;

// The following code will throw a ClassCastException because we unwrap the Try<MyType> here
// https://github.com/graphql-java/java-dataloader/blob/master/src/main/java/org/dataloader/DataLoader.java#L323-L332
return myDataLoader.load(myId)
    .thenApply((Try<MyType> myPossibleValue) -> doSomethingWithValue(myPossibleValue));

The invocation of the thenApply method fails because there is a ClassCastException: unable to cast MyType to Try<MyType>.

Change DataLoaderRegistry so that it's named

Instead of a list of DLs it would be named DLs. This allows call sites to use the registry to gain access to a named DL more easily.

You would pass in the DataLoaderRegistry as the place to access DL instances, say.
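A minimal sketch of the proposed shape, as a plain map-backed registry keyed by name (NamedRegistry and its methods are a hypothetical stand-in, not the library's API; the loader values here are just strings for illustration):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// A registry keyed by name, so call sites can ask for a DL by its name
// instead of holding onto a positional list of loaders.
public class NamedRegistry<L> {
    private final Map<String, L> loaders = new ConcurrentHashMap<>();

    public NamedRegistry<L> register(String name, L loader) {
        loaders.put(name, loader);
        return this; // fluent, so registrations chain
    }

    public L get(String name) { return loaders.get(name); }

    public Set<String> names() { return loaders.keySet(); }

    public static void main(String[] args) {
        NamedRegistry<String> registry = new NamedRegistry<>();
        registry.register("users", "userLoader")
                .register("books", "bookLoader");
        System.out.println(registry.get("users")); // userLoader
    }
}
```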

Vertx and Dataloader

When graphql is executed inside Vert.x, we have the option to batch load with a callback function that runs inside the Vert.x event loop, and hence mimic how the original dataloader implementation works.

The reason I would like to implement such functionality is not the performance of the data loader as such, but rather having the option to wrap the data loader inside a CompletableFuture.

This is very useful in cases where you would like to rewrite all the data fetchers (using transformation) and implement a rate limiter that first runs an async function to check if you are rate limited and then delegates to the original data fetcher, which might execute a dataloader.load.

The issue I have faced in developing such an implementation is the DataLoader class. Since it's not an interface, it is quite difficult to offer an alternative implementation.

What do you think should be the way forward to make DataLoader an interface with the fewest breaking changes?

Java DataLoader: Mismatched results

I am using boot-graphql-kick-start in my project (I also tried Netflix DGS; the result is the same). When I use DataLoader, the results are mismatched. Where am I wrong? Can you help me?

Query:
(screenshot)

Result:
(screenshot)

Entity: I have a ManyToOne relation with my RoleEntity
(screenshot)

DataLoader method: I'm using BatchLoader for the N+1 problem.
(screenshot)

Hibernate query: in the DataLoader I am watching the result. It seems correct.
(screenshot)

Usage without GraphQL

Hello,

I want to use this library as a layer to optimize loading the object graph of a good ol' REST service. We don't have a GraphQL API (yet), but the parallel provisioning from DataLoader seems useful, as it also provides batching and per-request caching. However, I can't really find a guideline for calling a chain of nested DataLoaders, as if instrumented by a GraphQL library, to compose a response with nested (multiply resolved) objects.

Would it be possible to extend the examples with this use-case?
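To illustrate the rhythm outside GraphQL: enqueue every load a "layer" of the response needs, then dispatch once per layer, and repeat for the next layer of nested objects. Below is a self-contained stand-in (MiniLoader is hypothetical; with the real library this would be DataLoader.load plus dispatch) showing that two loads become one batched backend call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class MiniLoader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final List<K> queue = new ArrayList<>();
    private final List<CompletableFuture<V>> futures = new ArrayList<>();

    MiniLoader(Function<List<K>, List<V>> batchFn) { this.batchFn = batchFn; }

    // Enqueue a key; the returned future completes on the next dispatch.
    CompletableFuture<V> load(K key) {
        queue.add(key);
        CompletableFuture<V> f = new CompletableFuture<>();
        futures.add(f);
        return f;
    }

    // Flush the queue as one batch and complete the pending futures in order.
    void dispatch() {
        List<V> values = batchFn.apply(new ArrayList<>(queue));
        for (int i = 0; i < futures.size(); i++) {
            futures.get(i).complete(values.get(i));
        }
        queue.clear();
        futures.clear();
    }

    public static void main(String[] args) {
        MiniLoader<Integer, String> users = new MiniLoader<>(keys -> {
            System.out.println("one batch: " + keys);
            List<String> out = new ArrayList<>();
            for (Integer k : keys) out.add("user-" + k);
            return out;
        });
        CompletableFuture<String> a = users.load(1);
        CompletableFuture<String> b = users.load(2);
        users.dispatch(); // one backend call for both keys
        System.out.println(a.join() + ", " + b.join());
    }
}
```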

Gradle wrapper : switch to https

Hello,

I encountered a 403 error when building java-dataloader on a fresh system. It's due to the http URL in the gradle/wrapper/gradle-wrapper.properties file.
The http://services.gradle.org/distributions/gradle-4.0-all.zip now returns a 403 error.

To solve it, I changed:
distributionUrl=http://services.gradle.org/distributions/gradle-4.0-all.zip

to:
distributionUrl=https://services.gradle.org/distributions/gradle-4.0-all.zip

Etienne

DataLoader dispatches together keys from different requests

When using what I think is a standard setup (graphql-spring-boot with DataLoaderDispatcherInstrumentation and DataLoaderRegistry singleton beans), if two (HTTP) requests from different callers request the same data type by the same key (i.e. use the same DataLoader), all keys are enqueued and dispatched together: BatchLoader.load(List<K> keys) is called with the keys merged from both requests.
I have not used the Facebook Node.js implementation, but from what I understand their DataLoaders are created per request, so this merging doesn't happen.
While this behavior may be desirable in some cases, it comes with some drawbacks:

  • issues with keys in one request affect the other request, and this is not very deterministic (unless your backing service is smart enough to return per-key errors)
  • if one request loads 1 key and another loads 1K keys, both will have the latency of loading 1001 keys, and again, this is not very deterministic.
  • if you are propagating authentication and your backing service only takes a global authentication principal (i.e. an authorization header), you cannot send the requests together anyway; you need to split by requestor (or execution id) (you could live with this if your backing service took a per-key principal, but that would be pretty ugly, I think)

I wonder:

  • is this behavior intentional?
  • is this a problem with the way I have it set up?
  • would you be open to a PR that lets devs choose whether or not to merge keys?

If this is an issue with my setup then you can skip the rest; otherwise:

These are the options I'm considering at the moment:

  • wrapping the BatchLoader.load(...) method with one that splits by execution id; this solves some interference issues, but it still makes all concurrent requests wait until everyone else's data is available.
  • subclassing DataLoader to implement something like sliceIntoBatchesOfBatches but doing it by execution id. This could work, but it has a few issues:
    • most of the things I would need to change in the DataLoader class are private, so it would involve either copying code or gaining access by reflection :S
    • this is fine for the DataLoader.dispatch() method because it doesn't wait for the overall result, but dispatchAndJoin() would still wait for every request to finish. I don't mind, because I don't use it and the instrumentation only ends up calling dispatch()
    • while this approach won't make callers wait, it would still sometimes dispatch some keys of other requests "early", maybe even before they are completely enqueued, occasionally resulting in more requests in a non-deterministic way
  • another option I considered is to make the DataLoader a per-request object, so DataLoaders are entirely isolated. This isn't easy though: I would need to provide a means for DataFetchers to access the right DataLoader for a given request. With some effort I could keep a map by execution id, but it is not easy to manage its life cycle (I fear I would end up with leaked instances).

This is what I would like:
option 1

  • DataLoader.dispatch(), DataLoader.dispatchAndJoin() and DataLoaderRegistry.dispatchAll() should take an executionId as a parameter. Depending on a data loader option, either all requests are dispatched or only the requests for that execution id. The DataLoader.load(K key) method would also need to take in an execution id (or a DataFetchingEnvironment)
  • DataLoaderDispatcherInstrumentation.dispatch() passes the execution id to DataLoaderRegistry.dispatchAll()
  • DataLoaderDispatcherInstrumentation.beginExecution(instrumentationParameters).onEnd(...) calls a new method DataLoaderRegistry.discardAll(ExecutionId) (which calls a new DataLoader.discard(ExecutionId) method) to make sure appropriate cleanup is done in case of errors/abortion.
  • would that be enough cleanup, or is there any case in which keys may have been queued but beginExecution.onEnd is not called?

option 2
Similarly, but without changing DataLoader: make DataLoaderRegistry aware of executions and keep a map of execution id -> DataLoaders (it would need to be built with DataLoader suppliers instead of DataLoaders directly). With this approach only the DataLoaderRegistry.dispatchAll() method needs to be modified to take in the execution id. In this case the DataLoaderRegistry would need to expose a means of retrieving the DataLoader for a specific execution, for DataFetchers to use.

option 3
The same thing, but managed by the instrumentation: change DataLoaderDispatcherInstrumentation to take a DataLoaderRegistry supplier instead of a DataLoaderRegistry. This supplier, or the instrumentation, would expose a method returning the DataLoaderRegistry associated with an execution id so that DataFetchers can get the right one.
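The core idea shared by options 2 and 3 in miniature: hand the framework a Supplier so every execution gets its own registry, which keeps keys from different requests from ever being merged. (Registry here is a hypothetical stand-in for DataLoaderRegistry; this is a sketch of the isolation property, not the library's API.)

```java
import java.util.function.Supplier;

public class PerRequestRegistry {
    // Stand-in for a DataLoaderRegistry holding per-request DataLoaders.
    static class Registry { }

    public static void main(String[] args) {
        // The framework would call get() once per execution instead of
        // sharing a singleton registry across all requests.
        Supplier<Registry> registryPerRequest = Registry::new;

        Registry r1 = registryPerRequest.get(); // request 1
        Registry r2 = registryPerRequest.get(); // request 2
        System.out.println(r1 != r2); // true: isolated per request
    }
}
```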

Version 3.0.0 and later is no longer an OSGi bundle

Describe the bug
Since version 3.0.0, java-dataloader is no longer a valid OSGi bundle.
The MANIFEST.MF file no longer contains the required entries.

Version 2.2.3 was a valid bundle and was running fine in an OSGi runtime.

To Reproduce
Load java-dataloader in an OSGi runtime, like Eclipse or Apache Felix.

Expected
Valid OSGi bundle with valid import/export.

old MANIFEST.MF:

Manifest-Version: 1.0
Export-Package: org.dataloader;version="2.2.3";uses:="org.dataloader.s
 tats",org.dataloader.impl;version="2.2.3";uses:="org.dataloader",org.
 dataloader.stats;version="2.2.3"
Bundle-SymbolicName: org.dataloader.java-dataloader
Bundle-Version: 2.2.3
Bundle-Name: java-dataloader
Bundle-ManifestVersion: 2
Bnd-LastModified: 1566716677000
Import-Package: org.dataloader;version="[2.2,3)",org.dataloader.impl;v
 ersion="[2.2,3)",org.dataloader.stats;version="[2.2,3)"
Require-Capability: osgi.ee;filter:="(&(osgi.ee=JavaSE)(version=1.8))"
Created-By: 1.8.0_212 (AdoptOpenJDK)
Tool: Bnd-3.2.0.201605172007

new MANIFEST.MF:

Manifest-Version: 1.0
Automatic-Module-Name: com.graphql-java

Won't compile on Windows (encoding issue)

Hello,

I get a compilation error when compiling java-dataloader from the Windows command prompt.

To solve this, I added this line to the gradle.properties file:
org.gradle.jvmargs=-Dfile.encoding=UTF-8

Etienne

Dataloader issue when batching enabled and caching disabled.

Hi,
We have been using dataloader for around 6 months now, and recently during a performance test we came across a behaviour. My DataLoader had batching enabled and caching disabled.

public DataFetcher getUserData() {
    return environment -> {
        // ...
        return dataLoader.load(userId);
    };
}

private final BatchLoader<String, Object> batchLoader = userIds -> {
    List<CompletableFuture<Object>> userFutures = new ArrayList<>();
    // get users in parallel over HTTP via supplyAsync
    for (String id : userIds) {
        userFutures.add(GetOverHttp(id));
    }
    return PromisedValues.allOf(userFutures).toCompletableFuture();
};

public CompletableFuture<Object> GetOverHttp(String userId) {
    return CompletableFuture.supplyAsync(() -> {
        // calls HTTP to get the user
    });
}

A particular case: two requests came in simultaneously with the same value of the key for dataloader.load(key). Due to batching, only one call to GetOverHttp(String userId) was made, and one request was served successfully. But the second request/thread was waiting infinitely. This is the thread dump of the second, waiting thread:

http-nio-8080-exec-25 - priority:5 - threadId:0x00007f720402c000 - nativeId:0x1aec - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000078d625418> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
at graphql.execution.ExecutorServiceExecutionStrategy.execute(ExecutorServiceExecutionStrategy.java:82)
at graphql.execution.Execution.executeOperation(Execution.java:154)
at graphql.execution.Execution.execute(Execution.java:98)
at graphql.GraphQL.execute(GraphQL.java:546)
at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:488)
at graphql.GraphQL.executeAsync(GraphQL.java:463)
at graphql.GraphQL.execute(GraphQL.java:394)

I think in a scenario where batching is enabled and caching is disabled, it is fine that only unique keys are selected by the batch loader, but the other requests should also be completed once the responses for the unique keys arrive.

How to wrap Caffeine with CacheMap?

The docs for Custom caches say:

You could choose to use one of the fancy cache implementations from Guava or Kaffeine and wrap it in a CacheMap wrapper ready for data loader. They can do fancy things like time eviction and efficient LRU caching.

I've been looking at Caffeine, and started writing a CacheMap wrapper for it, but I'm wondering about a couple of things. Does anyone have experience with using a Caffeine cache with their dataloader(s)?

Here's what I've got so far:

public final class CaffeineCacheMap<U, V> implements CacheMap<U, V> {
  private final Cache<U, V> cache = Caffeine.newBuilder().maximumSize(1000).build();
  
  @Override
  public boolean containsKey(U key) {
      return cache.getIfPresent(key) != null;
  }

  @Override
  public V get(U key) {
      return cache.getIfPresent(key);
  }

  @Override
  public CacheMap<U, V> set(U key, V value) {
      cache.put(key, value);
      return this;
  }

  @Override
  public CacheMap<U, V> delete(U key) {
      cache.invalidate(key);
      return this;
  }

  @Override
  public CacheMap<U, V> clear() {
      cache.invalidateAll();
      return this;
  }
}

and here's how I'm using this to build my DataLoaderOptions:

final CacheMap cacheMap = new CaffeineCacheMap<>();

final DataLoaderOptions options = DataLoaderOptions.newOptions().setCacheMap(cacheMap);

My questions so far are:

  1. Is this right?
  public boolean containsKey(U key) {
      return cache.getIfPresent(key) != null;
  }

Caffeine's Cache doesn't provide a "contains key" method, but what about fields with nullable values? Does this mean they can't be cached using the Caffeine cache, since there's no way to distinguish between a cached null value and a cache miss?

  2. CacheMap has type parameters U and V, which I've added to my CaffeineCacheMap implementation as well. But I'm not sure what values to use for these; DataLoader explicitly casts the cacheMap to a CacheMap<Object, CompletableFuture<V>>:
    return loaderOptions.cacheMap().isPresent() ? (CacheMap<Object, CompletableFuture<V>>) loaderOptions.cacheMap().get() : CacheMap.simpleMap();

    Should I do the same, where V comes from my batch loader's value type?

invalid automatic module name when used with Java modules

Describe the bug
When building Selenium against the latest version, we get an invalid Automatic-Module-Name: com.graphql-java in the manifest.

There are two problems with that:

  • The project is dataloader, not graphql itself.
  • Module names need to be valid package names, and a dash character isn’t allowed

A better name would probably be something like com.graphql.dataloader or similar

To get around this, you just need to update the Gradle config to do what the Gradle docs suggest

To Reproduce
Update https://github.com/SeleniumHQ/selenium/blob/e92b16f0832da62204bda5d01fbc430ec9401deb/java/maven_deps.bzl to latest.

And then run bazel build grid in a terminal
