vektah / dataloaden

go generate based DataLoader

License: MIT License
go run github.com/vektah/dataloaden UserLoader int ./getting_started/gqlgen-todos/graph/model.User
unable to gofmt: /Users/Luca.Paterlini/Documents/Save24thJuly2020/STUDYFUN/go_study/utils/gqlgen/dataloader/userloader_gen.go:6:1: expected 'IDENT', found 'import'
exit status 2
I'm trying to generate a dataloader that uses a struct type as a key instead of a primitive type. I'm using gqlgen, and all my models are generated automatically. My project structure:
internal/
  graphql/
    generated/
      executor_gen.go
      models_gen.go
      ...
    models/
      loaderkey.go
Contents of loaderkey.go:
package models

type LoaderKey struct {
	ID     int
	Fields []string
}
Command to generate dataloader:
cd internal/graphql/generated
go run github.com/vektah/dataloaden \
AuthorLoader \
*my.site/project/internal/graphql/models.LoaderKey \
*my.site/project/internal/graphql/generated.Author
Dataloaden successfully generates the authorloader_gen.go dataloader, but there is a problem:
// Code generated by github.com/vektah/dataloaden, DO NOT EDIT.
package generated
import (
"sync"
"time"
)
// AuthorLoaderConfig captures the config to create a new AuthorLoader
type AuthorLoaderConfig struct {
	// Fetch is a method that provides the data for the loader
	Fetch func(keys []*LoaderKey) ([]*Author, []error)
	// Wait is how long wait before sending a batch
	Wait time.Duration
	// MaxBatch will limit the maximum number of keys to send in one batch, 0 = not limit
	MaxBatch int
}
<...>
As you can see, the fetch func uses my LoaderKey type, but its package is not imported. How can this be fixed?
The lock is released between Clear and Prime. If another goroutine calls Load or Prime during this window, the primed value will not be used.
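A minimal stand-in (not the generated code itself) reproducing the shape of the problem: Clear and Prime each take the mutex separately, so the pair is not atomic and a concurrent Prime can win the slot in the gap.

```go
package main

import (
	"fmt"
	"sync"
)

// cache mimics the locking shape of the generated loader: Clear and Prime
// each lock and unlock on their own, so Clear-then-Prime is not atomic.
type cache struct {
	mu sync.Mutex
	m  map[int]string
}

func (c *cache) Clear(k int) {
	c.mu.Lock()
	delete(c.m, k)
	c.mu.Unlock()
}

// Prime writes only if the key is absent, returning whether it wrote.
func (c *cache) Prime(k int, v string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, found := c.m[k]; found {
		return false
	}
	c.m[k] = v
	return true
}

func main() {
	c := &cache{m: map[int]string{1: "old"}}
	c.Clear(1)
	c.Prime(1, "other") // simulates another goroutine sneaking in between
	ok := c.Prime(1, "new")
	fmt.Println(ok, c.m[1]) // the later Prime lost the race: false other
}
```

An atomic ClearAndPrime (or holding the lock across both operations) would close the window.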
Panics that happen in the fetch function are not recovered, which means that a panic takes down the entire process. This could be something that I handle myself in every fetch function, but it seems like something that the generated code should handle.
I can think of a few approaches:
I think I'd lean toward the first, but thought I'd start a discussion. I'm happy to open a pull request once we decide on an approach.
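For illustration, the recover-in-a-wrapper approach might look roughly like this (a sketch with hypothetical names, not the generated code): the panic is recovered and surfaced as one error per key instead of crashing the process.

```go
package main

import "fmt"

type User struct{ ID int }

// safeFetch wraps a user-provided fetch function so that a panic inside it
// is recovered and converted into per-key errors.
func safeFetch(fetch func([]int) ([]*User, []error)) func([]int) ([]*User, []error) {
	return func(keys []int) (users []*User, errs []error) {
		defer func() {
			if r := recover(); r != nil {
				// Replace the results with a nil value and an error per key.
				users = make([]*User, len(keys))
				errs = make([]error, len(keys))
				for i := range errs {
					errs[i] = fmt.Errorf("fetch panicked: %v", r)
				}
			}
		}()
		return fetch(keys)
	}
}

func main() {
	f := safeFetch(func(keys []int) ([]*User, []error) {
		panic("boom")
	})
	_, errs := f([]int{1, 2})
	fmt.Println(len(errs), errs[0]) // 2 fetch panicked: boom
}
```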
Hi,
Thanks for the good work.
I was wondering: how would one implement pagination with dataloaden (or dataloaders more generally)?
For instance, with ordersByCustomer we currently have:
// 1:M loader
ldrs.ordersByCustomer = &OrderSliceLoader{
	wait:     wait,
	maxBatch: 100,
	fetch: func(keys []int) ([][]Order, []error) {
		var keySql []string
		for _, key := range keys {
			keySql = append(keySql, strconv.Itoa(key))
		}
		fmt.Printf("SELECT * FROM orders WHERE customer_id IN (%s)\n", strings.Join(keySql, ","))
		time.Sleep(5 * time.Millisecond)
		// ...
		return orders, errors
	},
}
Could ordersByCustomer become something like this:
// 1:M loader
ldrs.ordersByCustomer = &OrderSliceLoader{
	wait:     wait,
	maxBatch: 100,
	fetch: func(keys []int, last int, from Date) ([][]Order, []error) { // <-- HERE
		var keySql []string
		for _, key := range keys {
			keySql = append(keySql, strconv.Itoa(key))
		}
		fmt.Printf("SELECT * FROM orders WHERE customer_id IN (%s) AND created_at > (%s) LIMIT (%d)", strings.Join(keySql, ","), from, last) // <-- HERE
		time.Sleep(5 * time.Millisecond)
		// ...
		return orders, errors
	},
}
Or do I need to hack something together using the context example?
Regards,
David.
I noticed that a lot of examples seem to construct new instances in the handlers. I would prefer to create my dataloaders while initialising the server.
This way I can avoid using context.
Can this be done? Can I embed a single instance of the dataloaders into the resolver struct to be reused across requests, or does it need to be created every time? Will it keep track of which requests requested which IDs?
It would be nice if you could add a flag to change the name of the generated loader, e.g. dataloaden -keys string github.com/somestuff/user.Configuration -name UserConfigurationLoader
I have simple structs:
package models

type Article struct {
	ID      int       `json:"id"`
	Title   string    `json:"title"`
	Authors []*Author `json:"authors"`
}

type Author struct {
	ID       int    `json:"id"`
	FullName string `json:"fullName"`
}
I get an error when trying to generate dataloaders:
go run github.com/vektah/dataloaden AuthorLoader int *git.rn/gm/service-articles/internal/models.Author
unable to gofmt: /home/pioneer/sources/gm/service-articles-go/internal/dataloaders/authorloader_gen.go:6:1: expected 'IDENT', found 'import'
exit status 2
What am I doing wrong?
package generator
import "text/template"
var tpl = template.Must(template.New("generated").
Funcs(template.FuncMap{
"lcFirst": lcFirst,
}).
Parse(`
// Code generated by github.com/vektah/dataloaden, DO NOT EDIT.
package {{.Package}}
import (
"context"
"sync"
"time"
{{if .KeyType.ImportPath}}"{{.KeyType.ImportPath}}"{{end}}
{{if .ValType.ImportPath}}"{{.ValType.ImportPath}}"{{end}}
)
// {{.Name}}Config captures the config to create a new {{.Name}}
type {{.Name}}Config struct {
// Fetch is a method that provides the data for the loader
Fetch func(ctx context.Context, keys []{{.KeyType.String}}) ([]{{.ValType.String}}, []error)
// Wait is how long wait before sending a batch
Wait time.Duration
// MaxBatch will limit the maximum number of keys to send in one batch, 0 = not limit
MaxBatch int
}
// New{{.Name}} creates a new {{.Name}} given a fetch, wait, and maxBatch
func New{{.Name}}(config {{.Name}}Config) *{{.Name}} {
return &{{.Name}}{
fetch: config.Fetch,
wait: config.Wait,
maxBatch: config.MaxBatch,
}
}
// {{.Name}} batches and caches requests
type {{.Name}} struct {
// this method provides the data for the loader
fetch func(ctx context.Context, keys []{{.KeyType.String}}) ([]{{.ValType.String}}, []error)
// how long to done before sending a batch
wait time.Duration
// this will limit the maximum number of keys to send in one batch, 0 = no limit
maxBatch int
// INTERNAL
// lazily created cache
cache map[{{.KeyType.String}}]{{.ValType.String}}
// the current batch. keys will continue to be collected until timeout is hit,
// then everything will be sent to the fetch method and out to the listeners
batch *{{.Name|lcFirst}}Batch
// mutex to prevent races
mu sync.Mutex
}
type {{.Name|lcFirst}}Batch struct {
keys []{{.KeyType}}
data []{{.ValType.String}}
error []error
closing bool
done chan struct{}
}
// Load a {{.ValType.Name}} by key, batching and caching will be applied automatically
func (l *{{.Name}}) Load(ctx context.Context, key {{.KeyType.String}}) ({{.ValType.String}}, error) {
return l.LoadThunk(ctx, key)()
}
// LoadThunk returns a function that when called will block waiting for a {{.ValType.Name}}.
// This method should be used if you want one goroutine to make requests to many
// different data loaders without blocking until the thunk is called.
func (l *{{.Name}}) LoadThunk(ctx context.Context, key {{.KeyType.String}}) func() ({{.ValType.String}}, error) {
l.mu.Lock()
if it, ok := l.cache[key]; ok {
l.mu.Unlock()
return func() ({{.ValType.String}}, error) {
return it, nil
}
}
if l.batch == nil {
l.batch = &{{.Name|lcFirst}}Batch{done: make(chan struct{})}
}
batch := l.batch
pos := batch.keyIndex(ctx, l, key)
l.mu.Unlock()
return func() ({{.ValType.String}}, error) {
<-batch.done
var data {{.ValType.String}}
if pos < len(batch.data) {
data = batch.data[pos]
}
var err error
// its convenient to be able to return a single error for everything
if len(batch.error) == 1 {
err = batch.error[0]
} else if batch.error != nil {
err = batch.error[pos]
}
if err == nil {
l.mu.Lock()
l.unsafeSet(key, data)
l.mu.Unlock()
}
return data, err
}
}
// LoadAll fetches many keys at once. It will be broken into appropriate sized
// sub batches depending on how the loader is configured
func (l *{{.Name}}) LoadAll(ctx context.Context, keys []{{.KeyType}}) ([]{{.ValType.String}}, []error) {
results := make([]func() ({{.ValType.String}}, error), len(keys))
for i, key := range keys {
results[i] = l.LoadThunk(ctx, key)
}
{{.ValType.Name|lcFirst}}s := make([]{{.ValType.String}}, len(keys))
errors := make([]error, len(keys))
for i, thunk := range results {
{{.ValType.Name|lcFirst}}s[i], errors[i] = thunk()
}
return {{.ValType.Name|lcFirst}}s, errors
}
// LoadAllThunk returns a function that when called will block waiting for a {{.ValType.Name}}s.
// This method should be used if you want one goroutine to make requests to many
// different data loaders without blocking until the thunk is called.
func (l *{{.Name}}) LoadAllThunk(ctx context.Context, keys []{{.KeyType}}) (func() ([]{{.ValType.String}}, []error)) {
results := make([]func() ({{.ValType.String}}, error), len(keys))
for i, key := range keys {
results[i] = l.LoadThunk(ctx, key)
}
return func() ([]{{.ValType.String}}, []error) {
{{.ValType.Name|lcFirst}}s := make([]{{.ValType.String}}, len(keys))
errors := make([]error, len(keys))
for i, thunk := range results {
{{.ValType.Name|lcFirst}}s[i], errors[i] = thunk()
}
return {{.ValType.Name|lcFirst}}s, errors
}
}
// Prime the cache with the provided key and value. If the key already exists, no change is made
// and false is returned.
// (To forcefully prime the cache, clear the key first with loader.clear(key).prime(key, value).)
func (l *{{.Name}}) Prime(key {{.KeyType}}, value {{.ValType.String}}) bool {
l.mu.Lock()
var found bool
if _, found = l.cache[key]; !found {
{{- if .ValType.IsPtr }}
// make a copy when writing to the cache, its easy to pass a pointer in from a loop var
// and end up with the whole cache pointing to the same value.
cpy := *value
l.unsafeSet(key, &cpy)
{{- else if .ValType.IsSlice }}
// make a copy when writing to the cache, its easy to pass a pointer in from a loop var
// and end up with the whole cache pointing to the same value.
cpy := make({{.ValType.String}}, len(value))
copy(cpy, value)
l.unsafeSet(key, cpy)
{{- else }}
l.unsafeSet(key, value)
{{- end }}
}
l.mu.Unlock()
return !found
}
// Clear the value at key from the cache, if it exists
func (l *{{.Name}}) Clear(key {{.KeyType}}) {
l.mu.Lock()
delete(l.cache, key)
l.mu.Unlock()
}
func (l *{{.Name}}) unsafeSet(key {{.KeyType}}, value {{.ValType.String}}) {
if l.cache == nil {
l.cache = map[{{.KeyType}}]{{.ValType.String}}{}
}
l.cache[key] = value
}
// keyIndex will return the location of the key in the batch, if its not found
// it will add the key to the batch
func (b *{{.Name|lcFirst}}Batch) keyIndex(ctx context.Context, l *{{.Name}}, key {{.KeyType}}) int {
for i, existingKey := range b.keys {
if key == existingKey {
return i
}
}
pos := len(b.keys)
b.keys = append(b.keys, key)
if pos == 0 {
go b.startTimer(ctx, l)
}
if l.maxBatch != 0 && pos >= l.maxBatch-1 {
if !b.closing {
b.closing = true
l.batch = nil
go b.end(ctx, l)
}
}
return pos
}
func (b *{{.Name|lcFirst}}Batch) startTimer(ctx context.Context, l *{{.Name}}) {
time.Sleep(l.wait)
l.mu.Lock()
// we must have hit a batch limit and are already finalizing this batch
if b.closing {
l.mu.Unlock()
return
}
l.batch = nil
l.mu.Unlock()
b.end(ctx, l)
}
func (b *{{.Name|lcFirst}}Batch) end(ctx context.Context, l *{{.Name}}) {
b.data, b.error = l.fetch(ctx, b.keys)
close(b.done)
}
`))
I'm looking for a pattern or recipe for using dataloaden with gin. Is there an established approach?
Is there a feature to clean up the cache on a timer, or is the only way to clear it to call the Clear func on the loader explicitly? If it's the latter, what do you think about adding timer-based cache clearing?
Hi,
I wanted to double-check whether the order of results returned from the database matters. For example, Postgres doesn't guarantee the order of results for an IN query, and looking through the generated code it looks like it expects the results to match the order of the keys it sends.
If it does, any thoughts on what the best practice should be here? Maybe the fetch function should return a map[key]value instead of a slice?
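Order does matter: the generated thunk reads batch.data[pos] by key position (see the template above). The common fix today is to re-align rows inside the fetch function; a sketch, with an illustrative User type:

```go
package main

import "fmt"

type User struct {
	ID   int
	Name string
}

// alignByKey rebuilds the result slice in key order, since the database may
// return IN (...) rows in any order. Keys with no matching row stay nil.
func alignByKey(keys []int, rows []*User) []*User {
	byID := make(map[int]*User, len(rows))
	for _, u := range rows {
		byID[u.ID] = u
	}
	out := make([]*User, len(keys))
	for i, k := range keys {
		out[i] = byID[k]
	}
	return out
}

func main() {
	rows := []*User{{ID: 3, Name: "c"}, {ID: 1, Name: "a"}} // DB order
	out := alignByKey([]int{1, 2, 3}, rows)
	fmt.Println(out[0].Name, out[1] == nil, out[2].Name) // a true c
}
```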
Thanks!
The goal is to pass the field collection context (GetOperationContext) into the dataloader. In the resolvers we pass the context into the retriever, but it doesn't contain the info we need to get the fields of the query at that level. I tried to change that at the middleware layer, but without success. Do you know if there is a way to pass that context to the dataloaders?
FYI, the error happens when I try to pass the field collection of the nested model into the dataloader: panic: missing operation context
When I was passing the context in the resolver, I expected to have all the info to get the fields via GetOperationContext and pass them to the dataloader.
{
  rootQuery(name: ["Alex"]) {
    name
    email
    tel
    ProductsInfo {
      id
      name
      description
    }
  }
}
go run github.com/vektah/dataloaden ObjectLoader string map[string]interface{}
unable to gofmt: [hidden]/dataloader/someloader_gen.go:125:2: expected expression (and 8 more errors)
exit status 2
Is this supported, or are only slices supported?
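An untested workaround sketch, mirroring the wrapper-type trick reported elsewhere in these issues for time.Time: give the map a named type so the command-line argument contains no brackets or braces. This assumes dataloaden resolves named types through its package loader; the module path in the go:generate line is a placeholder, not a real path.

```go
package main

import "fmt"

// Properties gives the map a plain identifier the generator can be pointed
// at, instead of the literal map[string]interface{} syntax.
type Properties map[string]interface{}

//go:generate go run github.com/vektah/dataloaden ObjectLoader string yourmodule/models.Properties

func main() {
	p := Properties{"a": 1}
	fmt.Println(p["a"]) // the named type behaves exactly like the map
}
```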
Hi, I have some cases where I need the context. For example, I'm using the library to make external requests to other APIs, and each logged-in user has their own access key (which is in the context); there's no way I can get it inside the fetch function.
My users are set using GraphQL directives because some resolvers don't require users to be logged in.
Can we add that?
Is there any potential for separating the main package from the actual package code, so that one could create a custom binary by importing the package instead of having to run go get?
"wait" and "maxBatch" are lowercased which doesn't allow to package _gen files together and have the middleware in another package .
Fields that start with lower case characters are package internal and not exposed, to reference the field from another package it needs to start with an upper case character
I've so far been using graphql in node and have used the facebook dataloader quite a bit. I have some go services that I want to add a graphql api to. Gqlgen seems quite legit and I've read good things about it so far. However the way this dataloader works doesn't feel right. The facebook dataloader in js doesn't have such a weird implementation where you have to wait an arbitrary amount of time to resolve the data. Did you have trouble finding a solution in go that doesn't involve timeouts to know when to resolve the data? I would like to explore this a bit as it's currently preventing me from deciding to use this library. Can you maybe give some info on what you tried and how you decided that timeouts is the best solution? I think it would be good to include this in the readme, as I'm sure I'm not the only one that has concerns about this.
Generate a dataloader with target type time.Time. Example given: //go:generate dataloaden fooLoader int *time.Time
Generating fails with:
❯ go-test go generate ./... 16:31:36
validation failed: packages.Load: /home/vanjiii/dev/src/junk/go-test/fooloader_gen.go:9:2: time redeclared in this block
/home/vanjiii/dev/src/junk/go-test/fooloader_gen.go:7:2: other declaration of time
exit status 1
main.go:10: running "go": exit status 1
The generated file fooloader_gen.go:
// Code generated by github.com/vektah/dataloaden, DO NOT EDIT.
package main
import (
"sync"
"time"
"time"
)
// rest of file...
Expected: the generation completes. A workaround is to wrap time.Time in a local named type:
type Time struct {
	time.Time
}

//go:generate dataloaden fooLoader int *Time
For example, if you query the GitHub GraphQL API with the following, the result is returned taking first and after into account.
{
  viewer {
    repositories(first: 30) {
      nodes {
        issues(first: 30, after: "Y3Vyc29yOnYyOpHOAp96sw==") {
          nodes {
            title
          }
        }
      }
    }
  }
}
Presumably, if I implement a similar API with gqlgen, I should use a dataloader for issues.
However, I can only pass keys to the dataloader.
How should I pass the first, after, etc. information?
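One common workaround, since the loader batches only on its key: fold the pagination arguments into a comparable key struct (names here are illustrative), so each distinct (repository, first, after) combination batches and caches separately.

```go
package main

import "fmt"

// IssuesKey folds pagination arguments into the loader key. It contains only
// basic comparable types, which the generated cache map requires.
type IssuesKey struct {
	RepositoryID int
	First        int
	After        string
}

func main() {
	a := IssuesKey{RepositoryID: 1, First: 30, After: "Y3Vyc29yOnYyOpHOAp96sw=="}
	b := IssuesKey{RepositoryID: 1, First: 30, After: "Y3Vyc29yOnYyOpHOAp96sw=="}
	fmt.Println(a == b) // true: identical keys hit the same cache slot
}
```

The trade-off is that batching only helps when many fields in one request share the exact same pagination arguments.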
This is almost the same issue reported in #30.
dataloaden picks up a package name from the current working directory when generating a file.
But if no packages are found in the working directory, the generated file has an empty package name and is broken.
I got the error below (same as #30) when I tried to generate in a newly created directory.
$ pwd
/path/to/example.com
$ mkdir dataloader; cd dataloader
$ go run github.com/vektah/dataloaden UserLoader uint64 *example.com/model.User
unable to gofmt:/path/to/example.com/dataloader/userloader_gen.go:6:1: expected 'IDENT', found 'import'
exit status 2
It would be very useful to have (maybe generate-time optional) tracing hooks in the generated code. It's very difficult to reason about data loader performance and tracing is the first step to doing it.
I see no recent commits. Is it reliable to use this project in production?
I upgraded dataloaden to the latest version, v0.3.0, and it started showing the following error.
$ dataloaden UserSliceLoader int []*github.com/vektah/dataloaden/example.User
zsh: no matches found: []*github.com/vektah/dataloaden/example.User
I don't know whether this is a problem with my environment, but I did succeed in generating dataloaders for normal structs and other self-defined types.
$ dataloaden UserLoader int github.com/vektah/dataloaden/example.User
For now I avoid this error by defining new types for the slice, slice of pointers, and pointer, like below.
type UserSlice []User
type UserPointerSlice []*User
type UserPointer *User
My environment is MacOS, zsh, go 1.12.5.
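A likely cause worth noting: the "no matches found" message comes from zsh itself, which treats []* as a glob pattern and aborts before dataloaden even runs. Quoting the type argument should pass it through literally (a sketch, using the path from the report above):

```shell
# zsh expands [ ], *, and ? as glob characters; single quotes suppress that,
# so the literal type expression reaches the program unchanged.
TYPE='[]*github.com/vektah/dataloaden/example.User'
echo "$TYPE"
# then: dataloaden UserSliceLoader int "$TYPE"
```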
What is the best approach if I need to pass arguments to my dataloader?
I've seen 2 issues on here where people suggest using anonymous structs but that feels pretty janky for a few reasons.
I could also see a solution of just throwing it into context but that also doesn't feel right because we lose type safety.
Is there a better officially supported approach?
We should be able to change the name of the generated file.
https://github.com/vektah/dataloaden/blob/master/pkg/generator/generator.go#L87
So every single data loader usage that deals with a DB has essentially the same key-to-result mapping going on inside. Since this is exactly the same for every data loader, why not add a flag to generate it as well?
For some members of our team, re-generation removes parts that have to do with config. For example:
// MediaLoaderConfig captures the config to create a new MediaLoader
type MediaLoaderConfig struct {
	// Fetch is a method that provides the data for the loader
	Fetch func(keys []uint64) ([]*graphql.Media, []error)
	// Wait is how long wait before sending a batch
	Wait time.Duration
	// MaxBatch will limit the maximum number of keys to send in one batch, 0 = not limit
	MaxBatch int
}

// NewMediaLoader creates a new MediaLoader given a fetch, wait, and maxBatch
func NewMediaLoader(config MediaLoaderConfig) *MediaLoader {
	return &MediaLoader{
		fetch:    config.Fetch,
		wait:     config.Wait,
		maxBatch: config.MaxBatch,
	}
}
Do you have any idea why this might be happening?
I have these entities:
type Player struct {
	ID        int
	CreatedAt time.Time
	City      City
	CityID    int
	Team      *Team
	TeamID    *int
	Score     int
}
type Player {
  id: ID!
  createdAt: Time!
  City: City!
  Team: Team
  Score: Int!
}
As you can see:
- City is mandatory
- Team (and TeamID in the Go struct) is NOT mandatory, so I'm using a pointer type (*)
What I don't understand is how to use dataloaden (a GraphQL dataloader) with these. Here is the CityById loader, generated with go run github.com/vektah/dataloaden CityByIdLoader int *my_project/entities.City:
CityById: CityByIdLoader{
	maxBatch: 100,
	wait:     1 * time.Millisecond,
	fetch: func(keys []int) ([]*entities.City, []error) {
		cities, err := myRepo.CitiesByKeys(keys)
		// directly ordered in DB based on keys' order: it works
		return cities, []error{err}
	},
},
What about []*int for TeamID? Here's the TeamById loader, generated with go run github.com/vektah/dataloaden TeamByIdLoader *int *my_project/entities.Team:
TeamById: TeamByIdLoader{
	maxBatch: 100,
	wait:     1 * time.Millisecond,
	fetch: func(keys []*int) ([]*entities.Team, []error) {
		teams, err := myRepo.TeamsByKeys(keys)
		// I cannot order in DB because of `NULL` keys, right?
		// How to order these results?
		return teams, []error{err}
	},
},
I don't understand how to re-order teams by keys when I cannot do it in the DB because of NULL values in keys.
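One way out (a sketch; the Team shape follows the entities above, and alignTeams is my name, not generated code) is to do the re-ordering in Go rather than SQL: fetch rows for the non-nil IDs, then align, letting nil keys map to a nil Team.

```go
package main

import "fmt"

type Team struct {
	ID   int
	Name string
}

// alignTeams matches fetched teams back to possibly-nil *int keys.
// A nil key (a player without a team) simply yields a nil *Team.
func alignTeams(keys []*int, rows []*Team) []*Team {
	byID := make(map[int]*Team, len(rows))
	for _, t := range rows {
		byID[t.ID] = t
	}
	out := make([]*Team, len(keys))
	for i, k := range keys {
		if k != nil {
			out[i] = byID[*k]
		}
	}
	return out
}

func main() {
	id := 7
	rows := []*Team{{ID: 7, Name: "reds"}}
	out := alignTeams([]*int{nil, &id}, rows)
	fmt.Println(out[0] == nil, out[1].Name) // true reds
}
```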
I was just wondering why this depends on a v0.0.0 pseudo-version of golang.org/x/tools and not something more recent?