Comments (15)

eugene1g commented on May 5, 2024

I have a dozen app-level dataloaders that are shared among all requests. Each of those loaders is linked to a relatively small source and will never cache more than ~5,000 records. I think the only issue with a long-lived dataloader is the risk of an unbounded cache: as you load more data, it chews up more memory, potentially without limit. However, if your dataset is inherently limited (e.g. a list of all companies listed on Nasdaq), then your dataloader is naturally capped by your domain size.

luckydrq commented on May 5, 2024

@ashah888 I don't see any advantage of sharing a DataLoader instance at the app level but with caching disabled. Maybe I've missed something; could you please explain more? Thanks!

@eugene1g I have a similar idea to yours: each data model (entity) gets a DataLoader instance at the app level. But if a data model can be queried by multiple fields, there end up being many loaders, e.g. idLoader, nameLoader, ..., and you have to keep the data in sync among them. So I am still looking for examples or best practices. As for the memory concern you mentioned, I don't think it's a problem because the DataLoader constructor accepts a custom cache (the cacheMap option), and we can pass in something like lru-cache for memory management. I'd like to hear your opinion, thanks!
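
For illustration, a minimal sketch of that idea. The batch function batchFetchCompanies is hypothetical, and it assumes a recent lru-cache release whose instances expose the Map-like get/set/delete/clear methods that DataLoader's cacheMap option expects:

import DataLoader from 'dataloader'
import { LRUCache } from 'lru-cache' // named export in recent lru-cache versions

// batchFetchCompanies is a hypothetical batch function that must return
// one result (or null) per key, in the same order as the keys
const companyLoader = new DataLoader(
  (ids) => batchFetchCompanies(ids),
  { cacheMap: new LRUCache({ max: 5000 }) } // cap the app-level cache at ~5,000 entries
)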

eugene1g commented on May 5, 2024

@luckydrq I hadn't heard of lru-cache before, and it seems like a nifty way to cap memory usage for a single dataloader. Using it means we can set the memory cap for each loader independently. Regarding multiple dataloaders for the same data (e.g. idLoader/nameLoader): I have only a single dataloader, responsible for fetching/building the object based on the primary key (e.g. id/slug/username). Then I have several mappers that normalize other inputs into the main PK for the object. For example -

// I don't really do this
const user1 = await idLoader.load(1)
const user2 = await usernameLoader.load('john123')

// but rather this
const user1 = await userLoader.load(1)
const user2Pk = await mapToUserId({ username: 'john123'}) // or {email: '[email protected]'}
const user2 = await userLoader.load(user2Pk)

// which also backs this API
const searchResults = await matchUserIds({country: 'SG', gender: 'F'})
const matchingUsers = await userLoader.loadMany(searchResults)

So whether I end up with a user object by id or username or as part of a search query, it's always the same identical object (in the === sense). This way I don't have to sync data between multiple dataloaders, and only need to optimise my mapping/searching helpers for low latency. For example, the simple username->id helper keeps its own cache of {[username]: id} so the conversion incurs essentially no cost.

luckydrq commented on May 5, 2024

@eugene1g thanks for the reply! I have a general idea about the mappers you mentioned, but how does the mapper resolve another key to the PK? That is, when you do:

const user1 = await userLoader.load(1)
const user2Pk = await mapToUserId({ username: 'john123'})

What happens inside mapToUserId? Does it have to fetch from the database again? I'm a little confused here.

eugene1g commented on May 5, 2024

Does it have to fetch from the database again?
Pretty much. Or whatever the persistence layer is. This is a contrived example, but conceptually it could look something like this -

let lookupCache = {}
const mapToUserId = async (username: string): Promise<number | null> => {
  if (!lookupCache[username]) {
    const dbResult = await db.fetchRow('select id from person where username = ?', username)
    if (dbResult) lookupCache[username] = parseInt(dbResult.id, 10)
    // alternatively: look it up via Redis
  }
  return lookupCache[username] || null
}

If you squint hard enough, mapToUserId looks like a Dataloader itself.
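
As an aside, here is a rough sketch of what that could look like if the mapper really were a DataLoader (db.fetchRows is a hypothetical persistence helper, in the same spirit as db.fetchRow above):

// sketch only: the username -> id mapper expressed as its own DataLoader,
// so the lookups batch and cache too
const userIdByUsernameLoader = new DataLoader(async (usernames) => {
  const rows = await db.fetchRows('select id, username from person where username in (?)', usernames)
  const byUsername = {}
  for (const row of rows) byUsername[row.username] = parseInt(row.id, 10)
  // one result per requested key, in order; null when there is no match
  return usernames.map((username) => byUsername[username] ?? null)
})

// const user2Pk = await userIdByUsernameLoader.load('john123')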

So for performance reasons, I have to think about 3 scenarios:

  1. No cache. One query in mapToUserId and one query in userLoader.
  2. Partial cache. Either mapToUserId or userLoader has already been invoked for a given entity, so there is only 1 query.
  3. Full cache hit. Almost no cost to run both the mapper and the Dataloader, but I can use either one elsewhere as an independent building block.

An alternative implementation could be to make your Dataloader smart enough to know about all the possible primary keys in your model. So if we continue with this simple example with a user_id/username, the loader could look like this -

const fetchUsers = async (keys: Array<string | number>): Promise<Array<UserType | null>> => {
  // assume that any key passed as a string is a username
  const usernames = keys.filter((key: any): boolean => typeof key === 'string')
  // assume that any key passed as a number is an id
  const ids = keys.filter((key: any): boolean => typeof key === 'number')

  // find rows matching either predicate
  // (in a real implementation, guard against empty username/id lists)
  const allRows = await pg.many(
    'select * from user where username in (${usernames:csv}) or id in (${ids:csv})',
    { usernames, ids }
  )

  // start building a cache for each requested key
  let byUsername = {}
  let byId = {}

  // register each result against the type of PK used to find it
  for (const row of allRows) {
    if (usernames.includes(row.username)) byUsername[row.username] = row
    if (ids.includes(row.id)) byId[row.id] = row
  }

  // return all records in the order requested in the original list of keys
  const orderedResults = keys.map((key: any): UserType | null => byUsername[key] || byId[key] || null)
  return orderedResults
}

const userLoader = new DataLoader(fetchUsers)
// then you're free to use a single dataloader like so
const users = await userLoader.loadMany(['john', 12])

Though the 'complex dataloader' could get fragile, as you'd have to add smart filters to interpret which PK the end user actually wanted (relying on typeof could be insufficient).
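
One way to sidestep the typeof guessing (a sketch, not something described above) is to tag each key explicitly and give DataLoader a cacheKeyFn so structurally equal keys share a cache entry. fetchUsersByTaggedKeys is a hypothetical batch function that inspects key.by to decide which column to query and returns one result per key, in order:

const userLoader = new DataLoader(
  fetchUsersByTaggedKeys,
  { cacheKeyFn: (key) => `${key.by}:${key.value}` } // e.g. 'id:12' or 'username:john'
)

const byId = await userLoader.load({ by: 'id', value: 12 })
const byUsername = await userLoader.load({ by: 'username', value: 'john' })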

luckydrq commented on May 5, 2024

I get your point. To be honest, I'm still not very satisfied with this approach, as it is complex and hard to maintain if the number of user-land key types grows quickly. I'll think about it some more and share my thoughts in the future.

Now I have another question: how do you deal with the situation where keys and values are not matched one-to-one but you still want to use DataLoader?
Let's say there are three Posts in the database:

// Posts
[{
  id: 1,
  author: 'john',
  content: '...'
}, {
  id: 2,
  author: 'john',
  content: '...'
}, {
  id: 3,
  author: 'alex',
  content: '...'
}]

The nameLoader would throw an error if you just call nameLoader.load('john'), because there are two records for john. I'd like to hear your advice, as I think this is also quite common.

eugene1g commented on May 5, 2024

Definitely, I don't like the idea of complicated dataloaders either, and that's why I ended up with several mappers to find a single primary key for a specified user request, then feed that PK to the Dataloader.

how to deal with the situation where keys and values are not matched

Can you please rephrase this point? Dataloaders rely on primary keys, so those would never clash (and that's enforced with unique constraints at the database level as well).

Perhaps the question is how to use DataLoader to fetch specific posts. So in my app I'd have this -

const findIdsMatching = async (fieldValues): Promise<Array<number>> => {
  // run a query to find object IDs in redis/lucene/database
  const ids = await pg.query('select id from posts where ....')
  return ids
}

// do a quick/cheap search
const matchingIds = await findIdsMatching({ author: 'john' })
// now load all matching objects, which might be costly to populate
const posts = await postLoader.loadMany(matchingIds)
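
Another pattern that comes up for a non-unique key like author (not what's shown above, just a common alternative) is a loader whose batch function returns an array of posts per key, so loading 'john' yields all of john's posts. A sketch, reusing the illustrative pg-promise style from earlier:

// group rows by author and return one array per requested key, in order
const postsByAuthorLoader = new DataLoader(async (authors) => {
  const rows = await pg.many('select * from posts where author in (${authors:csv})', { authors })
  const grouped = {}
  for (const row of rows) (grouped[row.author] = grouped[row.author] || []).push(row)
  return authors.map((author) => grouped[author] || [])
})

// const johnsPosts = await postsByAuthorLoader.load('john') // -> array of posts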

luckydrq commented on May 5, 2024

Can you please rephrase this point?

My bad, my wording was ambiguous. Yes, a non-primary (non-unique) key is what I mean. I understand the principle that DataLoader follows, but I want to use DataLoader as a complete caching solution that reduces network I/O. By complete I mean covering more scenarios, including the one I mentioned above. So I am looking for a solution built on top of DataLoader.

luckydrq commented on May 5, 2024

If I have to do idLoader.load(id) and db.posts.select({ name: 'john' }) to cover the different queries, it feels not quite elegant and also inefficient, because db.posts.select has no cache at all. Not sure if my point is clear. Thanks!

dcworldwide commented on May 5, 2024

I'm also trying to understand the best design for loading records from a table by different query fields with DataLoader.

Seems this conversation was left hanging. Has anyone made progress on this?

I don't really like the mapping idea because it's an extra request. But I understand its simplicity is attractive.

dcworldwide commented on May 5, 2024

For those who want batching but not caching: why can't load() or loadMany() take multiple query field args instead of just a PK? Could that be done as-is?

luckydrq commented on May 5, 2024

@dcworldwide Now I realize that DataLoader is best used as a per-request cache. I think this thread has made that point nice and clear. So maybe we expected too much from DataLoader. Now I will have an idLoader (querying by the id field, which is a primary key) and other loaders such as a nameLoader (querying by the name field, which is a unique key).
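
For what it's worth, a sketch of that two-loader setup (the batch functions are hypothetical). DataLoader's prime() can be used so that a row fetched through one loader also warms the other, which touches on the keeping-loaders-in-sync concern raised earlier in the thread:

const userByIdLoader = new DataLoader(async (ids) => {
  const rows = await batchGetUsersById(ids) // hypothetical: one row (or null) per id, in order
  for (const row of rows) if (row) userByNameLoader.prime(row.name, row)
  return rows
})

const userByNameLoader = new DataLoader(async (names) => {
  const rows = await batchGetUsersByName(names) // hypothetical: name is a unique key here
  for (const row of rows) if (row) userByIdLoader.prime(row.id, row)
  return rows
})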

dcworldwide commented on May 5, 2024

@luckydrq thanks, I've settled on the same design too.

leebyron commented on May 5, 2024

I believe this is safe only in very careful circumstances, but I wouldn't recommend the pattern.

Specifically, it is safe if:

  • you perform an access-control check after loading the data, to ensure that two users with different access levels loading the same value does not break access-control rules.

  • the results do not contain any user-specific data, and you never mutate the loaded results with any user-specific data that could accidentally be shared across requests and create a privacy issue.

In these cases, it would be safe to have application-level caches instead of request-level caches, and then you might consider keeping the cache to improve performance, just being cautious of memory growth, perhaps with an LRU cache.

Though in my personal opinion, the cost of a mistake is too high with this design, and per-request instances of DataLoader dramatically reduce that cost by limiting the information shared between requests.
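
A minimal sketch of that per-request setup (the server wiring and batch functions are illustrative, not from this thread):

// build a fresh set of loaders for every incoming request and hang them off the
// per-request context, so nothing cached for one user can leak to another
function createLoaders() {
  return {
    users: new DataLoader(batchGetUsers),
    posts: new DataLoader(batchGetPosts),
  }
}

// e.g. in whatever per-request hook your GraphQL server exposes:
// const context = { loaders: createLoaders(), viewer: req.user }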

luckydrq commented on May 5, 2024

@dcworldwide I've created this module to try to improve convenience and performance when working with relational databases. Hope it helps :)
