
graffy's Introduction

Graffy is a toolkit for graph-centric APIs. It has capabilities comparable to GraphQL and Firebase.

Why?

Graffy supports complex, expressive live queries - with multiple levels of resource expansion and pagination - based on a novel application of set theory and CRDTs.

Client-side example

import Graffy from '@graffy/core';
import GraffyClient from '@graffy/client';

const store = new Graffy();
store.use(new GraffyClient('/api'));

const query = {
  posts: [{ last: 10 }, { // Pagination parameters
    title: true,
    author: { name: true }
  }]
};

for await (const state of store.watch(query)) {
  // Iterates each time relevant data changes on the server.
  console.log(state);
}

Why Graffy?

Graffy provides live queries, which give clients a real-time view of the data they need. Graffy supports complex queries with nested graph traversals and pagination, while exposing a simple and intuitive API for building clients and servers.

Graffy was inspired by (and borrows from) Facebook's GraphQL and Netflix's Falcor. Compared to GraphQL, Graffy offers a more familiar data model, true live queries and more efficient caching. Compared to Falcor, it provides cursor-based pagination and real-time subscriptions.

Unlike GraphQL resolvers and Falcor data providers, Graffy providers can be composed like Express/Koa middleware. This allows authentication, validation, custom caches and resource limiting to be implemented in a straightforward manner.

Graffy providers can also perform efficient bulk reads from underlying data stores (for example by constructing optimized SQL queries). This is particularly hard to do with GraphQL (see dataloader) and Falcor.
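The middleware-style composition can be illustrated with a minimal, self-contained sketch of onion-style provider chaining in the spirit of Express/Koa. The `compose`, `auth` and `source` names are illustrative only, not the actual @graffy/core API:

```javascript
// Minimal sketch of onion-style provider composition (illustrative only,
// not the real @graffy/core implementation).
function compose(providers) {
  return function read(query) {
    let i = -1;
    function next(q) {
      i++;
      if (i >= providers.length) throw new Error('unfulfilled query');
      return providers[i](q, next);
    }
    return next(query);
  };
}

// An auth provider can gate and transform queries without knowing
// anything about the providers downstream of it.
const auth = (query, next) => {
  if (!query.token) throw new Error('unauthenticated');
  const { token, ...rest } = query;
  return next(rest);
};

// A terminal provider that fulfils whatever reaches it.
const source = (query) => ({ fields: Object.keys(query) });

const read = compose([auth, source]);
```

With this shape, `read({ token: 't', posts: true })` reaches `source`, while a query without a token is rejected before it touches any data store.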

Modules

The graffy metapackage exports a constructor for a Graffy store in its default configuration.

All the Graffy packages are published under the @graffy scope on NPM.

| Module | Description |
| --- | --- |
| core | Module management |
| fill | Fulfil queries from many providers |
| client | EventStream/HTTP client |
| server | EventStream/HTTP server |
| cache | In-memory cache |
| common | Shared utilities |
| react | React container and hooks API |
| stream | Utility for making AsyncIterables |
| testing | Testing and debugging utilities |
| graphql | Translate GraphQL to Graffy |
| schema | ⌛ Validation, introspection API |
| viewer | ⌛ Schema introspection client |
| auth | ⌛ Authentication and authorization |
| limit | ⌛ Resource consumption accounting |
| mysql | ⌛ Data source connector |
| postgres | ⌛ Data source connector |

⌛ = On the roadmap.

Capabilities

| Capability | Description |
| --- | --- |
| Narrow queries | Queries specify required fields, allowing API evolution |
| Deep queries | Queries can expand nested resources, reducing round-trips |
| Live queries | Push changes to query results in real time |
| Pagination cursors | Enables efficient pagination on the server |
| Parameters | Custom filtering criteria, etc. |
| Caching pages | Cache results of paginated queries |
| Atomic writes | Writes that trigger accurate cache invalidation |
| Non-data endpoints | Mutations, subscriptions, cross-resource search |

graffy's People

Contributors

aravindet, ashniu123, bibhuty-did-this, bqrkhn, dependabot[bot], email2vimalraj, nk-nekatr, rizkisunaryo, sebdeckers


graffy's Issues

JS.ORG CLEANUP

Hello, it seems a js.org subdomain that was requested to target this repository no longer works.
The subdomain requested was graffy.js.org and had the target of aravindet.github.io/graffy.
It produced the following failures when tested as part of the cleanup:

  • HTTP: Failed with status code '404 Not Found'
  • HTTPS: Failed with status code '404 Not Found'

To keep the js.org subdomain you should add a page with reasonable content within a month so the subdomain passes the validation.
Failure to rectify the issues will result in the requested subdomain being removed from JS.ORG's DNS and the list of active subdomains.

If you want to keep the js.org subdomain and have added reasonable content, YOU MUST reply to the main cleanup issue with the response format detailed at the top.

🤖 Beep boop. I am a robot and performed this action automatically as part of the js.org cleanup process. If you have an issue, please contact the js.org maintainers.

Replace pageInfo with nextRange and prevRange

It's more useful, more succinct, and protects the user from having to deal with \0s and \uffffs

The change should happen in decorate.js

Instead of:

arr.pageInfo = {
  hasNext: false,
  hasPrev: true,
  start: '',
  end: 'foobaq\uffff'
};

we should have:

arr.prevRange = null;
arr.nextRange = { first: 10, after: 'foobar' };

The null prevRange indicates that this is the first page. The first / last should match the current page size, and before / after should use the keyAfter / keyBefore helpers from @graffy/common.
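A hedged sketch of what the decorate.js change might look like. The sentinel checks on `start`/`end` are inferred from the example above, and `keyAfter` / `keyBefore` are passed in as parameters so the sketch stays self-contained; the real helpers live in @graffy/common and their exact behavior is assumed here:

```javascript
// Sketch: derive prevRange / nextRange from the old pageInfo shape.
// '' and '\uffff' are the sentinel keys pageInfo currently exposes;
// keyAfter / keyBefore stand in for the @graffy/common helpers.
function toRanges(pageInfo, pageSize, keyAfter, keyBefore) {
  return {
    // A null prevRange means this is the first page.
    prevRange: pageInfo.start === ''
      ? null
      : { last: pageSize, before: keyBefore(pageInfo.start) },
    // A null nextRange means this is the last page.
    nextRange: pageInfo.end === '\uffff'
      ? null
      : { first: pageSize, after: keyAfter(pageInfo.end) },
  };
}
```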

Server should support more REST-like paths

This is a nice-to-have for 1.0.

As a prerequisite, we should add a soft convention for naming indexes, e.g. `$<index_name>`; then we can do:

  • /posts?by=time&first=10&fields=slug,title,at,authors(first:1,name,avatar) should become:

    {
      'posts$time': [ { first: 10 }, {
        slug: 1, title: 1, at: 1,
        authors: [ { first: 1}, {
          name: 1, avatar: 1
        } ]
      } ]
    }
  • GET /posts/123?fields=slug,title,at,author(name,avatar) should become:

    {
      'posts': { 123: {
        slug: 1, title: 1, at: 1,
        author: { name: 1, avatar: 1 }
      } }
    }

MVCC: "Multi-layer" graphs

Version storage might be simplified by having only one writeVersion and one readVersion per tree, and packing multiple trees into "layers".

  • Merging would simply add a new layer, and occasionally perform "vacuuming" to remove data from old layers that has been shadowed by later layers.
  • Optimistic updates (on client) and two-phase commit (distributed backends) can then be implemented easily by delaying the "vacuuming" until commit occurs.
  • Slice will need to be done per layer.
  • setVersion would merge everything to a single layer.
Sieve will only work on single-layer graphs, which requires calling setVersion first. The sieve mechanism for detecting relevant updates is not resilient to out-of-order updates anyway; a query denormalization-based approach should be considered.

To stop this getting out of hand, queries are immutable and can only have one layer - so all parts of a query must have the same min version requirement. When merging queries, we can take the max(readVersion) and min(writeVersion) to ensure that the data required by all constituent queries are requested.

This is inextricably linked to #2 .
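A toy, self-contained sketch of the layering idea, with plain objects standing in for Graffy trees (the `merge` / `get` / `vacuum` names are illustrative):

```javascript
// Reads walk layers newest-first, so later layers shadow earlier ones.
class LayeredGraph {
  constructor() { this.layers = []; }
  merge(changes) { this.layers.push(changes); } // O(1): just add a layer
  get(key) {
    for (let i = this.layers.length - 1; i >= 0; i--) {
      if (key in this.layers[i]) return this.layers[i][key];
    }
    return undefined;
  }
  // "Vacuuming": squash shadowed data into a single layer. Delaying this
  // step is what would make optimistic updates and two-phase commit easy
  // to roll back before the commit point.
  vacuum() {
    this.layers = [this.layers.reduce((acc, l) => Object.assign(acc, l), {})];
  }
}
```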

Alternatives

  • This is a significant change from the existing CRDT / LWW model.
  • An alternate model to MVCC in databases is an undo log. It has some advantages but it is not entirely clear how it might be implemented within the Graffy data model.

Two-argument form of link()

An index provider might be able to retrieve the necessary information at the link, not just its location. Allowing the provider to do, for example:

// Assuming `key` and `link` helpers from @graffy/common, and lodash as `_`:
store.onRead('/posts$', (query) => {
  const posts = getPostsFromDb(query);
  return _.fromPairs(posts.map((post) => [
    key([post.createdAt, post.id]),
    link(`/posts/${post.id}`, post),
  ]));
});

Improve version

Currently, every node has a version value. In practice, in most (but not all) graphs and queries, all nodes have the same version. There is some redundancy here.

In subscription caches, we need to update the version number of the entire cache whenever there is a new update. With the current data structure, this takes O(size of cache) time, while other operations only take O(size of change).

It might be beneficial to rethink how version is stored and manipulated in the internal representation.

Filtering links

Create a special "filtering link" which, when traversed, modifies the keys immediately under the link to add filtering parameters.

Illustrative use case

Consider the schema

{
  posts: { [pid]: Post },
  posts$$createdAt: { [filter]: { [createdAt]: link(`/posts/${pid}`) } },
  users: { [uid]: User }
}

where the posts$$createdAt index can be filtered by authorId and tag.

Imagine we want to query last 3 posts of a user, with a particular tag, alongside their name. While this is possible already with a query containing a users branch as well as a posts$$createdAt branch, such a query would be unintuitive, duplicate userIds, and require additional post-processing of results.

Ideally this query should work:

# Query
{
  users: { '123': {
    name: 1,
    posts: { [key({ tag: 'example' })]: [{ last: 3 }, {
      title: 1, createdAt: 1
    }] }
  } }
}

and Graffy should send the following query to the posts$$createdAt provider:

{ [key({ tag: 'example', authorId: '123' })]: {
  title: 1, createdAt: 1
} }

This could be done if the user provider returned a "filtering link" for the posts property:

{
  name: 'Example',
  posts: link(['posts$$createdAt', { authorId: '123' }])
}

Rethink Watch

TL;DR: Replace watch() with incremental read() polling.

Why

The current implementation of watch() is complex to implement in providers and doesn't support back-pressure or resumption.

Steps

  • Implement the new query version semantics ("if-changed-after")
    • Specify that version is a non-negative number, and that version 0 has a special meaning
    • Implement the querying of linked paths in graffy-link and drop graffy-fill completely
    • Restrict the use of finalize() to queries with version 0 in core, pg and link
  • Implement query version filtering in slice(): exclude unchanged from both known and unknown
  • Implement query version filtering in pg by adding a condition on verCol
  • Implement the async iterator form of read() to perform incremental polling
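The last step could be sketched as an async generator over a hypothetical `readOnce(query, version)` that returns only data changed after `version`; the names and version semantics here are assumptions drawn from the steps above, not the actual API:

```javascript
// Incremental read() polling: yield the full state once (version 0 means
// "everything"), then poll for changes after the last seen version.
async function* watchByPolling(readOnce, query, intervalMs = 1000) {
  let version = 0;
  while (true) {
    const { value, nextVersion } = await readOnce(query, version);
    if (value !== undefined) yield value; // undefined = nothing changed
    version = nextVersion;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

One attraction of this shape is that back-pressure falls out of the pull-based iterator (the loop only polls again after the consumer takes a value), and resumption is just restarting with a stored version.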

Clean up extraneous branches in subscription queries

Currently, graffy-fill makes extra queries for subscriptions when resolving links. However, it does not clean those up when the link is updated.

This is currently planned to be fixed by extending slice() to return extraneous as well.

Helpful error messages

  • This watch handler did not yield an initial value within five seconds. If it's a change handler only, please ensure that it yields undefined first.
  • (more)

APIs on query objects

@baopham Thread to discuss what sort of APIs the query object should have to make it easy for providers that might want to (1) construct a query, like SQL or ES (2) identify topics to subscribe to.

Say you want to write a provider /users that needs to serve both queries like:

// 1
{ users: [ { first: 10 }, { name: true } ] }

// 2
{ users: { user_id_1: { email: true } } }

The provider might need to construct SQL queries:

# 1
SELECT name FROM users ORDER BY ID ASC LIMIT 10;

# 2
SELECT email FROM users WHERE id="user_id_1";

How would the "ideal" code to get from the query objects to the SQL look?
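One possible answer, as a self-contained sketch rather than a proposed API: branch on the shape of the `users` value, treating the array form as a range query and the object form as a keyed lookup. Real code would need placeholders or escaping rather than string interpolation:

```javascript
// Illustrative only: translate the two query shapes above into SQL strings.
function usersQueryToSql(query) {
  const users = query.users;
  if (Array.isArray(users)) {
    // Shape 1: [ { first: n }, { ...fields } ] — a range query.
    const [page, fields] = users;
    return `SELECT ${Object.keys(fields).join(', ')} FROM users ORDER BY ID ASC LIMIT ${page.first};`;
  }
  // Shape 2: { some_id: { ...fields } } — a keyed lookup.
  // WARNING: interpolating the id like this is unsafe; use placeholders.
  const [id] = Object.keys(users);
  return `SELECT ${Object.keys(users[id]).join(', ')} FROM users WHERE id="${id}";`;
}
```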

Improve the default version number

Currently, the device timestamp is used blindly. This is not resilient to the timestamp decreasing (due to clock adjustments, etc.) or to duplicate changes within 1 ms.

We need to append a sequence number, remember the last used version, and use the last version with incremented sequence number if the timestamp is unchanged or has decreased.

This change should be made in the graph builder.
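The fix described above can be sketched as follows. Packing a sequence number into the low bits is one option; the bit width is an arbitrary choice here, and this is not the actual graph-builder code:

```javascript
// Monotonic version generator: timestamp in the high bits, a sequence
// number in the low bits. If the clock stalls or moves backwards, fall
// back to incrementing the last issued version.
const SEQ_BITS = 10;
let lastVersion = 0;
function nextVersion(now = Date.now()) {
  const candidate = now * 2 ** SEQ_BITS; // still well under 2^53 for ms timestamps
  lastVersion = candidate > lastVersion ? candidate : lastVersion + 1;
  return lastVersion;
}
```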

Graffy Query Language

The pure JS "porcelain" query format currently in use is fairly verbose. This is a proposal to mitigate that with a Graffy query language. It aims to be similar enough to GraphQL to be familiar for those using it, but is not necessarily compatible with it.

Here is an example query:

{
  books {
    ( tags: {foo, bar}, publishedUntil: '2000-01-01' ) [
      ( first: 10, after: ('1998-03-23', 4398) ) {
        author {
          name
          photo
        }
        title
        cover
        description
      }
    ]
  }
}

which is equivalent to the current porcelain:

{
  books: {
    [key({
      tags: {foo: true, bar: true},
      publishedUntil: '2000-01-01',
    })]: [
      {
        first: 10,
      after: key(['1998-03-23', 4398]),
      }, {
        author: { name: true, photo: true },
        title: true,
        cover: true,
        description: true,
      }
    ]
  }
}

The transformations (to the current porcelain structure) are quite straightforward:

  • (foo: 1) becomes key({ foo: 1 })
  • ('foo', 'bar') becomes key(['foo', 'bar'])
  • { foo, bar } becomes { foo: true, bar: true }
  • before, after etc. within [...] get collected into an object
  • , and : are added as needed

Alternative to aliases

@baopham @email2vimalraj

The consumer APIs (read, write and watch) could gain a path argument to avoid having to implement aliases.

By and large, Graffy encourages granular queries; if a component has the sort of data need that requires aliases, it might be better served by just making two queries.

However using dynamic keys in queries comes with a bit of boilerplate that could be eliminated.

Problem

const postId = get_post_id_somewhere();
result = await gs.read({ posts: { [postId]: { ... } } });
const what_i_really_want = result.posts[postId];

It feels even worse when using filter parameters:

const filter = encodeKey({ tags: ['tech', 'javascript'] }); // This is some opaque string.
result = await gs.read({ filteredPostsByTime: { [filter]: [{ first: 10 }, { ... }] } });
const what_i_really_want = result.filteredPostsByTime[filter];

I have to store the encoded filter into a variable even though it has no meaning or use outside that query.

Solution

I feel that a better API might be:

const postId = get_post_id_somewhere();
const just_the_post = gs.read( ['posts', postId], { ... });

or with the filter:

const filteredPosts = gs.read([ 'filteredPostsByTime', encodeKey(...) ], [ { first: 10 }, { ... } ]);

What say?

In read/write/watch, we would wrap the query in the path before passing to .call(), and unwrap the results before returning.
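The wrap/unwrap step is small; as a sketch (`wrapQuery` / `unwrapResult` are hypothetical helpers, not existing @graffy/common exports):

```javascript
// Wrap a query under a path of keys, and pull a result back out.
function wrapQuery(path, query) {
  return path.reduceRight((q, key) => ({ [key]: q }), query);
}
function unwrapResult(path, result) {
  return path.reduce((r, key) => (r == null ? r : r[key]), result);
}
```

Here `read(['posts', postId], subQuery)` would internally call `.call()` with `wrapQuery(['posts', postId], subQuery)` and return `unwrapResult(['posts', postId], result)`.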

Typescript checking, emitting definitions

Primary goal: Add typings to the published NPM modules.
Secondary goal: Get type checks into the development workflow for Graffy itself.

The preferred approach is to use JSDoc-style function annotations (which TypeScript supports) rather than converting to TypeScript syntax.

Poor perf when pushing initial state in mockVisitorList

In the subscription provider of the example mock visitor list, pushing the initial state (rather than undefined) should improve performance slightly by not requiring a separate get. However it looks like it reduces performance drastically.

Requires investigation.

Counted queries and change streams

TL;DR: Some watch() providers may handle { after: '', before: 'b' } but not { first: 15 }. How do they communicate this?


Original write-up

Graffy providers often have limitations around what queries they can fulfil. They need to be able to signal these limitations, so graffy-fill can figure out ways to work around them.

Currently, we use some ad-hoc mechanisms to signal limitations. Perhaps we could design these in a more systematic way.

Current approaches

Dangling links

Consider the posts and users example. Let's say the posts resolver cannot fetch user data - if author info was requested, it ignores the nested fields and simply returns a link as the author field.

Graffy-fill makes a new (live) query for the linked data.

Change streams

Imagine a subscription provider that can provide change streams but not the initial result (current state). It signals this by yielding undefined as the first value.

Graffy-fill makes a separate fetch to get the initial value.

New requirements

Page bounds

Imagine a change stream provider pushing updates for users. Say it does not have access to the current state, but can access an event stream of user updates where each update specifies the user_id.

Say the query is for the first 30 users.

In a scenario where there are thousands of users, MOST user updates will be irrelevant for this query. However, there is no way for this provider to know that, because it cannot know the range of IDs that match "first 30".

Perhaps there should be a way for the provider to signal that it cannot serve "counted" pages (i.e. that use first / last parameters) but can serve "bounded" ones (i.e. those that only have before AND after, but no first / last).

Graffy fill could use the fetch results to convert a "counted" page into a "bounded" one.

NOTE: If the pagination happens in an "index" (nodes whose children are all links), things will work fine if the change stream provider ignores the bounds queries and just pretends there are no updates. However, this seems to work only "by accident".
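The counted-to-bounded conversion graffy-fill could perform might look like this sketch, using the '' and '\uffff' sentinel keys seen elsewhere in Graffy; everything else here is an assumption:

```javascript
// After the initial fetch, replace a counted page ({ first: n } or
// { last: n }) with a bounded one ({ after, before }) derived from the
// keys that were actually returned, so a bounds-only change-stream
// provider can still filter updates.
function toBoundedPage(countedPage, fetchedKeys) {
  const keys = [...fetchedKeys].sort();
  if ('first' in countedPage) {
    return { after: countedPage.after ?? '', before: keys[keys.length - 1] };
  }
  return { after: keys[0], before: countedPage.before ?? '\uffff' };
}
```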

Watch queries with per-node "raw"

Consider a watch query:

{
  users: [{
    name: true,
    email: true
  }]
}

Currently there are two modes for this watch: "values" mode, where every response contains all users, and "raw" mode, where responses contain only the changes. A common use case calls for a "raw+" mode, where you receive only the changed users, but each changed user arrives with both name and email (even if only one of them actually changed).

This is convenient for watching processes that would otherwise need to watch changes and then load every entity.
