
graffy's Introduction

Graffy is a toolkit for graph-centric APIs. It has capabilities comparable to GraphQL and Firebase.

Why?

Graffy supports complex, expressive live queries - with multiple levels of resource expansion and pagination - based on a novel application of set theory and CRDTs.

Client-side example

import Graffy from '@graffy/core';
import GraffyClient from '@graffy/client';

const store = new Graffy();
store.use(new GraffyClient('/api'));

const query = {
  posts: [{ last: 10 }, { // Pagination parameters
    title: true,
    author: { name: true }
  }]
};

for await (const state of store.watch(query)) {
  // Iterates each time relevant data changes on the server.
  console.log(state);
}

Why Graffy?

Graffy provides live queries, which give clients a real-time view of the data they need. Graffy supports complex queries with nested graph traversals and pagination, while exposing a simple and intuitive API for building clients and servers.

Graffy was inspired by (and borrows from) Facebook's GraphQL and Netflix's Falcor. Compared to GraphQL, Graffy offers a more familiar data model, true live queries and more efficient caching. Compared to Falcor, it provides cursor-based pagination and real-time subscriptions.

Unlike GraphQL resolvers and Falcor data providers, Graffy providers can be composed like Express/Koa middleware. This allows authentication, validation, custom caches and resource limiting to be implemented in a straightforward manner.

Graffy providers can also perform efficient bulk reads from underlying data stores (for example by constructing optimized SQL queries). This is particularly hard to do with GraphQL (see dataloader) and Falcor.
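The middleware-style composition can be illustrated with a minimal, self-contained sketch of onion-style provider chaining in the spirit of Express/Koa. The `compose`, `auth` and `source` names are illustrative only, not the actual @graffy/core API:

```javascript
// Minimal sketch of onion-style provider composition (illustrative only,
// not the real @graffy/core implementation).
function compose(providers) {
  return function read(query) {
    let i = -1;
    function next(q) {
      i++;
      if (i >= providers.length) throw new Error('unfulfilled query');
      return providers[i](q, next);
    }
    return next(query);
  };
}

// An auth provider can gate and transform queries without knowing
// anything about the providers downstream of it.
const auth = (query, next) => {
  if (!query.token) throw new Error('unauthenticated');
  const { token, ...rest } = query;
  return next(rest);
};

// A terminal provider that fulfils whatever reaches it.
const source = (query) => ({ fields: Object.keys(query) });

const read = compose([auth, source]);
```

With this shape, `read({ token: 't', posts: true })` reaches `source`, while a query without a token is rejected before it touches any data store.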

Modules

The graffy metapackage exports a constructor for a Graffy store in its default configuration.

All the Graffy packages are published under the @graffy scope on NPM.

| Module | Description |
| --- | --- |
| core | Module management |
| fill | Fulfil queries from many providers |
| client | EventStream/HTTP client |
| server | EventStream/HTTP server |
| cache | In-memory cache |
| common | Shared utilities |
| react | React container and hooks API |
| stream | Utility for making AsyncIterables |
| testing | Testing and debugging utilities |
| graphql | Translate GraphQL to Graffy |
| schema | ⌛ Validation, introspection API |
| viewer | ⌛ Schema introspection client |
| auth | ⌛ Authentication and authorization |
| limit | ⌛ Resource consumption accounting |
| mysql | ⌛ Data source connector |
| postgres | ⌛ Data source connector |

⌛ = On the roadmap.

Capabilities

| Capability | Description |
| --- | --- |
| Narrow queries | Queries specify required fields, allowing API evolution |
| Deep queries | Queries can expand nested resources, reducing round-trips |
| Live queries | Push changes to query results in real time |
| Pagination cursors | Enables efficient pagination on the server |
| Parameters | Custom filtering criteria, etc. |
| Caching pages | Cache results of paginated queries |
| Atomic writes | Writes that trigger accurate cache invalidation |
| Non-data endpoints | Mutations, subscriptions, cross-resource search |

graffy's People

Contributors

aravindet, ashniu123, bibhuty-did-this, bqrkhn, dependabot[bot], email2vimalraj, nk-nekatr, rizkisunaryo, sebdeckers


graffy's Issues

JS.ORG CLEANUP

Hello, it seems a js.org subdomain that was requested to target this repository no longer works.
The subdomain requested was graffy.js.org and had the target of aravindet.github.io/graffy.
It produced the following failures when tested as part of the cleanup:

  • HTTP: Failed with status code '404 Not Found'
  • HTTPS: Failed with status code '404 Not Found'

To keep the js.org subdomain you should add a page with reasonable content within a month so the subdomain passes the validation.
Failure to rectify the issues will result in the requested subdomain being removed from JS.ORG's DNS and the list of active subdomains.

If you want to keep the js.org subdomain and have added reasonable content, YOU MUST reply to the main cleanup issue with the response format detailed at the top.

🤖 Beep boop. I am a robot and performed this action automatically as part of the js.org cleanup process. If you have an issue, please contact the js.org maintainers.

Replace pageInfo with nextRange and prevRange

It's more useful, more succinct, and protects the user from having to deal with \0s and \uffffs

The change should happen in decorate.js

Instead of:

arr.pageInfo = {
  hasNext: false,
  hasPrev: true,
  start: '',
  end: 'foobaq\uffff'
};

we should have:

arr.prevRange = null;
arr.nextRange = { first: 10, after: 'foobar' };

The null prevRange indicates that this is the first page. The first / last should match the current page size, and before / after should use the keyAfter / keyBefore helpers from @graffy/common.
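A hedged sketch of what the decorate.js change might look like. The sentinel checks on `start`/`end` are inferred from the example above, and `keyAfter` / `keyBefore` are passed in as parameters so the sketch stays self-contained; the real helpers live in @graffy/common and their exact behavior is assumed here:

```javascript
// Sketch: derive prevRange / nextRange from the old pageInfo shape.
// '' and '\uffff' are the sentinel keys pageInfo currently exposes;
// keyAfter / keyBefore stand in for the @graffy/common helpers.
function toRanges(pageInfo, pageSize, keyAfter, keyBefore) {
  return {
    // A null prevRange means this is the first page.
    prevRange: pageInfo.start === ''
      ? null
      : { last: pageSize, before: keyBefore(pageInfo.start) },
    // A null nextRange means this is the last page.
    nextRange: pageInfo.end === '\uffff'
      ? null
      : { first: pageSize, after: keyAfter(pageInfo.end) },
  };
}
```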

Server should support more REST-like paths

This is a nice-to-have for 1.0.

As a prerequisite, we should add a soft convention for naming indexes, e.g. `$<index_name>`; then we can do:

  • /posts?by=time&first=10&fields=slug,title,at,authors(first:1,name,avatar) should become:

    {
      'posts$time': [ { first: 10 }, {
        slug: 1, title: 1, at: 1,
        authors: [ { first: 1}, {
          name: 1, avatar: 1
        } ]
      } ]
    }
  • GET /posts/123?fields=slug,title,at,author(name,avatar) should become:

    {
      'posts': { 123: {
        slug: 1, title: 1, at: 1,
        author: { name: 1, avatar: 1 }
      } }
    }

MVCC: "Multi-layer" graphs

Version storage might be simplified by having only one writeVersion and one readVersion per tree, and packing multiple trees into "layers".

  • Merging would simply add a new layer, and occasionally perform "vacuuming" to remove data from old layers that has been shadowed by later layers.
  • Optimistic updates (on client) and two-phase commit (distributed backends) can then be implemented easily by delaying the "vacuuming" until commit occurs.
  • Slice will need to be done per layer.
  • setVersion would merge everything to a single layer.
Sieve will only work on single-layer graphs, which requires calling setVersion first. The sieve mechanism for detecting relevant updates is not resilient to out-of-order updates anyway; a query denormalization-based approach should be considered.

To stop this getting out of hand, queries are immutable and can only have one layer - so all parts of a query must have the same min version requirement. When merging queries, we can take the max(readVersion) and min(writeVersion) to ensure that the data required by all constituent queries are requested.

This is inextricably linked to #2 .
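A toy, self-contained sketch of the layering idea, with plain objects standing in for Graffy trees (the `merge` / `get` / `vacuum` names are illustrative):

```javascript
// Reads walk layers newest-first, so later layers shadow earlier ones.
class LayeredGraph {
  constructor() { this.layers = []; }
  merge(changes) { this.layers.push(changes); } // O(1): just add a layer
  get(key) {
    for (let i = this.layers.length - 1; i >= 0; i--) {
      if (key in this.layers[i]) return this.layers[i][key];
    }
    return undefined;
  }
  // "Vacuuming": squash shadowed data into a single layer. Delaying this
  // step is what would make optimistic updates and two-phase commit easy
  // to roll back before the commit point.
  vacuum() {
    this.layers = [this.layers.reduce((acc, l) => Object.assign(acc, l), {})];
  }
}
```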

Alternatives

  • This is a significant change from the existing CRDT / LWW model.
  • An alternate model to MVCC in databases is an undo log. It has some advantages but it is not entirely clear how it might be implemented within the Graffy data model.

Two-argument form of link()

An index provider might be able to retrieve the necessary information at the link, not just its location. Allowing the provider to do, for example:

// Assuming `key` and `link` helpers from @graffy/common, and lodash as `_`:
store.onRead('/posts$', (query) => {
  const posts = getPostsFromDb(query);
  return _.fromPairs(posts.map((post) => [
    key([post.createdAt, post.id]),
    link(`/posts/${post.id}`, post),
  ]));
});

Improve version

Currently, every node has a version value. In practice, in most (but not all) graphs and queries, all nodes have the same version. There is some redundancy here.

In subscription caches, we need to update the version number of the entire cache whenever there is a new update. With the current data structure, this takes O(size of cache) time, while other operations only take O(size of change).

It might be beneficial to rethink how version is stored and manipulated in the internal representation.

Filtering links

Create a special "filtering link" which, when traversed, modifies the keys immediately under the link to add filtering parameters.

Illustrative use case

Consider the schema

{
  posts: { [pid]: Post },
  posts$$createdAt: { [filter]: { [createdAt]: link(`/posts/${pid}`) } },
  users: { [uid]: User }
}

where the posts$$createdAt index can be filtered by authorId and tag.

Imagine we want to query last 3 posts of a user, with a particular tag, alongside their name. While this is possible already with a query containing a users branch as well as a posts$$createdAt branch, such a query would be unintuitive, duplicate userIds, and require additional post-processing of results.

Ideally this query should work:

# Query
{
  users: { '123': {
    name: 1,
    posts: { [key({ tag: 'example' })]: [{ last: 3 }, {
      title: 1, createdAt: 1
    }] }
  } }
}

and Graffy should send the following query to the posts$$createdAt provider:

{ [key({ tag: 'example', authorId: '123' })]: {
  title: 1, createdAt: 1
} }

This could be done if the user provider returned a "filtering link" for the posts property:

{
  name: 'Example',
  posts: link(['posts$$createdAt', { authorId: '123' }])
}

Rethink Watch

TL;DR: Replace watch() with incremental read() polling.

Why

The current implementation of watch() is complex to implement in providers and doesn't support back-pressure or resumption.

Steps

  • Implement the new query version semantics ("if-changed-after")
    • Specify that version is a non-negative number, and that version 0 has a special meaning
    • Implement the querying of linked paths in graffy-link and drop graffy-fill completely
    • Restrict the use of finalize() to queries with version 0 in core, pg and link
  • Implement query version filtering in slice(): exclude unchanged from both known and unknown
  • Implement query version filtering in pg by adding a condition on verCol
  • Implement the async iterator form of read() to perform incremental polling
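The last step could be sketched as an async generator over a hypothetical `readOnce(query, version)` that returns only data changed after `version`; the names and version semantics here are assumptions drawn from the steps above, not the actual API:

```javascript
// Incremental read() polling: yield the full state once (version 0 means
// "everything"), then poll for changes after the last seen version.
async function* watchByPolling(readOnce, query, intervalMs = 1000) {
  let version = 0;
  while (true) {
    const { value, nextVersion } = await readOnce(query, version);
    if (value !== undefined) yield value; // undefined = nothing changed
    version = nextVersion;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

One attraction of this shape is that back-pressure falls out of the pull-based iterator (the loop only polls again after the consumer takes a value), and resumption is just restarting with a stored version.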

Clean up extraneous branches in subscription queries

Currently, graffy-fill makes extra queries for subscriptions when resolving links. However, it does not clean those up when the link is updated.

This is currently planned to be fixed by extending slice() to return extraneous as well.

Helpful error messages

  • This watch handler did not yield an initial value within five seconds. If it's a change handler only, please ensure that it yields undefined first.
  • (more)

APIs on query objects

@baopham Thread to discuss what sort of APIs the query object should have to make it easy for providers that might want to (1) construct a query, like SQL or ES (2) identify topics to subscribe to.

Say you want to write a provider /users that needs to serve both queries like:

// 1
{ users: [ { first: 10 }, { name: true } ] }

// 2
{ users: { user_id_1: { email: true } } }

The provider might need to construct SQL queries:

# 1
SELECT name FROM users ORDER BY ID ASC LIMIT 10;

# 2
SELECT email FROM users WHERE id="user_id_1";

How would the "ideal" code to get from the query objects to the SQL look?
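One possible answer, as a self-contained sketch rather than a proposed API: branch on the shape of the `users` value, treating the array form as a range query and the object form as a keyed lookup. Real code would need placeholders or escaping rather than string interpolation:

```javascript
// Illustrative only: translate the two query shapes above into SQL strings.
function usersQueryToSql(query) {
  const users = query.users;
  if (Array.isArray(users)) {
    // Shape 1: [ { first: n }, { ...fields } ] — a range query.
    const [page, fields] = users;
    return `SELECT ${Object.keys(fields).join(', ')} FROM users ORDER BY ID ASC LIMIT ${page.first};`;
  }
  // Shape 2: { some_id: { ...fields } } — a keyed lookup.
  // WARNING: interpolating the id like this is unsafe; use placeholders.
  const [id] = Object.keys(users);
  return `SELECT ${Object.keys(users[id]).join(', ')} FROM users WHERE id="${id}";`;
}
```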

Improve the default version number

Currently, the device timestamp is used blindly. This is not resilient to the timestamp decreasing (due to clock adjustments, etc.) or to duplicate changes within 1 ms.

We need to append a sequence number, remember the last used version, and use the last version with incremented sequence number if the timestamp is unchanged or has decreased.

This change should be made in the graph builder.
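The fix described above can be sketched as follows. Packing a sequence number into the low bits is one option; the bit width is an arbitrary choice here, and this is not the actual graph-builder code:

```javascript
// Monotonic version generator: timestamp in the high bits, a sequence
// number in the low bits. If the clock stalls or moves backwards, fall
// back to incrementing the last issued version.
const SEQ_BITS = 10;
let lastVersion = 0;
function nextVersion(now = Date.now()) {
  const candidate = now * 2 ** SEQ_BITS; // still well under 2^53 for ms timestamps
  lastVersion = candidate > lastVersion ? candidate : lastVersion + 1;
  return lastVersion;
}
```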

Graffy Query Language

The pure JS "porcelain" query format currently in use is fairly verbose. This is a proposal to mitigate that with a Graffy query language. It aims to be similar enough to GraphQL to be familiar for those using it, but is not necessarily compatible with it.

Here is an example query:

{
  books {
    ( tags: {foo, bar}, publishedUntil: '2000-01-01' ) [
      ( first: 10, after: ('1998-03-23', 4398) ) {
        author {
          name
          photo
        }
        title
        cover
        description
      }
    ]
  }
}

which is equivalent to the current porcelain:

{
  books: {
    [key({
      tags: {foo: true, bar: true},
      publishedUntil: '2000-01-01',
    })]: [
      {
        first: 10,
      after: key(['1998-03-23', 4398]),
      }, {
        author: { name: true, photo: true },
        title: true,
        cover: true,
        description: true,
      }
    ]
  }
}

The transformations (to the current porcelain structure) are quite straightforward:

  • (foo: 1) becomes key({ foo: 1 })
  • ('foo', 'bar') becomes key(['foo', 'bar'])
  • { foo, bar } becomes { foo: true, bar: true }
  • before, after etc. within [...] get collected into an object
  • , and : are added as needed

Alternative to aliases

@baopham @email2vimalraj

The consumer APIs (read, write and watch) could gain a path argument to avoid having to implement aliases.

By and large, Graffy encourages granular queries; if a component has the sort of data need that requires aliases, it might be better served by just making two queries.

However using dynamic keys in queries comes with a bit of boilerplate that could be eliminated.

Problem

const postId = get_post_id_somewhere();
result = await gs.read({ posts: { [postId]: { ... } } });
const what_i_really_want = result.posts[postId];

It feels even worse when using filter parameters:

const filter = encodeKey({ tags: ['tech', 'javascript'] }); // This is some opaque string.
result = await gs.read({ filteredPostsByTime: { [filter]: [{ first: 10 }, { ... }] } });
const what_i_really_want = result.filteredPostsByTime[filter];

I have to store the encoded filter into a variable even though it has no meaning or use outside that query.

Solution

I feel that a better API might be:

const postId = get_post_id_somewhere();
const just_the_post = gs.read( ['posts', postId], { ... });

or with the filter:

const filteredPosts = gs.read([ 'filteredPostsByTime', encodeKey(...) ], [ { first: 10 }, { ... } ]);

What say?

In read/write/watch, we would wrap the query in the path before passing to .call(), and unwrap the results before returning.
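The wrap/unwrap step is small; as a sketch (`wrapQuery` / `unwrapResult` are hypothetical helpers, not existing @graffy/common exports):

```javascript
// Wrap a query under a path of keys, and pull a result back out.
function wrapQuery(path, query) {
  return path.reduceRight((q, key) => ({ [key]: q }), query);
}
function unwrapResult(path, result) {
  return path.reduce((r, key) => (r == null ? r : r[key]), result);
}
```

Here `read(['posts', postId], subQuery)` would internally call `.call()` with `wrapQuery(['posts', postId], subQuery)` and return `unwrapResult(['posts', postId], result)`.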

Typescript checking, emitting definitions

Primary goal: Add typings to the published NPM modules.
Secondary goal: Get type checks into the development workflow for Graffy itself.

The preferred approach is to use JSDoc-style function annotations (which TypeScript supports) rather than converting to TypeScript syntax.

Poor perf when pushing initial state in mockVisitorList

In the subscription provider of the example mock visitor list, pushing the initial state (rather than undefined) should improve performance slightly by not requiring a separate get. However it looks like it reduces performance drastically.

Requires investigation.

Counted queries and change streams

TL;DR: Some watch() providers may handle { after: '', before: 'b' } but not { first: 15 }. How do they communicate this?


Original write-up

Graffy providers often have limitations around what queries they can fulfil. They need to be able to signal these limitations, so graffy-fill can figure out ways to work around them.

Currently, we use some ad-hoc mechanisms to signal limitations. Perhaps we could design these in a more systematic way.

Current approaches

Dangling links

Consider the posts and users example. Let's say the posts resolver cannot fetch user data - if author info was requested, it ignores the nested fields and simply returns a link as the author field.

Graffy-fill makes a new (live) query for the linked data.

Change streams

Imagine a subscription provider that can provide change streams but not the initial result (current state). It signals this by yielding undefined as the first value.

Graffy-fill makes a separate fetch to get the initial value.

New requirements

Page bounds

Imagine a change stream provider pushing updates for users. Say it does not have access to the current state, but can access an event stream of user updates where each update specifies the user_id.

Say the query is for the first 30 users.

In a scenario where there are thousands of users, MOST user updates will be irrelevant for this query. However, there is no way for this provider to know that, because it cannot know the range of IDs that match "first 30".

Perhaps there should be a way for the provider to signal that it cannot serve "counted" pages (i.e. that use first / last parameters) but can serve "bounded" ones (i.e. those that only have before AND after, but no first / last).

Graffy fill could use the fetch results to convert a "counted" page into a "bounded" one.

NOTE: If the pagination happens in an "index" (nodes whose children are all links), things will work fine if the change stream provider ignores the bounds queries and just pretends there are no updates. However, this seems to work only "by accident".
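The counted-to-bounded conversion graffy-fill could perform might look like this sketch, using the '' and '\uffff' sentinel keys seen elsewhere in Graffy; everything else here is an assumption:

```javascript
// After the initial fetch, replace a counted page ({ first: n } or
// { last: n }) with a bounded one ({ after, before }) derived from the
// keys that were actually returned, so a bounds-only change-stream
// provider can still filter updates.
function toBoundedPage(countedPage, fetchedKeys) {
  const keys = [...fetchedKeys].sort();
  if ('first' in countedPage) {
    return { after: countedPage.after ?? '', before: keys[keys.length - 1] };
  }
  return { after: keys[0], before: countedPage.before ?? '\uffff' };
}
```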

Watch queries with per-node "raw"

Consider a watch query:

{
  users: [{
    name: true,
    email: true
  }]
}

Currently there are two modes for this watch: "values" mode, where every response contains all users, and "raw" mode, where responses contain only the changes. A common use case calls for a "raw+" mode, where you receive only the changed users, but each changed user arrives with both name and email (even if only one of them actually changed).

This is convenient for watching processes that would otherwise need to watch changes and then load every entity.
