Giter VIP home page Giter VIP logo

Comments (13)

benjie avatar benjie commented on May 3, 2024 3

Isn't this equally true for white-listed persisted query support?

There are already multiple specifications for persisted queries that work in different (but similar) ways. Relay have produced more than one over the years; here's their latest: https://relay.dev/docs/guides/persisted-queries/ . Apollo has its own one as you mention. Standardizing this so that everyone can use one common, official, spec is beneficial for all; or at least I hope so!

As far as I know, there is only one specification for APQ; creating a competing specification that works in a different way is not desirable, and creating one that works in the same way is redundant. Further, we have the extensions issues that I mentioned above.

Thanks for sharing why you feel that trusted documents would be a lot of effort to take on, it definitely seems I'm spoiled by the ease of doing this kind of thing in the JS ecosystem! I agree the single-document approach should serve you well.

from graphql-over-http.

n1ru4l avatar n1ru4l commented on May 3, 2024 1

We are currently building GraphQL Yoga v3 plugins for Automatic persisted queries and also traditional built-time extracted persisted documents.

I see Automatic persisted queries as an upstream client-to-origin payload size optimization and stored/persisted operations/documents as a security feature for preventing the processing of arbitrary GraphQL documents.

It seems like all urql and apollo-client have built-in support for Automatic persisted queries (which itself does also not have a real specification) and the "extensions-protocol" became the de-facto standard for both APQ and persisted document implementations (see mercurius).

However, we would rather prefer to use something for persisted operations that actually follows a specification. IMHO the GraphQL over HTTP specification would be perfect for this.

In enisdenjo/graphql-ws#188 (comment) there has been a hint that this could simply be solved by introducing an id field that if set replaces the query field.

from graphql-over-http.

Shane32 avatar Shane32 commented on May 3, 2024 1

So "persisted documents" is effectively a middleware in front of a GraphQL service - the incoming request is decoded as GraphQLPersisitedDocumentRequest, converted to GraphQLRequest and then fed onto the GraphQL service.

In short, we would need to add DocumentId as a property to the class (visible to all code within the execution chain) just so that the Persisted Queries module could function.

I really don't think this would be necessary; if, after reading the above, you still think it is necessary, please could you expand?

I'll expand, but it's more of an implementation detail of GraphQL.NET and may not apply to other languages. We can change the design easily enough to accommodate the specification.

So far, this is how GraphQL.NET wires up requests:

graph TD;
subgraph a [Server]
A[Server deserializes request]
end
A-->|ExecutionOptions instance| C;
subgraph b [Optional middleware]
C[Configure custom validation rules]-->D
D[Handle APQ support]-->E
E[Handle document caching]-->F
F[Configure authentication]-->F2
F2["Start tracing/logging"]
end
F2-->|ExecutionOptions instance| G
subgraph c [Execution engine]
G["Parse (if applicable)"]-->H
H[Validate]-->Execute
end
Execute-->|ExecutionResult instance| Tracing
subgraph d [Optional middleware]
Tracing["Add tracing data / finish logging"]
end
Tracing-->|ExecutionResult instance| Response
subgraph e [Server]
Response[Server serializes response]
end

Middleware is configured as simple functions that accept ExecutionOptions and return ExecutionResult, such as in this simple example:

.ConfigureExecution(async (options, next) =>
{
    // record time execution started
    var start = DateTime.UtcNow;
    // enable metrics
    options.EnableMetrics = true;
    // execute next step in the chain
    var ret = await next(options).ConfigureAwait(false);
    // add apollo tracing data to execution result
    ret.EnrichWithApolloTracing(start);
    // return execution result
    return ret;
});

As is the case now, and as noted by @benjie above, the design should be that "'persisted documents' is effectively a middleware in front of a GraphQL service" - and so it would be required that DocumentId is a newly added property to the ExecutionOptions instance in the above sample, even though it cannot be used directly by the execution engine. Not a big deal if it's part of the official spec. The server will be aware of the DocumentId field during deserialization and the execution engine will reject the request if it has not been replaced by a query string (or pre-parsed document instance) by middleware.

Notice that within GraphQL.NET, the server deserializes the request as it is first received, prior to being passed to any middleware. This allows for various forms of requests, such as application/graphql or GET requests. GraphQLRequest is only used during deserialization of JSON requests, prior to it being passed to the middleware chain. As such, having a separate representation of GraphQLPersisitedDocumentRequest vs GraphQLRequest, while logical, is not useful within GraphQL.NET.

from graphql-over-http.

Shane32 avatar Shane32 commented on May 3, 2024 1

I agree that if persisted queries are part of the official spec, then it is better that they have their own property within GraphQLRequest rather than utilizing extensions, saving extensions for user data.

from graphql-over-http.

benjie avatar benjie commented on May 3, 2024 1

@Shane32 Thanks for sharing the details! It does look like you'd need to change GraphQL.NET a little to adopt this. One option would be to add a "request coercion" phase at the front which takes an IncomingRequest (this should be transport agnostic so that it works for websockets/HTTP/etc) and a PartialExecutionOptions and calls a number of "request coercion middlewares" (including the builtin one which just copies things over) which populate the PartialExecutionOptions from the IncomingRequest. That's when you'd do the APQ dance and other related things. Then at the end of that phase you'd coerce PartialExecutionOptions to ExecutionOptions (or raise an exception) and you're good to continue through your normal flow which relies on ExecutionOptions having non-nullable Query and so on. This would also allow you to experiment with alternative serialization formats, loading the persisted operation from the URL/query string, and so on.

from graphql-over-http.

Shane32 avatar Shane32 commented on May 3, 2024 1

Originally posted by @benjie in #260 (comment) :

One of the big outcomes for me during GraphQLConf is that almost all of the attendees should have been using persisted operations (NOT APQ), and yet almost none of them are. Many cite a lack of documentation around how to do it, and others got confused by the confusingly named automatic persisted queries (APQ) which forgo many of the benefits of persisted operations, only keeping the bandwidth savings.

I would like to comment on the lack of APQ in the existing draft spec:

If we create a specification that only includes support for persisted queries and neglects APQ, we run the risk of excluding a significant segment of developers who find persisted queries too challenging to implement in their current setups. This concern is not just theoretical; as pointed out by @benjie recently, most developers today are using APQ over persisted queries in their workflows. Persisted queries often demand substantial modifications to both client and server architectures, such as adding a dedicated query repository. In contrast, APQ offers a simpler and more straightforward path to adoption, which can be especially important for teams with established workflows.

As a result, a comprehensive spec should aim to accommodate both approaches to meet the varied needs across the developer landscape. Note that I am one of these users where the net benefits of persisted queries are minimal over APQ, and where proper persisted query support (both on client and server side) would not be worth the effort. APQ support, on the other hand, while having only a minimal benefit, could be easily adopted. I also have a separate application where persisted queries would be a perfect fit -- but it would require a significant amount of time to implement.

from graphql-over-http.

Shane32 avatar Shane32 commented on May 3, 2024 1

I'm in no hurry to specify APQ in this, not least because Apollo's APQ spec already exists and works fine

Isn't this equally true for white-listed persisted query support? I understand the confusion over the name and feature set, and I would tend to agree that separating the features further by calling one "trusted documents" vs "persisted queries" or similar would be beneficial. But 'trusted documents' are already supported using the protocol defined by the APQ spec, so this RFC is already a duplicate of an existing specification. It would seem to me that these two features go hand in hand, and are mostly the same feature operating in two different modes (from a technology/specification point of view).

What would make it take significant time, do you think?

Well, for better or worse, we have this, geared towards rapid application development:

  1. One repository for server code, and one for SPA code.
  2. CD configured for both server and SPA for independent zero-downtime releases.
  3. GraphQL calls within the SPA are just strings representing GraphQL documents. No processing occurs at build time or runtime on these documents. However with lazy-loading aka code-splitting, only relevant parts of the SPA are loaded when needed.

To have white-listed documents, we would need to:

  1. Configure the production server with some type of authentication pattern that allows for any query by the devs, but only trusted queries by released code.
  2. Change the SPA code so that during a release, these strings are replaced by sha256:.... and a list of trusted queries are exported. (Note: I have no idea how to perform this step -- maybe with preval somehow?)
  3. During release, these trusted queries are uploaded to the server.
  4. Modify the hooks that call GraphQL so they recognize documents that start with sha256: as a document id and properly format the request.

This is no small undertaking. We have an intranet application and a public web application. For the intranet application, this would have few benefits. For the public web application, it would be a great security feature while also reducing app load time.

We don't use Gradle or codegen, unless it's part of the react template. The compiled SPA is delivered as static files via CDN. I should probably investigate codegen.

Another solution might be to combine all GraphQL operations into a single document, and identify them within the application by operation name. The document could be stored in a separate file so that the build script can easily upload it to the server as a trusted document. A single preval(?) line of code would be needed to make the released code exclude the document source. Still, this would take a significant amount of effort to accomplish as every file would need to be modified.

from graphql-over-http.

benjie avatar benjie commented on May 3, 2024

@n1ru4l points out a more correct term is actually "stored documents"

from graphql-over-http.

Shane32 avatar Shane32 commented on May 3, 2024

I may not be able to make it to the October 2023 meeting, but I'd like to supply my comments for the suggested Persisted Operations RFC. I'll provide comments from two viewpoints: as a GraphQL.NET maintainer, and as a user.

Note: see @benjie 's comments here

As a GraphQL.NET maintainer...

As a GraphQL.NET maintainer, the suggestion of storing the documentId in a property separate from query or extensions is problematic. Let me explain:

.NET is strongly typed, and GraphQL.NET includes a type that matches the form of the GraphQL request, having four properties: Query, OperationName, Variables and Extensions. GraphQL.NET Server (and GraphQL.NET Client) takes this known type and serializes/deserializes it as needed when executing requests, formatting it properly such as converting the property names to camelCase. Then the request is passed to a configurable request chain for execution.

The existing APQ code is configured by the user within this request chain, which intercepts the request, either (a) responding directly for invalid doc IDs, (b) pulling the query from the cache if the doc ID is known, or (c) passing through requests that do not contain APQ metadata. Note that this APQ middleware is entirely separate from GraphQL.NET Server, as it can be applied to any transport - most notably, WebSocket protocols.

GraphQL.NET also supports caching the parsing and validation steps of queries. This is done independently of APQ and works for both APQ queries and non-APQ queries. This is just another step in the configured execution chain.

This all works well as Apollo's APQ specification stores the APQ metadata within the extensions property. But with the documentId stored as a sibling to the query, variables, operationName and extensions properties, the server would now need to be APQ-aware in order to handle requests. In short, we would need to add DocumentId as a property to the class (visible to all code within the execution chain) just so that the Persisted Queries module could function.

Second, GraphQL.NET caches the original query when APQ is in use so that it can be used for error messages pursuant to the GraphQL specification for the location property. If the client "strips ignored tokens" prior to hashing the query, the locations of any error responses may not correctly match up to the client's original query string.

Third, @benjie wrote:

the attendees should have been using persisted operations (NOT APQ)

I'm assuming this refers to the two Apollo feature sets, both of which I believe are implemented the same at the protocol level. As such, I believe this comment lies outside the discussion here. Please correct me if I'm wrong.

As a GraphQL user...

Please keep in mind that the following are personal opinions, largely based on my specific use case:

The specification indicates that the hash should be comprised of the query "stripping ignored tokens". This is a major problem for my e-commerce site as it requires that the client contain code to parse and recreate a GraphQL Document. GraphQL's beauty is in its simplicity. All browsers, even IE, support native JSON parse and stringify operations. And a GraphQL request/response comprises of a query in the form of a static text string combined with some variables, POSTed to a server, and the response interpreted as JSON. While it is true that many people likely use Apollo's GraphQL client library (which references graphql-tag, which probably has the requisite functionality), including these additional libraries would place a much heavier bandwidth and CPU burden on the client than transmitting the queries literally without APQ. And for an e-commerce site, when Google ranks you based on how fast your webpage reaches idle, it's critical to eliminate unnecessary dependencies.

So the persisted queries specification as written in the current RFC would not be feasible for me, whereas the Apollo APQ specification would only require a small amount of additional client code. Note that all current browsers can natively perform SHA hashes, and can do so within another cpu thread as it is an asynchronous API. So the hash algorithm would have minimal effect on page download / display time.

I suppose ideally the hash would be determined at compile-time and embedded directly in to the source code of the client application. I have not investigated the difficulty required to implement this.

I also have intranet applications, where these constraints do not exist to the same extent. While using persisted queries should be a net benefit there, they also are expected to be used by users that have corporate high-bandwidth connections, and as such the additional bandwidth required per GraphQL request makes little difference.

Conclusion

While I endorse having a standardized protocol for persisted queries, I would recommend following Apollo's design choices regarding (a) persisted-query metadata stored within extensions, and (b) hash based on the literal query. My only particular compliant is that they are using the error message as an error code. I would wish that the error message would be undefined and left up to the implementation, while some other property be used to identify the APQ error. For example, the error's extension property could also contain persisted-query metadata so that clients could identify it. But if the client was not persisted-query aware, it would display a human-readable error message. However, this is a trifle; anyone sending APQ requests is likely to be APQ-aware of the responses.

from graphql-over-http.

JoviDeCroock avatar JoviDeCroock commented on May 3, 2024

I appreciate the insights, thank you!

the suggestion of storing the documentId in a property separate from query or extensions is problematic

I would like to avoid extensions as that feels less like this is becoming a first-class citizen of the spec, I agree with you that for strongly typed languages this is an additional property and might become a bit harder.

The specification indicates that the hash should be comprised of the query "stripping ignored tokens". This is a major problem for my e-commerce site as it requires that the client contain code to parse and recreate a GraphQL Document

I am not married to the idea of stripping ignored tokens/... my main thinking here was that we could both verify that a SHA is valid and that it reduces potentially duplicate operations. Nothing here sais that the tokens should be stripped at runtime, I have seen approaches with GraphQL Code Generator where this is part of the generated document so it's just present at runtime without prior calculations. The verification of a valid SHA was me assuming that this would come in handy for APQ as well to prevent folks from persisting erroneous documents.

I would recommend following Apollo's design choices regarding (a) persisted-query metadata stored within extensions

My main reasoning for document_id comes from following the Relay specification here which uses doc_id as a way to convey this alongside query/...

I guess maybe, after evaluating some of these points, we need to omit APQ for that reason, encouraging folks in a good direction.

from graphql-over-http.

benjie avatar benjie commented on May 3, 2024

the suggestion of storing the documentId in a property separate from query or extensions is problematic
I would like to avoid extensions as that feels less like this is becoming a first-class citizen of the spec, I agree with you that for strongly typed languages this is an additional property and might become a bit harder.

100%; we must not use the extensions property for this because that is reserved for end-user usage. If we start specifying things inside of extensions then we'd need to define a new level (extensions.extensions) that is future-safe for users to use for their own purposes - it would undo the point of having extensions in the first place (which was to make it so that GraphQL can make changes at the top level without breaking any end-user customizations of request/response).

We're actually dealing with two concretely different types of request:

type GraphQLRequest {
  query: string;
  variables?: Record<string, any>;
  operationName?: string;
  extensions?: Record<string, any>;
}
type GraphQLPersisitedDocumentRequest {
  documentId: string;
  variables?: Record<string, any>;
  operationName?: string;
  extensions?: Record<string, any>;
}

(Note query is required; documentId is required. Note: Apollo's APQ is not covered in this; but a similar proposal could be by adding query?: string to GraphQLPersisitedDocumentRequest.)

When your server receives a GraphQLPersisitedDocumentRequest, one option is to take that request, fetch the relevant document, and turn it into a GraphQLRequest to feed through to the main GraphQL service. This is, in my opinion, what we should actually specify. Servers may implement optimizations over this, but that's not something that we should specify in my opinion (and it's entirely possible to cache parsing and validation with the GraphQL document (or a hash thereof) as the key without needing persisted operations - GraphQL.NET and PostGraphile have both done this for years - as have likely loads of other servers).

So "persisted documents" is effectively a middleware in front of a GraphQL service - the incoming request is decoded as GraphQLPersisitedDocumentRequest, converted to GraphQLRequest and then fed onto the GraphQL service.

In short, we would need to add DocumentId as a property to the class (visible to all code within the execution chain) just so that the Persisted Queries module could function.

I really don't think this would be necessary; if, after reading the above, you still think it is necessary, please could you expand?

If the client "strips ignored tokens" prior to hashing the query, the locations of any error responses may not correctly match up to the client's original query string.

We should not strip the tokens; we should hash the raw document. Clients may choose to generate a minimized document and use that as the input to persisted documents; that's entirely up to them, but is not something we should specify.

the attendees should have been using persisted operations (NOT APQ)

I'm assuming this refers to the two Apollo feature sets, both of which I believe are implemented the same at the protocol level. As such, I believe this comment lies outside the discussion here. Please correct me if I'm wrong.

APQ is an Apollo feature that negotiates the storage of a query as a hash for bandwidth optimization.

Persisted documents is a general GraphQL principle that has existed much longer (independent of Apollo; in fact used inside Facebook before GraphQL was even open-sourced as far as I am aware) and has significantly greater benefits - not least in terms of security and cacheability of requests. Relay uses the documentId key for persisted documents.

I'm proposing that we specify persisted documents as a document allowlist; and then we express an equivalent of APQ as an optional extension of that, but noting that it loses a lot of the benefits. In my opinion the vast majority of GraphQL users (everyone whose GraphQL API is for their own apps and websites only; not intended to be used by third parties) should be using persisted documents, and only a subset of the remainder (e.g. GraphQL APIs designed to be used by third parties, such as the GitHub GraphQL API) should be using APQ or something similar.

The specification indicates...

Please note that RFC documents get merged without much in the way of review; this isn't even officially a Stage 0 proposal yet; it's definitely not "the specification" for persisted documents.

Relay specifies an MD5 hash: https://relay.dev/docs/guides/persisted-queries/

I think we would recommend the usage of a prefix; e.g. {"documentId": "sha256:ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38"} and RECOMMEND that the sha256: prefix is implemented in the interests of compatibility, but note that servers MAY support whatever document identification strategy they like so long as the same goal is achieved.

as it requires that the client contain code to parse and recreate a GraphQL Document

"Stripping ignored tokens" should 100% not be a requirement to use persisted operations; any transform that the client wishes to make to the document before hashing it/storing it should be up to them.

I should note that in most cases, for "persisted operations" (and, again, not for APQ, which lacks many of the benefits of persisted operations) the hashing of the document should take place at build time, so there is no requirement to include any additional data in your bundle other than the hashes of the operations you're using. For some clients; you don't even need to include the operations themselves any more, just the hashes, so persisted operations could make your bundle smaller!

I guess maybe, after evaluating some of these points, we need to omit APQ for that reason, encouraging folks in a good direction.

💯 If we specify APQ, it should be done with significant indication of the downsides of using it. It's a handly bandwidth optimization for public GraphQL APIs, but almost all other APIs should be using persisted documents as an operation allowlist.

from graphql-over-http.

benjie avatar benjie commented on May 3, 2024

I've written up my first draft of spec edits to accommodate persisted documents: #264

(Note I've also added a definition of a "persisted operation" in that to be a "persisted document" containing exactly one operation - I think persisted operations are the most typical usage today.)

from graphql-over-http.

benjie avatar benjie commented on May 3, 2024

I'm in no hurry to specify APQ in this, not least because Apollo's APQ spec already exists and works fine - we don't need two competing specifications for that. We'd also have to add new top-level fields both to requests and responses (or, arguably, errors) to support it, and I'm not keen to do that without the GraphQL Spec itself either adopting or "reserved keyword"-ing those changes.

I'd also like to make sure there is a significant difference between the name of APQ and persisted documents; "automatic" persisted documents makes it sound like you get the benefits of persisted documents but with minimal effort; but actually you lose the most important feature: that it's an operation allowlist. Then people will combine APQ with "disabling introspection" and think that's good enough, when really at best you're just making attackers put in the tiniest bit more effort. On top of that, a good persisted operations setup lets you track which client(s) the operations are coming from, so you know how long you need to support those operations.

Don't get me wrong; APQ is great as a bandwidth optimization, it just muddies the meaning of "persisted queries" and means people don't discover this powerful model that has been in use since before GraphQL was open source!

as pointed out by @benjie recently, most developers today are using APQ over persisted queries in their workflows

This isn't what I said, exactly. I meant that when telling people about persisted queries, they would find automatic persisted queries and think that's what we were talking about. This is the risk of having names so similar. Perhaps we should call "persisted documents" as an allowlist something like "trusted documents" instead. Actually... That seems like a good name. I'll write something up about that (separate from this discussion).

but it would require a significant amount of time to implement.

What would make it take significant time, do you think? From my perspective it seems like codegen would need to write out your documents, then a script would perform a simple hash operation on them and write them to a key-value store. Perhaps if you don't have codegen then adding that step is significant?

I tend to just write my trusted documents to my monorepo as .persisted_operations/<hash>.graphql and then the server automatically has access to them; this works for my web, mobile and desktop apps (though, admittedly, these are all React-based). When this is no longer suitable I'd just have CI submit them to the server for me and the server would check credentials and then write them to whatever durable KVS I have kicking around, even just S3 if push came to shove.

from graphql-over-http.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.