Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make "processing a response" non-normative and add a note for clients #304

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

martinbonnin
Copy link
Contributor

Comments inline

Comment on lines 514 to 517
In case of errors that completely prevent the generation of a well-formed
_GraphQL response_, the server SHOULD respond with the appropriate status code
depending on the concrete error condition, and MUST NOT respond with a `2xx`
status code when using the `application/graphql-response+json` media type.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, there's a tension between the Body first paragraph:

The body of the server’s response MUST follow the requirements for a GraphQL response
encoded directly in the chosen media type.

and that paragraph, which is why I removed it.

My current understanding is that this paragraph was meant for intermediary servers that may not understand GraphQL. So there are 2 kind of servers:

  • GraphQL server
  • Intermediary server
Client <--> [Intermediary Server] <--> GraphQL Server

I'd say this spec should be about GraphQL servers and not Intermediary servers? Or if we want to specify the behaviour of intermediary server, we could do it in a separate section?

Or were there use cases when a GraphQL server is compliant and does not return a well formed GraphQL response?

Copy link
Contributor

@Shane32 Shane32 Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance, it appears the paragraph was intended to address invalid requests or those that the server does not understand (which may be more than only GraphQL-over-HTTP requests). For example, if the request was not able to be parsed as JSON or had an unrecognized content type. This paragraph would enforce that a non-200 status code was used. Whereas if we read just above the 6.1 paragraph you referenced, paragraph 6 starts with:

When a server receives a well-formed GraphQL-over-HTTP request ...

Copy link
Contributor Author

@martinbonnin martinbonnin Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gripe is specifically about that part: In case of errors that completely prevent the generation of a well-formed _GraphQL response_. That contradicts 6.1: The body of the server’s response MUST follow the requirements for a GraphQL response. I understand 6.1 to be about the body. Then the status codes are refined in section 6.4 below.

If the request was not able to be parsed as JSON or had an unrecognized content type, I think there are other places in the spec that address this already:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this paragraph contradicts anything, but I'm okay removing it as redundant / described elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contradiction IMO is in the spec saying: the body must always be a well formed GraphQL response. And then going on to say "In case of errors that prevent the generation of a well-formed GraphQL response", which I understand as if it isn't a well formed GraphQL response, then ...

How can it not be a well formed GraphQL response if it must always be a well formed GraphQL response?

The only reason I see for the body to not be a well formed GraphQL response is if content negociation fails. But as soon as application/json or graphql-response+json is selected, my reading is that the body must always be a well formed GraphQL response?

If content-negociation errors are what's at stake here (i.e. "the errors that prevent the generation of a well formed GraphQL response"), then this case is already handled by 6.1 so I'd remove the sentence. If there are other error cases, I'd love to explicit them cause it's not obvious what they could be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of the server’s response MUST follow the requirements for a GraphQL response encoded directly in the chosen media type.

I think this was poorly worded; perhaps:

When encoding a GraphQL response, the body of the server’s response MUST follow the requirements in the chosen media type.

Essentially it's saying when you encode a GraphQL response that it must adhere to the spec. The other paragraph (the one you've deleted) is saying if you cannot form a valid GraphQL response then you must not use a GraphQL response type.


and MUST NOT respond with a 2xx status code when using the application/graphql-response+json media type.

This is also incorrect; more correct would be:

and MUST NOT respond with the application/graphql-response+json media type.

The application/graphql-response+json is only for use with well-formed GraphQL responses, so if you can't form a well-formed GraphQL response you can't use it. I think I was probably thinking something like: try { return doTheGraphQLStuff() } catch (e) { return { errors: [{message: e.message}] }} where you could then use the graphql-response+json for these errors; but this is a well-formed GraphQL response so it contradicts the beginning of the sentence.


This doesn't really refer to "intermediary" servers as much as it does the hosting environment. For example if you use nginx and lua to do your GraphQL, but the lua process crashes, then this requires that your nginx isn't configured to still use the application/graphql-response+json response type.

Copy link
Contributor Author

@martinbonnin martinbonnin Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially it's saying when you encode a GraphQL response that it must adhere to the spec. The other paragraph (the one you've deleted) is saying if you cannot form a valid GraphQL response then you must not use a GraphQL response type.

This is what my brain has a really hard time processing 😅 . Thinking in terms of logic:

proposition P: content-type == application/graphql-response+json
proposition Q: isWellFormGraphQLResponse(body) == true

when you encode a GraphQL response that it must adhere to the spec

I think this is the same as if P then Q?

if you cannot form a valid GraphQL response then you must not use a GraphQL response type.

I understand that as if !Q then !P?

So the second sentence is the contrapositive of the first one and therefore strictly equivalent from a logical point of view?

If it brings the attention to a specific point I'm fine with it as a non-normative note but from in the spec text itself, I'd expect no redundancy. Or is there some nuance there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you encode a GraphQL response that it must adhere to the spec

I think this is the same as if P then Q?

I don't think so. P as you have stated it is that you use a specific media type, but what's being stated is that you are encoding a GraphQL response; it could be in any supported media type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright 👍 But I think my initial point that it's not clear whether the "non-valid GraphQL response" is part of the spec or not still holds.

I put the paragraph back in to keep that PR small and about non-normative client notes. Will open a follow up one for "non-valid GraphQL responses"

spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
Co-authored-by: Shane Krueger <[email protected]>
spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
- split the note in 2
- be explicit about response media type
spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
Co-authored-by: Shane Krueger <[email protected]>
spec/GraphQLOverHTTP.md Outdated Show resolved Hide resolved
@martinbonnin martinbonnin changed the title Attempt at clarifying server vs intermediary service Make "processing a response" non-normative and add a note for clients Aug 29, 2024
@martinbonnin
Copy link
Contributor Author

@Shane32 @benjie I have updated that PR to only be about "clients" and non-normative notes. It's additive only now.

I'll look at the "non-valid GraphQL responses" in a separate PR. Let me know what you think!

payload is `application/json` then the client MUST NOT rely on the body to be a
well-formed _GraphQL response_ since the source of the response may not be the
server but instead some intermediary such as API gateways, proxies, firewalls,
etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of this normative paragraph is, in my opinion, wrong. The client must not rely on it being a GraphQL response; if they want to treat it as a GraphQL response they must perform their own validations in order to do so (e.g. assert that it conforms to the correct shape/types), and must know that what they're doing is outside of the spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjie , the way I see it, this is a server spec, not a client spec. A server has no control over whether or not a client processes the response correctly, incorrectly, or at all. I think that's why this should be a non-normative paragraph. And the modified language is, I feel, much easier to understand.

Now we could write a client spec, defining what features a client must and must not have. For example, we might require the client send the request with a specific content type, or perhaps to support automatic fallback to a known content type if a server responds with an unsupported media type error. We might define the exact mechanism by which the response shape is verified for known response content types, etc. While it seems much of this is implicit based on the server requirements, the existing specification is not currently written as a client spec and does not contain any requirement written from the perspective of the client (apart from the little written in this paragraph). As such, I feel it is improper for it to have a normative note about processing a response in this spec.

If this paragraph were to be kept as a normative paragraph, I feel it should at minimum be in a section labeled for client requirements. But even then, the existing paragraph says "MUST NOT rely on the body", it doesn't explain what TO do in that situation. Should it look for a certain shape? Should it ignore all data in the response? In other words, it's says "client MUST NOT count on it being well-formed, but it MIGHT be", which is not particularly helpful for a normative note.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay here. +1 to @Shane32 comment.

As a client developer relatively new to this spec but pretty familiar with GraphQL, it took me way longer than I'd like to admit to understand the nuances there.

Having the language from the point of view of the client is easier to process in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a server spec, not a client spec

That does not seem accurate. This spec specifies:

  • how a client should form a request including what details the client must put in the request for it to be likely to be executed
  • how a server should check these details, and handle when the client has not conformed
  • how a server should form a response including what details to include
  • how and when a client should process a response, how to know it originated from the GraphQL service or not (and thus whether it can "trust" that the response is GraphQL or not)

It's a protocol spec that defines the protocol for communicating between a spec-compliant client and server. I believe @enisdenjo implemented both a client and a server based on this spec in graphql-http.

A server has no control over whether or not a client processes the response correctly, incorrectly, or at all.

Indeed; but the client does; hence why we state:

the client MUST NOT rely on the body to be a
well-formed GraphQL response since the source of the response may not be the
server but instead some intermediary such as API gateways, proxies, firewalls,
etc.

It is the client's responsibility to handle this concern. The server may not even be the one supplying this response (it may have come from a gateway, proxy, etc), so of course the server cannot influence this - it is entirely the client's responsibility.

the existing specification is not currently written as a client spec and does not contain any requirement written from the perspective of the client (apart from the little written in this paragraph)

It has an entire chapter dedicated to how to form requests to the server. It also has loads of explicit conformance requirement statements for the client:

The role of a client is to issue HTTP requests to a server in order to interact with a GraphQL service.

Servers and clients MUST support JSON and MAY support other, additional serialization formats.

A client SHOULD indicate the media types that it supports in responses using the Accept HTTP header as specified in RFC7231.

If the client supplies an Accept header, the client SHOULD include the media type application/graphql-response+json in the Accept header.

Before 2025-01-01T00:00:00Z, if the client supplies an Accept header, the header SHOULD include the application/json media type. After this watershed, this is no longer necessary.

It is RECOMMENDED that a client set the Accept header to application/graphql-response+json; charset=utf-8, application/json; charset=utf-8.

A client MUST indicate the media type of a request body using the Content-Type header as specified in RFC7231.

There are many more.

Copy link
Member

@benjie benjie Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this paragraph were to be kept as a normative paragraph, I feel it should at minimum be in a section labeled for client requirements.

We don't currently label client and server requirements separately since both are intermingled; such as:

A client MUST indicate the media type of a request body using the Content-Type header as specified in RFC7231.

A server MUST support POST requests encoded with the application/json media type (as indicated by the Content-Type header) encoded with UTF-8.

To do this in one place but not the rest of the spec would be a little strange. And doing it across the spec would make the spec a lot less readable, in my opinion.

But even then, the existing paragraph says "MUST NOT rely on the body", it doesn't explain what TO do in that situation. Should it look for a certain shape? Should it ignore all data in the response?

Indeed; the client must not rely on the body. What it does with the body is up to it, but it cannot be trusted. Throwing an error is the most reliable way of handling it; but we're not going to require that (though we could RECOMMEND it, I suppose). It could also choose to run its own validations, but at that point it's up to the client to figure out what those should be - we've already told it that the response cannot be trusted, so if it feels it should push ahead anyway, that's its responsibility to figure out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the conformance requirements need to write a compliant client and server, and the wire format that they use between them.

Fair enough 👍

Clients have to live in the real world where any arbitrary intermediary can intercept the result and return some nonsense

Same is true for every networking protocol. HTTP doesn't warn about a TCP router swapping bytes around. TCP doesn't warn about an ethernet card rewriting bytes, etc... It's a given.

We cannot enforce any rules on HTTP

We kind of do though? What I understand from the spec is that we're "allocating the graphql-response+json media type": if HTTP returns a graphql-response+json response then it MUST be a valid GraphQL response. I think we expect that from the HTTP layer? It's a given.

I very much feel that this particular rule should not be outside of the spec. It's incredibly important, the abundance of "200 is not okay" talks at conferences over the past 9 years I think adds a lot of evidence that this rule in particular is necessary.

Agree that this must be called out in the spec. Just I wouldn't make it normative as it's already contained in the other parts. Maybe a separate paragraph explaining the rationale for graphql-response+json? Basically this note but elevating it to a more preminent position in the document and explaining about client processing there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP doesn't warn about a TCP router swapping bytes around.

As a counter-example, HTTP details a Content-Length header and states:

When a Content-Length is given in a message where a message-body is
allowed, its field value MUST exactly match the number of OCTETs in
the message-body. HTTP/1.1 user agents MUST notify the user when an
invalid length is received and detected.

This effectively protects the client against the connection being terminated (as would be expected by Connection: close) before all the data is received - one of the expected failure modes for HTTP.

Protocol generally have to somewhat trust the underlying protocol (which is why we have not detailed how HTTP works, and why HTTP doesn't detail how TCP works); however they also need to take into account how, even in a trusted network, they may fail - especially when the failure modes can cause ambiguity.

Our failure mode here is that there's an intermediary that bails out the request early (permissions, failed cache, netsplit, timeout, etc etc etc) and returns a response using a common media type application/json. We trust these intermediaries to never do this with a 2xx status code; just as we trust them to not use application/graphql-response+json if they're not speaking GraphQL. But we must convey this important information to the client implementers: if they get a non-2xx and it uses this common media type then they must not trust it to be GraphQL even if they trust the underlying network - this is an expected failure mode.

I see what you mean about it effectively being covered by other parts of the spec, but I'm not convinced that it doesn't bear repeating - what problem are we trying to solve by removing this or demoting it to a non-normative note?

Copy link
Contributor Author

@martinbonnin martinbonnin Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our failure mode here is that there's an intermediary that bails out the request early (permissions, failed cache, netsplit, timeout, etc etc etc) and returns a response using a common media type application/json. We trust these intermediaries to never do this with a 2xx status code; just as we trust them to not use application/graphql-response+json if they're not speaking GraphQL.

I love this paragraph. I would love to see that kind of context added to the spec (permissions, failed cache, netsplit, timeout, etc...trust that if this happens it uses non-2xx, etc..)

they must not trust it to be GraphQL even if they trust the underlying network - this is an expected failure mode.

When I read that as a client developer, "I MUST NOT trust something", I'm not sure what this concretely mean. If I get a 2xx or a graphql-response+json then I know what to do (parse the GraphQL response), if not then 🤷 I must not trust it but how does that concretely materialize? I'm not sure.

what problem are we trying to solve by removing this or demoting it to a non-normative note?

2 things (and one extra):

  1. Guide client implementers by using positive, actionable wording:
// This is very close to what a client would write 
// (and also very close to how this PR is worded)
if (status == 2xx || content-type == "graphql-response+json") { 
  parseResponse() 
}

// I find this harder to read
if (status != 2xx && content-type == "application/json") { 
  // What to do here? I don't know, it depends.
} else {
  // This is actually the important branch
}
  1. Keep the normative parts of the spec DRY by not repeating them (repetition in non-normative notes is ok). Took me a while to process that MUST NOT because I was trying to understand if this was new information or just a repetition.
  2. Be more explicit about the rationale behind the 2 content-types. This is partially adressed in this PR by adding an intro sentence (In certain environments,...). Overall, I feel like graphql-response+json is a very important design choice. There are different notes in to the spec that address this but I found it hard to reconciliate all of them. By adding the intro sentence, I can link client implementers to that section and it's relatively self-contained.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What problem are we trying to solve by removing this or demoting it to a non-normative note?

I'll grant that the spec does contain a lot of client specifications, and written as such.

However, I feel the requirement in question simply ** is ** non-normative by nature, and probably not within scope of this document.

Let's compare to CORS issues. Is CORS part of the GraphQL-over-http protocol? No. Is it part of the bigger picture that a client/server should consider when designing a server? Yes. How do we deal with it? In #303 we add a non-normative note that mentions that if CORS issues are not considered, there may be security issues with the implementation. Why isn't it normative? CORS may not be relevant. CORS may be solved in another fashion. We don't require HTTPS either - for example, many microservices run on HTTP internally and the HTTPS layer is applied at the edge. Perhaps we could add a non-normative note saying that data over HTTP is insecure, although the readers probably already know that and have a plan to keep their data secure.

By comparison, here we have an issue where an intermediary server might respond with failure. Could this be relevant? Yes. Is it always relevant? No. Perhaps it is known that no intermediary servers are in use. Perhaps the intermediary servers are configured to return 500 rather than 400 errors. Perhaps they return xml instead of json responses. Ultimately, I don't think this is part of the GraphQL-over-http protocol. A note is warranted as a reminder to implementers.

Now if it's deemed important enough to add a normative note, then it should be worded as an optional requirement with specific behavior, such as:

  • A client SHOULD ignore any non-200 response that has an application/json content type, treating the response as a network error.

This tells the client exactly what to do, but as an optional requirement. (just an example; it's worded poorly)

When I read that as a client developer, "I MUST NOT trust something", I'm not sure what this concretely mean. If I get a 2xx or a graphql-response+json then I know what to do (parse the GraphQL response), if not then 🤷 I must not trust it but how does that concretely materialize? I'm not sure.

Completely agree.

Guide client implementers by using positive, actionable wording

Agree

Keep the normative parts of the spec DRY by not repeating them

Agree

Be more explicit about the rationale behind the 2 content-types. This is partially adressed in this PR by adding an intro sentence (In certain environments,...). Overall, I feel like graphql-response+json is a very important design choice. There are different notes in to the spec that address this but I found it hard to reconciliate all of them. By adding the intro sentence, I can link client implementers to that section and it's relatively self-contained.

Yes, I think this PR does a better job communicating with the reader the logic/rationale of the altered content type and how to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this concretely mean

I think "do not"'s are okay in specs - especially when you're telling someone to not do something that they typically would do unless told otherwise. But yes; we currently leave them to fend for themselves in this regard because there is no "correct" way to handle this, though there is an "incorrect" way, which is to blindly trust it.

A client SHOULD ignore any non-200 response that has an application/json content type, treating the response as a network error.

I'm happy changing to wording along these lines - it aligns with what I said before: "throwing an error is the most reliable way of handling it; but we're not going to require that (though we could RECOMMEND it, I suppose)" - SHOULD and RECOMMEND have the same normative meaning.

I still think that we should pair these together though: you MUST NOT blindly trust it, and we RECOMMEND that you (i.e. "you SHOULD" - same difference) raise an exception as if it were a network error.

Keep the normative parts of the spec DRY by not repeating them

I agree in general, but this is a conformance requirement specific to application/json so there should be a constraint in the application/json section - is that actually repeated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants