GraphQL Development Best Practices



Discover GraphQL best practices to make your development process more efficient and deliver a better product.

GraphQL on its own is already a powerful tool with a client-centric approach. This article lists some tips on how to make the development process more efficient and your product better.

More generally, the following pieces of advice are worth taking on board before you even get your hands on the code. Keep them in mind throughout, as they will also be helpful during the development process:

  1. Share your design early: show how your API can be implemented as soon as possible, preferably providing mock servers for testing. This can anticipate any changes and concerns that might otherwise arise only after the delivery.
  2. Be minimalist: avoid adding features that might never be used by the client. This will prevent future issues with design, performance, security, and deprecations.
  3. Be language agnostic: users of your API usually don’t care about structural details such as what database or programming language is being used. However, when choosing them, be sure to make decisions focused on the client.
  4. Be consistent and symmetric: when naming your schema members, try to use the same logic and terms at all times and be very specific. The more obvious, the better: users will expect counterpart actions to be a variation of the same term, not a synonym (e.g. doAction -> undoAction instead of deleteAction).
  5. Be clear: GraphQL offers an expressive type system and using that to your advantage makes your API more understandable. Descriptions added into the schema save users’ time going after external sources, but the aim is to write code that speaks for itself.

One pitfall to watch out for here is Anemic GraphQL, an anti-pattern named after the Anemic Domain Model that Martin Fowler described: schemas designed as plain data containers, with no thought given to use cases or behavior. To steer clear of it, there are a few things that you can keep in mind when creating your schema, such as:

  • Avoiding generic fields and runtime logic when the schema can enforce it.
  • Using complex object and input types to couple fields and arguments, but be mindful of impossible states.
  • Using default values in optional inputs and arguments to make the default behavior explicit.
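
As a minimal sketch of these ideas (all type and field names here are hypothetical), an enum lets the schema enforce valid values instead of a generic string checked at runtime, a dedicated input type couples related fields, and a default value makes the default behavior explicit:

    # The enum lets the schema enforce valid values instead of
    # validating a generic String at runtime.
    enum ReviewVisibility {
      PUBLIC
      PRIVATE
    }

    # A specific input type couples related fields together.
    input CreateReviewInput {
      productId: ID!
      rating: Int!
      # The default value documents the default behavior.
      visibility: ReviewVisibility = PUBLIC
    }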

Now, when it comes to the actual strategy for schema design, we need to consider what client we will be using: Apollo or Relay. While the first is a community-driven effort to build a flexible GraphQL client for all major platforms, Relay was created by Facebook and has a few advantages when it comes to performance optimization. However, it is only available on the web.

Each client has its own “style” when it comes to schema design. While Apollo is more flexible, Relay is structured around the assumptions that your application will re-fetch objects using a global identifier, paginate through datasets using connections, and follow a specific structure for mutations. Either way, you can expect to deal with the following components at some point during the development process:

Naming Conventions

One of the things that can come in handy when building a GraphQL schema is investing in consistent naming. This also holds for mutations, which should have explicit and readable names and can additionally carry a tag directive that helps with grouping and documenting the code.

As mentioned before, a good rule of thumb is to name objects literally and opt for antonyms instead of synonyms when naming the opposite action (e.g. uncreate rather than delete as the counterpart of create). It is also a good idea to draw inspiration from similar projects and see which terms are more widely used in the industry, so you are more likely to be speaking “the same language” other developers use when writing GraphQL.
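
As a small illustration (with hypothetical names), a schema following this advice might pair its mutations like so:

    type Article {
      id: ID!
      title: String!
    }

    type Mutation {
      # Counterpart actions as variations of the same term...
      publishArticle(articleId: ID!): Article
      unpublishArticle(articleId: ID!): Article
      # ...rather than a synonym such as removeArticle or retractArticle.
    }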

Pagination

Often an essential component of a good API, pagination aims to break up large datasets into “pages” so that the client gets only what they need. This approach is beneficial because:

  • On the server side, it loads only the requested part of the dataset instead of querying too much data at once, which often leads to slow requests and timeouts.
  • On the client side, it offers a better user experience and better performance, since fetching is more targeted.

In practice, pagination can be done in various ways, such as:

Offset Pagination

  • Pros: One of the most widely used techniques, it allows the client to tell how many items they want to receive and includes an offset or page parameter that helps them move across the paginated list.
  • Cons: This approach often doesn’t scale well for very big datasets as it can start returning inconsistent results (e.g. repeated values).
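
A minimal sketch of offset pagination, assuming a hypothetical Product type:

    type Product {
      id: ID!
      name: String!
    }

    type Query {
      # The client picks a page size (limit) and skips past
      # previous results (offset).
      products(limit: Int = 20, offset: Int = 0): [Product!]!
    }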

Cursor Pagination

  • Pros: Increasingly popular thanks to Relay’s connection pattern, this method uses a stable identifier (a cursor) that points to an item on the list, so that clients can ask the API for a number of results relative to that item. It is often a good choice for performance and accuracy.
  • Cons: The concept of “pages” does not exist in this technique. The client only knows what is “next” or “previous” to some item and not how many pages there are. Thus, it is not possible to skip ahead to any specific page.
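
The same hypothetical products field, sketched with cursor pagination instead:

    type Query {
      # `after` is an opaque cursor pointing at a known item; the server
      # returns up to `first` items that follow it in the list.
      products(first: Int = 20, after: String): [Product!]!
    }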

Relay Connections

  • Pros: Strongly based on cursor pagination, this pattern has fields return “connection types” consisting of two fields: pageInfo for pagination metadata (such as whether a next page exists) and edges, wrappers around each item that carry its cursor and any extra per-edge metadata. This way the client doesn’t need to fetch all items, only to identify which “batch” to unpack next.
  • Cons: Relay does require cursor-based pagination, so you won’t be able to use an offset approach. However, the edge types are useful enough to compensate for that.
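
A sketch of the connection shape, again using the hypothetical Product type:

    type PageInfo {
      hasNextPage: Boolean!
      hasPreviousPage: Boolean!
      startCursor: String
      endCursor: String
    }

    type ProductEdge {
      # Per-edge metadata, such as the cursor, lives on the edge type.
      cursor: String!
      node: Product!
    }

    type ProductConnection {
      pageInfo: PageInfo!
      edges: [ProductEdge!]!
    }

    type Query {
      products(first: Int, after: String): ProductConnection!
    }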

Global Identification

Another way to organize components, besides pagination, is by using global identification. Originally proposed by Relay and similar to URIs, this method has become a more general good practice, though it is not considered mandatory, especially if you are not planning on supporting Relay in your application.

The idea is to have the GraphQL server expose a global node(id: ID!): Node field that allows clients to fetch any node through that single field, thus creating a consistent object access pattern that simplifies caching and lookups.

While GraphQL clients often build complex normalized caches to store nodes previously fetched, Relay needs a mechanism to re-fetch single nodes. With global identification, this becomes easier, especially if you opt for opaque IDs. Just make sure that your global ID has enough context to globally route to a node, particularly in the case of a distributed architecture.
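
A minimal sketch of the pattern (the Product type is again hypothetical):

    interface Node {
      id: ID!
    }

    type Product implements Node {
      id: ID!
      name: String!
    }

    type Query {
      # One consistent entry point to re-fetch any object by its global ID.
      node(id: ID!): Node
    }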

Nullability

When it comes to setting up fields, keep in mind that in GraphQL all fields are nullable by default, and you decide which ones to mark as non-null. The distinction matters: if a field marked as non-null ends up resolving to null, the GraphQL server raises an error and propagates the null up to the nearest nullable parent. If every field along the path is non-null, the entire query returns null along with the error.

While non-null is a useful tool for building more expressive and predictable schemas, since it allows clients to avoid overly defensive code and conditionals, there are some downsides. One of them is that non-null fields and arguments are harder to evolve: going from non-null to nullable is a breaking change, whereas the opposite is not.

Another point is that it is hard to predict what can be null or not, especially in distributed environments. As your architecture evolves, anything from timeouts, rate limits, or transient errors might return null for certain fields. However, the following scenarios are frequent enough to be used as guidance:

  • In arguments, non-null is often the best option when developing a more predictable and understandable API.
  • Fields that return object types backed by database associations, network calls, or anything that could possibly lead to a failure are almost always nullable.
  • Simple scalars on an object that you have already loaded on the parent at execution time are usually non-null.
  • Object types that you expect never to be null, while still allowing for partial responses at the parent level, are indeed rarely null, but this is one of those cases that are genuinely hard to predict.
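
A short sketch applying these guidelines, with hypothetical types:

    type ShippingEstimate {
      days: Int!
    }

    type Product {
      # Simple scalars loaded along with the parent: usually non-null.
      id: ID!
      name: String!
      # Backed by a network call that could fail: nullable.
      shippingEstimate: ShippingEstimate
    }

    type Query {
      # A non-null argument keeps the API predictable, while the nullable
      # result acknowledges that the lookup itself can fail.
      product(id: ID!): Product
    }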

Abstract Types

Another feature of GraphQL is abstract types. They are helpful for decoupling the interface from the underlying persistence layer and come in two kinds: union and interface. A union should be used when a field can return different types that do not necessarily share common behavior. Interface types, on the other hand, provide a common contract for fields that share behaviors.
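
A quick sketch of both kinds, with hypothetical types:

    # A union: the possible result types need not share any behavior.
    union SearchResult = Product | Article

    # An interface: a common contract for types that share behavior.
    interface Publishable {
      isPublished: Boolean!
    }

    type Article implements Publishable {
      title: String!
      isPublished: Boolean!
    }

    type Product {
      name: String!
    }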

Tempting as they are, avoid using interface types too often. Developers sometimes reach for an interface when two or more objects merely share similar fields, even though they should not share a common contract. Instead, look at the behavior of the concepts in the schema rather than their attributes or data, and use tools such as helper functions, composition, or inheritance to avoid the overuse of interface types.

In addition to that, developers should consider whether abstract types are really the best option for the way an API is going to evolve over time, that is, whether changes to the code would cause breaking changes. One way to avoid such issues is the Liskov substitution principle from object-oriented programming: objects should be replaceable by their subtypes without breaking the program.

Static Queries

Although SDKs and query builders are tempting, writing GraphQL explicitly and directly makes it easier for people to see exactly what data is being asked for. That transparency is part of what makes GraphQL a better choice than query builders, and it is also why you should aim to keep your queries static.

In other words, opt for queries that do not change based on variables, conditions, or the state of the program. Aim for a situation in which someone reading the source code can easily see what the server will receive; this keeps your code transparent.

One more advantage is that static queries enable great tooling on the client-side (e.g. IDE support, code generation, linting) and they can be saved on the server. Static queries also make sure that there is a standard and specific language to interact with GraphQL servers, thus making the process language-agnostic.

Finally, for better results, prefer schemas whose fields are plural in most cases while still offering a way to fetch single entities when necessary.
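
As an illustration, here is a static query against the hypothetical connection schema sketched earlier; everything that varies is passed through typed variables rather than assembled into the string at runtime:

    query ProductList($first: Int!, $after: String) {
      products(first: $first, after: $after) {
        edges {
          node {
            id
            name
          }
        }
      }
    }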

Mutations

Possibly one of the biggest struggles for people learning GraphQL, mutations are actually simple fields that can take arguments and return a certain type. By a convention that comes from Relay, though, that return type is usually not an ordinary type but a dedicated “payload type”, used specifically as the result of the mutation.

Similarly, GraphQL commonly follows Relay’s convention of using a single, unique, and required input type for each mutation. This keeps the input evolvable and lets the mutation use a single variable on the client side, which makes things noticeably easier once you start to have more and more arguments. The trick is choosing between a more fine-grained or coarse-grained approach to mutations, as we will see in the next section.
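
A sketch of the convention, with hypothetical names:

    # A single, required input type keeps arguments evolvable and lets
    # the client send everything in one variable.
    input UpdateProductInput {
      productId: ID!
      name: String
      price: Float
    }

    # A dedicated payload type, used only as this mutation's result.
    type UpdateProductPayload {
      product: Product
    }

    type Mutation {
      updateProduct(input: UpdateProductInput!): UpdateProductPayload
    }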

Fine-Grained vs Coarse-Grained Mutations

Fine-grained mutations can be helpful to avoid failures when the client, for instance, wants to add a product and then modify its price later on. If they can change only the price field, then this can prevent further inconsistencies in the system. However, making calls can be expensive and in many cases when fine-grained mutations are used, you will still need to refresh other fields. Thus, in some cases, opting for coarse-grained mutations can be more advantageous.
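
Building on the hypothetical types from the previous sketch, the contrast might look like this:

    type Mutation {
      # Fine-grained: one focused change per call, less room for
      # inconsistent state, but more round trips.
      updateProductPrice(productId: ID!, price: Float!): UpdateProductPayload
      # Coarse-grained: several fields changed in a single round trip.
      updateProduct(input: UpdateProductInput!): UpdateProductPayload
    }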

Another suggestion is to consider making mutations or transactions in batches, so you can change multiple fields at once using a single specific ID. The catch is that you need to name things carefully, or use the extensions key, to avoid conflicts that would otherwise trigger errors.

Error Messages and Notifications

Speaking of errors, not every error has to be a plain string, and the way you model them determines how easy they are to interpret for whoever ends up handling the notification. One way to tackle this is by first dividing errors into broad categories, such as:

  • Developer/Client errors: something went wrong during the query (wrong ID format, timeout, etc.). Such errors are usually addressed by the developer of the application.
  • User errors: the user or client did something wrong (submitting an already used email, paying the same order twice, etc.), and the error is connected to the functionality of the application itself.

Now, when it comes to user-facing errors, the easiest way to expose them is by adding a field that describes them, for instance a userErrors field on the mutation payload:

(The original example appears in Production-ready GraphQL, by Marc-André Giroux.)
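
A rough sketch of the idea, with hypothetical type and field names:

    type User {
      email: String!
    }

    type UserError {
      # A message safe to display to the end user.
      message: String!
      # Which input field the message relates to, if any.
      field: [String!]
    }

    type SignUpPayload {
      user: User
      userErrors: [UserError!]!
    }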

You can also opt for union types to represent an error and to prompt suggestions to users, though this is a more expensive choice. For example, in case a user picks an already taken username, union types can be set to prompt variations of that username:

(The original example appears in Production-ready GraphQL, by Marc-André Giroux.)
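
A rough sketch of this approach, again with hypothetical names:

    union CreateUsernameResult = CreateUsernameSuccess | UsernameTaken

    type CreateUsernameSuccess {
      username: String!
    }

    type UsernameTaken {
      message: String!
      # The server can suggest available variations of the taken name.
      suggestedUsernames: [String!]!
    }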

When it comes to notifications, these can appear, for instance, when results cannot be returned right away and other processes must run in the background. In such situations, a common practice is to return the 202 Accepted status code, but with GraphQL this gets trickier, since it may be that only parts of the request are asynchronous.

In payment processes, for example, a union type could notify the user that the request is pending. A more generic way to tackle this is by modeling the background work as a job that is identifiable through a global ID and contains only two fields: a “done” boolean that indicates the status of the asynchronous job, and a query field that returns the query root type so that clients can query the new state of things after the job is completed.
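
A minimal sketch of such a job type (names are hypothetical):

    type Job {
      # A global ID lets clients poll the job while it runs.
      id: ID!
      # Whether the asynchronous work has finished.
      done: Boolean!
      # Returns the query root so clients can re-query the new state
      # of anything once the job is done.
      query: Query
    }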

Security

This is one of the hot topics in GraphQL because, at first, the idea that clients can query whatever data they need from the server might sound as if they could access even the data they’re not supposed to see. In fact, GraphQL only exposes what was explicitly set to be exposed. Still, there are a few actions that can be taken to fortify your application and make it even safer.

Rate Limiting

Like any other web API, setting limits is a good strategy to avoid, for example, an overload of requests per minute. There are a few ways this can be done in GraphQL:

  1. Complexity: limit how much work per minute is allowed by estimating the “cost” of queries. If your schema uses pagination, its arguments can feed this calculation (see the sketch after this list). If not, you can rate-limit based on the number of objects returned in response to a query.
  2. Time: similar to the previous approach, but using time as the variable, that is, setting limits based on how long a request takes to be fulfilled. This can be a little trickier than the complexity approach: there, your algorithm computes the processing cost for you, whereas with time, clients need to try out queries and see how long they take. In any case, both options are better than simply counting the number of requests.
  3. Exposure: oftentimes it is hard for clients to know whether they are within limits or not, so many API providers expose the rate-limit status to help with integration. The most common way to do this is through response headers such as RateLimit-Limit, which indicates how many requests a user can make within a given window. Other headers such as RateLimit-Remaining and RateLimit-Reset are also helpful for communicating the status.
  4. Limitations: exposing the limits of your API can be informative, but also tricky as this information could encourage clients to “game” the system, for instance by making an exact number of requests per hour to always stay under the limit. Though this is not inherently bad, it is hard to change it later as this approach assumes reliability and consistency, whereas sometimes requests can be routed to different data servers. So another way to avoid that is actually by blocking abusive queries.
  5. Blocking: a more active approach, this means enforcing a maximum complexity or node limit on queries sent to the server, even if the user sends big queries at a slow rate.
  6. Timeout: when the previous approach fails or when you want to make sure users won’t find a gap in your code, setting timeouts is a way to safeguard your application. The key here is to find a max complexity and/or node limit that would be blocking queries before requests are timed out, and that’s the hard part. You can find this measurement through data analysis, good monitoring, and trial and error though.
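
To make the complexity idea from point 1 concrete, here is a hedged sketch of how pagination arguments could drive a cost estimate (the products and reviews fields are hypothetical):

    query {
      products(first: 100) {        # up to 100 products
        edges {
          node {
            reviews(first: 10) {    # up to 10 reviews each: ~1,000 more nodes
              edges {
                node {
                  id
                }
              }
            }
          }
        }
      }
    }
    # Estimated cost: 100 + (100 x 10) = 1,100 nodes. A server-side node
    # limit could reject this query before it is ever executed.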

Authentication and Authorization

Although often used interchangeably, these are two different things: authentication is the act of determining who a user is and whether they are logged in, while authorization is the act of determining whether a user is allowed to perform an action or see something.

One of the main questions in the GraphQL community is whether authentication should be tackled within the GraphQL server and schema, which in practice would mean login and logout mutations living in the code. As a best practice, though, it is advisable to leave authentication concerns out of the schema and simply expect that a user or session, where needed, is already present when a query runs.

Additionally, resolvers should not be aware of HTTP headers or tokens, so that you can swap authentication methods without changing your GraphQL schema, keeping it easier to interact with and more stateless. A small downside is that clients may need to authenticate through something other than GraphQL, but the technology offers little advantage over a simple HTTP request for a login call that fetches a single token.

Authorization is a more complex topic, and not only in the GraphQL context. Counterintuitive as it may sound, it is advisable not to put all of your application’s authorization logic at the GraphQL level, because the API is often just one of several ways to access your domain logic. If you do opt for that, you need to make sure the same rules apply everywhere else and are maintained every time the API evolves.

API-level authorization scopes, such as OAuth scopes, map reasonably well onto GraphQL, but business rules that relate to the domain should stay out of it as much as possible; mixing the two can also create conflicts with authentication settings (e.g. not being able to perform an action because you’re not an admin). To achieve that separation, keep these general pieces of advice in mind:

  1. Prioritize object authorization over field authorization: start with authorization rules that apply to object types rather than fields as they usually translate well to API scopes and simple scalar fields often share the same set of required permissions. While it’s hard to keep track of all possible ways to get to an object, making authorization checks only at the field level can leave your application open to unpredicted access patterns.
  2. Use a library such as GraphQL Shield when permissions grow more complex: but remember that your focus should be on API permissions like API scopes rather than business rules. Enforce these rules on a per-type basis first before opting for finer-grained permissions.
  3. Don’t leak the existence of something: there is a subtle but important difference between prompting a message such as “this object does not exist” and “you cannot access this object”. Leaking this information can be a security risk, so the advice is to simply return null instead of an error or a string.

Blocking Introspection

Even though introspection could be one of the reasons GraphQL is preferred as a language, leaving introspection fully enabled on the server carries a security risk. There are a few scenarios where hiding or limiting it might be necessary:

  • Leaking future releases: if your application is public and accessible through a browser, for example, you might want to limit introspection to make sure upcoming features are not visible in the schema.
  • Whitelisting actions: if your API is internal, you can instead define exactly which queries can be executed, so that users cannot perform any requests beyond those set by the client.

Introspection is mostly relevant to developers and engineers, so it is typically enabled only in development and left out of production. For public GraphQL APIs, however, it makes little sense to hide introspection when the schema is exactly what users want to access. Making it harder (security through obscurity) is more likely to provoke a negative reaction than to make your application more secure.

Persisted Queries

A persisted query is a query string cached on the server side along with a unique identifier. With persisted queries, instead of sending the full query document with every request, the client registers its queries with the server ahead of time. Mostly used with internal APIs, they might eventually become useful for public applications too.

Registration can happen before or during the deploy process or, in some cases, the first query from a client serves as the “registration”. Once the identifier of a particular query is registered, you can use it with any variables, without ever needing to send the full document.
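
As a hedged illustration of the flow, a client might register the following document (field names are hypothetical) and from then on send only the identifier plus variables:

    # Registered ahead of time (for example at build or deploy) under a
    # stable identifier, such as a hash of this document.
    query PersistedProductList($first: Int!) {
      products(first: $first) {
        edges {
          node {
            id
            name
          }
        }
      }
    }
    # At runtime the client sends only the identifier and variables,
    # never this full query string.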

Libraries supporting persisted queries exist for clients such as Apollo and URQL. The advantages of using them include:

  • Save bandwidth as clients will never need to send the full query string anymore.
  • Optimize queries, as the server can pre-parse, pre-validate, and pre-analyze them ahead of time, before they are ever executed.
  • Secure your API by using persisted queries in your whitelist strategy, thus essentially blocking access to all other queries that are not registered.

Conclusion

This concludes our piece on the best practices for developing GraphQL APIs. There are more aspects to consider, like selecting a database that is a good match for your schema. For example, Neo4j is a great complement to GraphQL APIs, taking full advantage of the architecture of graph databases to expand the opportunities for developers to design optimal APIs.

For further reading about GraphQL, we recommend resources such as Production-ready GraphQL, by Marc-André Giroux.


GraphQL Development Best Practices was originally published in the Neo4j Developer Blog on Medium.