How to Make GraphQL and DynamoDB Play Nicely Together

Serverless, GraphQL, and DynamoDB are a powerful combination for building websites. The first two are well-loved, but DynamoDB is often misunderstood or actively avoided. It’s often dismissed by folks who consider it only worth the effort “at scale.”

That was my assumption, too, and I tried to stick with a SQL database for my serverless apps. But after learning and using DynamoDB, I see the benefits of it for projects of any scale.

To show you what I mean, let’s build an API from start to finish — without any heavy Object Relational Mapper (ORM) or GraphQL framework to hide what is really going on. Maybe when we’re done you might consider giving DynamoDB a second look. I think it is worth the effort.

The main objections to DynamoDB and GraphQL

The main objection to DynamoDB is that it is hard to learn, but few people argue about its power. I agree the learning curve feels very steep. But SQL databases are not the best fit with serverless applications. Where do you stand up that SQL database? How do you manage connections to it? These things just don’t mesh with the serverless model very well. DynamoDB is serverless-friendly by design. You are trading the up-front pain of learning something hard to save yourself from future pain. Future pain that only grows if your application grows.

The case against using GraphQL with DynamoDB is a little more nuanced. GraphQL seems to fit well with relational databases partly because it is assumed by a lot of the documentation, tutorials, and examples. Alex Debrie is a DynamoDB expert who wrote The DynamoDB Book which is a great resource to deeply learn it. Even he recommends against using the two together, mostly because of the way that GraphQL resolvers are often written as sequential independent database calls that can result in excessive database reads.

Another potential problem is that DynamoDB works best when you know your access patterns beforehand. One of the strengths of GraphQL is that it can handle arbitrary queries more easily by design than REST. This is more of a problem with a public API where users can write arbitrary queries. In reality, GraphQL is often used for private APIs where you control both the client and the server. In this case, you know and can control the queries you run. With a GraphQL API it is possible to write queries that clobber any database without taking steps to avoid them.

A basic data model

For this example API, we will model an organization with teams, users, and certifications. The entity relational diagram is shown below. Each team has many users and each user can have many certifications.

Relational database model

Our end goal is to model this data in a DynamoDB table, but if we did model it in a SQL database, it would look like the following diagram:

To represent the many-to-many relationship of users to certifications, we add an intermediate table called “Credential.” The only unique attribute on this table is the expiration date. There would be other attributes for each of the tables, but we reduce it to just a name for each for simplicity.

Access patterns

The key to designing a data model for DynamoDB is to know your access patterns up front. In a relational database you start with normalized data and perform joins across the data to access it. DynamoDB does not have joins, so we build a data model that matches how we intend to access it. This is an iterative process. The goal is to identify the most frequent patterns to start. Most of these will directly map to a GraphQL query, but some may be only used internally to the back end to authenticate or check permissions, etc. An access pattern that is rarely used, like a check run once a week by an administrator, does not need to be designed. Something very inefficient (like a table scan) can handle these queries.

Most frequently accessed:

User by ID or name
Team by ID or name
Certification by ID or name

Frequently accessed:

All Users on a Team by Team ID
All Certifications for a given User
All Teams
All Certifications

Rarely accessed

All Certifications of users on a Team
All Users who have a Certification
All Users who have a Certification on a Team

DynamoDB single table design

DynamoDB does not have joins and you can only query based on the primary key or predefined indexes. There is no set schema for items imposed by the database, so many different types of items can be stored in a single table. In fact, the recommended best practice for your data schema is to store all items in a single table so that you can access related items together with a single query. Below is a single table model representing our data. To design this schema, you take the access patterns above and choose attributes for the keys and indexes that match.

The primary key here is a composite of the partition/hash key (pk) and the sort key (sk). To retrieve an item in DynamoDB, you must specify the partition key exactly and either a single value or a range of values for the sort key. This allows you to retrieve more than one item if they share a partition key. The indexes here are shown as gsi1pk, gsi1sk, etc. These generic attribute names are used for the indexes (i.e. gsi1pk) so that the same index can be used to access different types of items with different access pattern. With a composite key, the sort key cannot be empty, so we use “#” as a placeholder when the sort key is not needed.

Access pattern	Query conditions
Team, User, or Certification by ID	Primary Key, pk=”T#”+ID, sk=”#”
Team, User, or Certification by name	Index GSI 1, gsi1pk=type, gsi1sk=name
All Teams, Users, or Certifications	Index GSI 1, gsi1pk=type
All Users on a Team by ID	Index GSI 2, gsi2pk=”T#”+teamID
All Certifications for a User by ID	Primary Key, pk=”U#”+userID, sk=”C#”+certID
All Users with a Certification by ID	Index GSI 1, gsi1pk=”C#”+certID, gsi1sk=”U#”+userID

Database schema

We enforce the “database schema” in the application. The DynamoDB API is powerful, but also verbose and complicated. Many people jump directly to using an ORM to simplify it. Here, we will directly access the database using the helper functions below to create the schema for the Team item.

const DB_MAP = {
  TEAM: {
    get: ({ teamId }) => ({
      pk: 'T#'+teamId,
      sk: '#',
    }),
    put: ({ teamId, teamName }) => ({
      pk: 'T#'+teamId,
      sk: '#',
      gsi1pk: 'Team',
      gsi1sk: teamName,
      _tp: 'Team',
      tn: teamName,
    }),
    parse: ({ pk, tn, _tp }) => {
      if (_tp === 'Team') {
        return {
          id: pk.slice(2),
          name: tn,
          };
        } else return null;
        },
    queryByName: ({ teamName }) => ({
      IndexName: 'gsi1pk-gsi1sk-index',
      ExpressionAttributeNames: { '#p': 'gsi1pk', '#s': 'gsi1sk' },
      KeyConditionExpression: '#p = :p AND #s = :s',
      ExpressionAttributeValues: { ':p': 'Team', ':s': teamName },
      ScanIndexForward: true,
    }),
    queryAll: {
      IndexName: 'gsi1pk-gsi1sk-index',
      ExpressionAttributeNames: { '#p': 'gsi1pk' },
      KeyConditionExpression: '#p = :p ',
      ExpressionAttributeValues: { ':p': 'Team' },
      ScanIndexForward: true,
    },
  },
  parseList: (list, type) => {
    if (Array.isArray(list)) {
      return list.map(i => DB_MAP[type].parse(i));
    }
    if (Array.isArray(list.Items)) {
      return list.Items.map(i => DB_MAP[type].parse(i));
    }
  },
};

To put a new team item in the database you call:

DB_MAP.TEAM.put({teamId:"t_01",teamName:"North Team"})

This forms the index and key values that are passed to the database API. The parse method takes an item from the database and translates it back to the application model.

GraphQL schema

type Team {
  id: ID!
  name: String
  members: [User]
}
type User {
  id: ID!
  name: String
  team: Team
  credentials: [Credential]
}
type Certification {
  id: ID!
  name: String
}
type Credential {
  id: ID!
  user: User
  certification: Certification
  expiration: String
}
type Query {
  team(id: ID!): Team
  teamByName(name: String!): [Team]
  user(id: ID!): User
  userByName(name: String!): [User]
  certification(id: ID!): Certification
  certificationByName(name: String!): [Certification]
  allTeams: [Team]
  allCertifications: [Certification]
  allUsers: [User]
}

Bridging the gap between GraphQL and DynamoDB with resolvers

Resolvers are where a GraphQL query is executed. You can get a long way in GraphQL without ever writing a resolver. But to build our API, we’ll need to write some. For each query in the GraphQL schema above there is a root resolver below (only the team resolvers are shown here). This root resolver returns either a promise or an object with part of the query results.

If the query returns a Team type as the result, then execution is passed down to the Team type resolver. That resolver has a function for each of the values in a Team. If there is no resolver for a given value (i.e. id), it will look to see if the root resolver already passed it down.

A query takes four arguments. The first, called root or parent, is an object passed down from the resolver above with any partial results. The second, called args, contains the arguments passed to the query. The third, called context, can contain anything the application needs to resolve the query. In this case, we add a reference for the database to the context. The final argument, called info, is not used here. It contains more details about the query (like an abstract syntax tree).

In the resolvers below, ctx.db.singletable is the reference to the DynamoDB table that contains all the data. The get and query methods directly execute against the database and the DB_MAP.TEAM.... translates the schema to the database using the helper functions we wrote earlier. The parse method translates the data back to the from needed for the GraphQL schema.

const resolverMap = {
  Query: {
    team: (root, args, ctx, info) => {
      return ctx.db.singletable.get(DB_MAP.TEAM.get({ teamId: args.id }))
        .then(data => DB_MAP.TEAM.parse(data));
    },
    teamByName: (root, args, ctx, info) =>; {
      return ctx.db.singletable
        .query(DB_MAP.TEAM.queryByName({ teamName: args.name }))
        .then(data => DB_MAP.parseList(data, 'TEAM'));
    },
    allTeams: (root, args, ctx, info) => {
      return ctx.db.singletable.query(DB_MAP.TEAM.queryAll)
        .then(data => DB_MAP.parseList(data, 'TEAM'));
    },
  },
  Team: {
    name: (root, _, ctx) => {
      if (root.name) {
        return root.name;
      } else {
        return ctx.db.singletable.get(DB_MAP.TEAM.get({ teamId: root.id }))
          .then(data => DB_MAP.TEAM.parse(data).name);
      }
    },
    members: (root, _, ctx) => {
      return ctx.db.singletable
        .query(DB_MAP.USER.queryByTeamId({ teamId: root.id }))
        .then(data => DB_MAP.parseList(data, 'USER'));
    },
  },
  User: {
    name: (root, _, ctx) => {
      if (root.name) {
        return root.name;
      } else {
        return ctx.db.singletable.get(DB_MAP.USER.get({ userId: root.id }))
          .then(data => DB_MAP.USER.parse(data).name);
      }
    },
    credentials: (root, _, ctx) => {
      return ctx.db.singletable
        .query(DB_MAP.CREDENTIAL.queryByUserId({ userId: root.id }))
        .then(data =>DB_MAP.parseList(data, 'CREDENTIAL'));
    },
  },
};

Now let’s follow the execution of the query below. First, the team root resolver reads the team by id and returns id and name. Then the Team type resolver reads all the members of that team. Then the User type resolver is called for each user to get all of their credentials and certifications. If there are five members on the team and each member has five credentials, that results in a total of seven reads for the database. You could argue that is too many. In a SQL database this might be reduced to four database calls. I would argue that the seven DynamoDB reads will be cheaper and faster than the four SQL reads in many cases. But this comes with a big dose of “it depends” on a lot of factors.

query { team( id:"t_01" ){
  id
  name
  members{
    id
    name
    credentials{
      id
      certification{
        id
        name
      }
    }
  }
}}

Over-fetching and the N+1 problem

Optimizing a GraphQL API involves balancing a whole lot of tradeoffs that we won’t get into here. But two that weigh heavily in the decision of DynamoDB versus SQL are over-fetching and the N+1 problem. In many ways, these are opposite sides of the same coin. Over-fetching is when a resolver requests more data from the database than it needs to respond to the query. This often happens when you try to make one call to the database in the root resolver or a type resolver (e.g., members in the Team type resolver above) to get as much of the data as you can. If the query did not request the name attribute, it can be seen as wasted effort.

The N+1 problem is almost the opposite. If all the reads are pushed down to the lowest level resolver, then the team root resolver and the members resolver (for Team type) would make only a minimal or no request to the database. They would just pass the IDs down to the Team type and User type resolver. In this case, instead of members making one call to get all five members, it would push down to User to make five separate reads. This would result in potentially 36 or more separate reads for the query above. In practice, this does not happen because an optimized server would use something like the DataLoader library that acts as a middleware to intercept those 36 calls and batch them into probably only four calls to the database. These smaller atomic read requests are needed so that the DataLoader (or similar tool) can efficiently batch them into fewer reads.

So, to optimize a GraphQL API with SQL, it is usually best to have small resolvers at the lowest levels and use something like DataLoader to optimize them. But for a DynamoDB API it is better to have “smarter” resolvers higher up that better match the access patterns your single table database it written for. The over-fetching that results in this case is usually the lesser of the two evils.

Deploy this example in 60 seconds

GitHub Repo

This is where you realize the full payoff of using DynamoDB together with serverless GraphQL. I built this example with Architect. It is an open-source tool to build serverless apps on AWS without most of the headaches of directly using AWS. Once you clone the repo and run npm install, you can launch the app for local development (including a built-in local version of the database) with a single command. Not only that, you can also deploy it straight to production infrastructure (including DynamoDB) on AWS with a single command when you are ready.

Jacob Dubail

# January 16, 2021

I’m very interested in the single table design approach, but with AWS AppSync there doesn’t seem to be a way to achieve it with the built-in codegen. I can see how this makes sense outside of amplify and AppSync, or using a direct lambda resolver with custom data models, but we can achieve all of these complex data access patterns directly in our GraphQL schema using AppSync. Or am I missing something?

Great write up of DynamoDB. I am still a beginner with it, so this was very helpful.

Ryan Bethel

Permalink to comment# January 18, 2021

AppSync handles a lot of the interaction with DynamoDB for you. This is written to show how you might build a model from scratch without the help of something like amplify or AppSync. If you are in AppSync it is probably makes more sense to use it, until you hit a problem that is doesn’t solve. Then you might fall back to this pattern.

Jora

# January 18, 2021

Why don’t you choose mongoDb? It covers all functionality of noSql databases.

jdrydn

Permalink to comment# January 22, 2021

Putting aside self-hosted on EC2 or DocumentDB, DynamoDB reads/writes over HTTPS connections making it perfect for a serverless architecture where managing network connections isn’t efficient!

jdrydn

# January 22, 2021

As an alternative to DataLoader: https://github.com/calebmer/graphql-resolve-batch

Dan Keeley

# May 6, 2021

Do you still stand by the arguments above now that rds offers serverless postgres?

cxt

# July 12, 2021

What if I want to do reverse lookup, say, find all team members with a certain certification. Is this still a good approach? I looked through the schema but I don’t see a query for that.

Johan Sebastian Navarro Cano

# November 3, 2021

Could you put how it would look a table with data in so be it in an excel table leaf?. Thanks for your article.