There are very few scenarios in which an eventually consistent database is preferable over a strongly consistent database. Further, in a multi-region application scenario where scaling is necessary, choosing either an undistributed database or an eventually consistent database is even more questionable. So what motivates engineers to ignore strongly consistent distributed databases? We have seen many reasons, but wrong assumptions drive most of them.
As we explained in Part 1 of this series, the CAP theorem is widely accepted yet often misinterpreted. When many people misinterpret a well-known theorem, it leaves a mark. In this case, many engineers still believe that eventual consistency is a necessary evil.
It is slowly sinking in that consistency should not be sacrificed, yet many databases still put consistency second. Why is that? Some popular databases offer options that deliver higher consistency, but only at the cost of potentially very high latencies. Their sales messaging might even claim that delivering consistency at low latencies in a multi-region distributed database is incredibly hard or even impossible, and the developer audience has salient memories of experiencing very poor latencies in databases that were not built for consistency. Combined, they jointly fortify the misconception that strong consistency in a distributed database with relatively low latencies is impossible.
Many engineers build according to the “Premature optimization is the root of all evil” (Donald Knuth) principle, but that statement is only meant to apply to small inefficiencies. Building your startup on a strongly consistent distributed scalable database might seem like a premature optimization, because initially, your application doesn’t require scale and might not require distribution. However, we are not talking about small inefficiencies here. The requirement to scale or distribute might arise overnight when your application becomes popular. At that point, your users have a terrible experience, and you are looking at a substantial challenge to change your infrastructure and code.
This used to have some truth to it since distributed databases were new, and many came with severe limitations. They did not allow joins, only allowed key-value storage, or required you to query your data according to predefined sharding keys, which you couldn’t change any more. Today, we have distributed databases that have flexible models and provide the flexibility you are used to with traditional databases. This point is very related to the previous point, which ignores that nowadays, starting to programming against a strongly consistent distributed database is just as easy and probably easier in the long run compared to a traditional database. If it’s just as easy, then why not optimize from the start?
Distributed databases are often created by people who have experienced problems with eventual consistency. For example, FaunaDB was built by former Twitter engineers after having experienced how difficult it is to build a scalable system on top of the eventually consistent databases that were popular around that time, such as Cassandra. These problems typically manifest when a new company starts to scale, hence many younger engineers have never experienced them first hand.
Sometimes painful things can teach us lessons that we didn’t think we needed to know.— Amy Poehler
Discussing the dangers of eventual consistency typically leads to the “it works for me” argument from engineers who simply haven’t experienced any issues yet. Since that often takes months (or years, if you are lucky), let’s look at an analogy.
A while ago, my best friend was about to miss an appointment, so I lent him my bike. I was happy that I helped out, he was happy, and everything went well. That happiness quickly turned into pain when he tried to jump the bike onto a side-walk. You see… I had tinkered with the bike earlier that day and had forgotten to tighten the front wheel. He came back with a huge purple bruise.
The bike example is very similar to working with a database that is not strongly consistent. Everything will go well until you try to lift the bike’s wheel (or in other words, until your company lifts off and starts scaling up).
At the moment your application needs to scale up, you typically do so by replicating services. Once the database becomes the bottleneck, you replicate your traditional database or move to a distributed database. Sadly, at that point, features in your application might break when you start replicating your database. Until now, you hadn’t noticed these problems since the database ran on a single node. At that point, two things might happen:
- Situation 1, build around it/fix it: the developers soon realize that the database they are ‘riding’ is unreliable for the features they have built or are trying to build. Their choices come down to canceling the features, simplifying the features, or changing the database.
- Situation 2, fail epically: the developers were not well informed by the vendor (I was a lousy bike vendor to my friend) about the risks, and now lack the information to understand the very subtle implications of what’s happening. This is not necessarily due to a lack of capability of the engineer. Poorly defined standards and optimistic marketing do a great job of obfuscating different databases’ consistency guarantees.
The developers who end up in the first situation are often already experienced in dealing with eventually consistent systems. They will now either accept that they can’t deliver on some features, or will build a complex and hard-to-maintain layer on top of the database to get what they need. In essence, they attempt to develop a strongly consistent database on top of an eventually consistent one. That’s a shame since other people have designed distributed databases from the ground up that will not only be more efficient, but don’t require maintenance from your development team!
The developers who end up in the second situation are riding a partly invisible bike. They do not realize that the wheel is loose, do not see the wheel detach, and once they look up after falling, they still see a completely intact bike.
At the moment things go wrong, the complexity to resolve these bugs is high for several reasons:
- Determine whether it’s an eventual consistency bug. The issue might be either an application bug, or a bug caused by misunderstanding the guarantees of the underlying database. To know for sure, we need to investigate the application logic, and in case the application logic is sound in a non-distributed environment, the engineer has to have the instinct to evaluate whether this situation might arise due to eventual consistency.
- The cause has disappeared. Second, since the database eventually becomes consistent, the cause of the problem has probably disappeared (the wheel is magically reattached to the bike, and all you see is an impeccable bike).
- Fix it! Once the problem is determined, you can either find a way around it, attempt to build a layer on top of the database (hello latency and other potential bugs), remove the features, or change the database. The last option is sometimes perceived as easy. However, even the most subtle differences between databases make this a very challenging endeavor. At the moment your application is lifting off, you already have your hands full. This is not the moment you want to be swapping databases!
The invisible bike example is still too forgiving. In reality, others are probably depending on your application. So basically, you are riding an invisible bike while others (your clients) are standing on your shoulders.
Not only will you fall, but they will fall with you and land on top of you–heavily and painfully. You might not even survive the fall at that point; in other words, your company might not survive the storm of negative feedback from your clients.
The moral of the story? If you had chosen a strongly (vs.eventually) consistent database from the beginning, you would not have to consider going through a complex and resource-intensive project like migrating your database at a point when your clients are already frustrated.
Choosing an eventually consistent database for scaling was justified a few years back when there was simply no other choice. However, we now have modern databases that can scale efficiently without sacrificing data consistency or performance. . Moreover, these modern databases also include several other awesome features that go beyond consistency, such as ease of use, serverless pricing models, built-in authentication, temporality, native GraphQL, and more. With a modern database, you can scale without opening Pandora’s box!
And, if after reading this series of articles, you still choose not to use a strongly consistent distributed database, please at least make sure to tighten your wheels (in other words, read and understand different databases’ consistency guarantees).