System Design Interview Concepts – Eventual Consistency

What is Eventual Consistency?

    Distributed systems will face network partitions at some point in their life cycle. When a partition happens, the CAP theorem dictates that if you pick availability, you cannot have true (strong) consistency, but you can still provide "eventual consistency".

The basic scenario is as follows:

  1. Imagine that your Website data is replicated on multiple servers across different data centers.
  2. Clients around the world can access any server to read the data (usually being routed to the data center closest to them).
  3. A client writes a piece of data to one of the servers, but it does not get copied to the rest of the servers immediately. Instead, the updated server kicks off a bunch of background tasks to update the other servers in the system. 
  4. A client accesses the server with the data, and gets the most recent copy of the data.
  5. However, a different client in another part of the world accesses a different server and gets the old copy. At a later point in time, when all the data propagation tasks started in step 3 have finished, all the clients can get the most recent copy of the data. Now the system has eventually become consistent.
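The five steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Replica` and `Cluster` classes, the server names, and the `pending` queue that stands in for the background tasks are all hypothetical and do not come from any real database.

```python
from collections import deque


class Replica:
    """One server holding a copy of the data (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Cluster:
    """A set of replicas plus a queue standing in for background tasks."""

    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}
        self.pending = deque()  # entries are (target replica, key, value)

    def write(self, replica_name, key, value):
        # Step 3: apply the write locally, then schedule the other servers.
        self.replicas[replica_name].data[key] = value
        for name, replica in self.replicas.items():
            if name != replica_name:
                self.pending.append((replica, key, value))

    def propagate(self):
        # Stands in for the background propagation tasks finishing.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


cluster = Cluster(["us-east", "eu-west", "ap-south"])
cluster.write("us-east", "homepage", "v1")
cluster.propagate()  # every server starts out with "v1"

cluster.write("us-east", "homepage", "v2")
fresh = cluster.replicas["us-east"].read("homepage")  # step 4: new copy
stale = cluster.replicas["eu-west"].read("homepage")  # step 5: old copy

cluster.propagate()
converged = [r.read("homepage") for r in cluster.replicas.values()]
print(fresh, stale, converged)
```

Between the second `write` and the second `propagate` call, the cluster is in the inconsistent window described in step 5. Once the queue is drained, every replica returns the same value.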

Now we're ready to define Eventual Consistency. According to Wikipedia:

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.

Immediate vs Eventual Consistency

    The opposite of eventual consistency is immediate (or strict, or strong) consistency. I won't go into details, but understanding the basic difference is critical for an intelligent discussion during your system design interview.

    Strict consistency states that for any incoming write operation, once the write is acknowledged to the client, the updated value is visible on read from any replicated node (server) in the system. This effectively means that all readers are blocked until replication of the new data to all the nodes is complete.

    Strict consistency is illustrated in the figure below, where all replicated nodes have values consistent with the originating node, but are not accessible until the update finishes. In the diagrams below, node A is the originating node and nodes B and C are the replicas.

Strict Consistency

In contrast, the figure below represents a system using eventual consistency. In this scenario, all nodes are always available to be read, but some nodes may have stale data at a particular point in time.

Eventual Consistency
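The difference between the two write paths can be sketched in a few lines. This is a simplified illustration, not how any particular database implements replication: `Node`, `strict_write`, `eventual_write`, and `drain` are hypothetical names, and a plain list stands in for the replication backlog.

```python
class Node:
    """A node holding a single value (hypothetical, in-memory)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def strict_write(origin, replicas, value):
    """Acknowledge only after every replica has applied the update."""
    origin.value = value
    for node in replicas:
        node.value = value  # synchronous copy; the caller waits for each one
    return "ack"


def eventual_write(origin, replicas, value, backlog):
    """Acknowledge right away and defer the replica updates."""
    origin.value = value
    backlog.extend((node, value) for node in replicas)
    return "ack"


def drain(backlog):
    """Apply the deferred replica updates in order."""
    while backlog:
        node, value = backlog.pop(0)
        node.value = value


# Strict: once the write is acknowledged, A, B and C all return the new value.
a, b, c = Node("A", "old"), Node("B", "old"), Node("C", "old")
strict_write(a, [b, c], "new")
after_strict = [n.value for n in (a, b, c)]

# Eventual: B and C keep serving the old value until the backlog is drained.
x, y, z = Node("A", "old"), Node("B", "old"), Node("C", "old")
backlog = []
eventual_write(x, [y, z], "new", backlog)
before_drain = [n.value for n in (x, y, z)]
drain(backlog)
after_drain = [n.value for n in (x, y, z)]
```

In the strict version the cost is paid inside the write call, which is where the availability and latency penalty comes from. In the eventual version the write returns immediately and the cost shows up as a window of stale reads.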

Why not use strict consistency all the time?

    That is a great question 🙂 Immediate consistency guarantees that the client always sees the latest data and that the data is safely replicated as soon as it is written.

However, it has two major problems:

  1. A strict consistency guarantee can have a detrimental effect on system availability and performance, depending on the scenario.
  2. Some scenarios may not require strict consistency. See the section below for examples of where you can apply eventual consistency models in your system design interviews.

Additionally, moving to eventual consistency can simplify development, since complicated synchronous code can be replaced with asynchronous operations.

Eventual Consistency Models in Real Systems

    Let's consider four real life examples of systems using eventual consistency. You should keep these in mind as a model of when to apply eventual consistency in system design interviews.

In fact, if the system design question fits into any of these four buckets, it might be a good idea to clarify upfront with the interviewer whether you should design the system for eventual consistency or strict consistency.

Example 1: Photo sharing system like Flickr

Let's consider a photo sharing application like Flickr which stores copies of the photos on nodes A and B. When user A uploads a new photo, it might get uploaded to node A. User B, querying node B for photos, will NOT see the new photo until node A is able to propagate it to node B. However, the new photo does eventually propagate to node B, and user B will eventually be able to query for it. Depending on the system, this propagation might take anywhere from a few seconds to a few hours.

Example 2: Message timeline for a social app like Facebook or Twitter

When you post a status message on Facebook, or tweet a message on Twitter, it might not be immediately visible to your friends or followers. But eventually, they'll be able to see the status updates or tweets.

Example 3: DNS (Domain Name System)

The most popular system that implements eventual consistency is the DNS. DNS servers do not necessarily reflect the latest values but, rather, the values are cached and replicated across many directories over the Internet. It takes a certain amount of time to propagate new changes to all the DNS servers and clients. DNS is highly available and scalable and serves as the backbone of the internet.
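The caching behaviour described above can be illustrated with a toy resolver. This is a minimal sketch under stated assumptions, not a real DNS implementation: `CachingResolver` is a hypothetical class, the authoritative data is a plain dict, the time-to-live (TTL) value is made up, and the current time is passed in explicitly so the example is deterministic. Real DNS resolvers do expire cached records based on a TTL, which is what bounds how long a stale answer can be served.

```python
class CachingResolver:
    """Toy resolver that caches answers until their TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: name -> address
        self.ttl = ttl_seconds
        self.cache = {}  # name -> (address, expires_at)

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # cached answer, possibly stale
        address = self.authoritative[name]  # cache miss or expired entry
        self.cache[name] = (address, now + self.ttl)
        return address


# Addresses below come from the documentation-only 192.0.2.0/24 range.
authoritative = {"example.com": "192.0.2.1"}
resolver = CachingResolver(authoritative, ttl_seconds=300)

first = resolver.resolve("example.com", now=0)

authoritative["example.com"] = "192.0.2.2"  # the record is updated upstream

during_ttl = resolver.resolve("example.com", now=100)  # still the old address
after_ttl = resolver.resolve("example.com", now=300)   # TTL expired, refreshed
```

The resolver stays available the whole time and never blocks on the upstream change. It simply serves the old address until its cached entry expires.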

Example 4: Adding items to a shopping cart

Let's imagine you're shopping at an online retailer and the datacenter fails right after you add an item to the cart. In this case, the system fails over to another replica where the event of adding the item to the cart may not have propagated yet. But it's probably OK (not too annoying) for you to add the item to the cart again. So in this case eventual consistency will be sufficient.

Support for Eventual Consistency in Modern Databases

Most commercial NoSQL databases offer different consistency levels, so you don't have to choose just between eventual and strict consistency. This gives you a lot of flexibility in adapting the database to your user requirements.

For example, Azure Cosmos DB offers five levels of consistency, ranging from strong to eventual: strong, bounded staleness, session, consistent prefix, and eventual.

Azure Cosmos DB Consistency Levels (Courtesy: MSDN)

Cassandra also extends the concept of eventual consistency by augmenting it with tunable consistency. This allows the client application to decide how consistent the requested data must be for any given read or write operation.

Cassandra also allows you to use separate consistency levels for read and write operations. For example, for write operations, a consistency level of "ANY" means that a write must be accepted by at least one node, which may be a node that only stores a hint for later delivery. This delivers the lowest consistency and the highest availability. On the other end of the spectrum, a consistency level of "ALL" means that a write operation must succeed on all replica nodes for a row key. This provides the highest consistency and the lowest availability of any level.
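A common rule of thumb for tunable consistency is that a read is guaranteed to overlap with the latest acknowledged write when the number of replicas that must acknowledge the write plus the number that must answer the read exceeds the replication factor. The sketch below only does that arithmetic. It does not call Cassandra or any driver, the function names are hypothetical, and only a subset of consistency levels is modelled.

```python
def replicas_required(level, replication_factor):
    """Replicas guaranteed to have responded for a given consistency level."""
    table = {
        # ANY is a write-only level; a stored hint can satisfy it,
        # so no replica is guaranteed to hold the data yet.
        "ANY": 0,
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return table[level.upper()]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when the read set and write set must share at least one replica."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


rf = 3
quorum_both = reads_see_latest_write("QUORUM", "QUORUM", rf)  # 2 + 2 > 3
one_both = reads_see_latest_write("ONE", "ONE", rf)           # 1 + 1 <= 3
all_then_one = reads_see_latest_write("ALL", "ONE", rf)       # 3 + 1 > 3
any_then_all = reads_see_latest_write("ANY", "ALL", rf)       # 0 + 3 <= 3
```

With a replication factor of 3, writing and reading at QUORUM gives overlapping replica sets, while writing and reading at ONE leaves room for stale reads in exchange for lower latency and higher availability.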

Key Takeaways

Here is a list of key things you should remember for system design interviews:

  1. Know the difference between strict and eventual consistency.
  2. Understand that NoSQL databases support a wide spectrum of consistency models. You'll need to tune the consistency model of the system based on the user requirements.
  3. Understand the scenarios where eventual consistency will be useful and design the system as such.

      Finally, a key thing to keep in mind is that a distributed system might have some parts which use eventual consistency and some parts which use strict consistency. For example, "up-votes" on a forum post might use eventual consistency, whereas password updates for users in the same system might use strict (immediate) consistency.