Disciplined Inconsistency
Brandon Holt, James Bornholt, Irene Zhang, Dan Ports, Mark Oskin, Luis Ceze

Abstract. Distributed applications and web services, such as online stores or social networks, are expected to be scalable, available, responsive, and fault-tolerant. To meet these steep requirements in the face of high round-trip latencies, network partitions, server failures, and load spikes, applications use eventually consistent datastores that allow them to weaken the consistency of some data. However, making this transition is highly error-prone because relaxed consistency models are notoriously difficult to understand and test.

In this work, we propose a new programming model for distributed data that makes consistency properties explicit and uses a type system to enforce consistency safety. With the Inconsistent, Performance-bound, Approximate (IPA) storage system, programmers specify performance targets and correctness requirements as constraints on persistent data structures and handle uncertainty about the result of datastore reads using new consistency types. We implement a prototype of this model in Scala on top of an existing datastore, Cassandra, and use it to make performance/correctness tradeoffs in two applications: a ticket sales service and a Twitter clone. Our evaluation shows that IPA prevents consistency-based programming errors and adapts consistency automatically in response to changing network conditions, performing comparably to weak consistency and 2-10$\times$ faster than strong consistency.

1. Introduction

To provide good user experiences, modern datacenter applications and web services must balance the competing requirements of application correctness and responsiveness. For example, a web store double-charging for purchases or keeping users waiting too long (each additional millisecond of latency [26, 36]) can translate to a loss in traffic and revenue. Worse, programmers must maintain this balance in an unpredictable environment where a black and blue dress [42] or Justin Bieber [38] can change application performance in the blink of an eye.

Recognizing the trade-off between consistency and performance, many existing storage systems support configurable consistency levels that allow programmers to set the consistency of individual operations [4, 11, 34, 58]. These allow programmers to weaken consistency guarantees only for data that is not critical to application correctness, retaining strong consistency for vital data. Some systems further allow adaptable consistency levels at runtime, where guarantees are only weakened when necessary to meet availability or performance requirements (e.g., during a spike in traffic or datacenter failure) [59, 61]. Unfortunately, using these systems correctly is challenging. Programmers can inadvertently update strongly consistent data in the storage system using values read from weakly consistent operations, propagating inconsistency and corrupting stored data. Over time, this undisciplined use of data from weakly consistent operations lowers the consistency of the storage system to its weakest level.

In this paper, we propose a more disciplined approach to inconsistency in the Inconsistent, Performance-bound, Approximate (IPA) storage system. IPA introduces the following concepts:

We implement an IPA prototype based on Scala and Cassandra and show that IPA allows the programmer to trade off performance and consistency, safe in the knowledge that the type system has checked the program for consistency safety. We demonstrate experimentally that these mechanisms allow applications to dynamically adapt correctness and performance to changing conditions with three applications: a simple counter, a Twitter clone based on Retwis [55] and a Ticket sales service modeled after FusionTicket [1].

2. The Case for Consistency Safety

Unpredictable Internet traffic and unexpected failures force modern datacenter applications to trade off consistency for performance. In this section, we demonstrate the pitfalls of doing so in an undisciplined way. As an example, we describe a movie ticketing service, similar to AMC or Fandango. Because ticketing services process financial transactions, they must ensure correctness, which they can do by storing data in a strongly consistent storage system. Unfortunately, providing strong consistency for every storage operation can cause the storage system and application to collapse under high load, as several ticketing services did in October 2015, when tickets became available for the new Star Wars movie [21].

Figure 1. Ticket sales service. To meet a performance target in displayEvent, developer switches to a weak read for getTicketCount, not realizing that this inconsistent read will be used elsewhere to compute the ticket price.

To allow the application to scale more gracefully and handle traffic spikes, the programmer may chose to weaken the consistency of some operations. As shown in Figure 1, the ticket application displays each showing of the movie along with the number of tickets remaining. For better performance, the programmer may want to weaken the consistency of the read operation that fetches the remaining ticket count to give users an estimate, instead of the most up-to-date value. Under normal load, even with weak consistency, this count would often still be correct because propagation is typically fast compared to updates. However, eventual consistency makes no guarantees, so under heavier traffic spikes, the values could be significantly incorrect and the application has no way of knowing by how much.

While this solves the programmer's performance problem, it introduces a data consistency problem. Suppose that, like Uber's surge pricing, the ticket sales application wants to raise the price of the last 100 tickets for each showing to $15. If the application uses a strongly consistent read to fetch the remaining ticket count, then it can use that value to compute the price of the ticket on the last screen in Figure 1. However, if the programmer reuses getTicketCount which used a weak read to calculate the price, then this count could be arbitrarily wrong. The application could then over- or under-charge some users depending on the consistency of the returned value. Worse, the theater expects to make $1500 for those tickets with the new pricing model, which may not happen with the new weaker read operation. Thus, programmers need to be careful in their use of values returned from storage operations with weak consistency. Simply weakening the consistency of an operation may lead to unexpected consequences for the programmer (e.g., the theater not selling as many tickets at the higher surge price as expected).

In this work, we propose a programming model that can prevent using inconsistent values where they were not intended, as well as introduce mechanisms that allow the storage system to dynamically adapt consistency within predetermined performance and correctness bounds.

3. Programming Model

We propose a programming model for distributed data that uses types to control the consistencyperformance trade-off. The Inconsistent, Performance-bound, Approximate (IPA) type system helps developers trade consistency for performance in a disciplined manner. This section presents the IPA programming model, including the available consistency policies and the semantics of operations performed under the policies. §4 will explain how the type system's guarantees are enforced.

3.1. Overview

ADT / Method Consistency(Strong) Consistency(Weak) LatencyBound(_) ErrorTolerance(_)
Counter.read() Consistent[Int] Inconsistent[Int] Rushed[Int] Interval[Int]
Set.size() Consistent[Int] Inconsistent[Int] Rushed[Int] Interval[Int]
Set.contains(x) Consistent[Bool] Inconsistent[Bool] Rushed[Bool] N/A
List[T].range(x,y) Consistent[List[T]] Inconsistent[List[T]] Rushed[List[T]] N/A
UUIDPool.take() Consistent[UUID] Inconsistent[UUID] Rushed[UUID] N/A
UUIDPool.remain() Consistent[Int] Inconsistent[Int] Rushed[Int] Interval[Int]
Table 1. Example ADT operations; consistency policies determine the consistency type of the result.

The IPA programming model consists of three parts:

Programmmers annotate ADTs with consistency policies to choose their desired level of consistency. The consistency policy on the ADT operation determines the consistency type of the result. Table 1 shows some examples; the next few sections will introduce each of the policies and types in detail. Together, these three components provide two key benefits for developers. First, the IPA type system enforces consistency safety, tracking the consistency level of each result and preventing inconsistent values from flowing into consistent values. Second, the programming interface enables performancecorrectness trade-offs, because consistency policies on ADTs allow the runtime to select a consistency level for each individual operation that maximizes performance in a constantly changing environment. Together, these systems allow applications to adapt to changing conditions with the assurance that the programmer has expressed how it should handle varying consistency.

3.2. Abstract Data Types

The base of the IPA type system is a set of abstract data types (ADTs) for distributed data structures. ADTs present a clear abstract model through a set of operations that query and update state, allowing users and systems alike to reason about their logical, algebraic properties rather than the low-level operations used to implement them. Though the simplest key-value stores only support primitive types like strings for values, many popular datastores have built-in support for more complex data structures such as sets, lists, and maps. However, the interface to these datatypes differs: from explicit sets of operations for each type in Redis, Riak, and Hyperdex [11, 25, 31, 54] to the pseudo-relational model of Cassandra [32]. IPA's extensible library of ADTs allows it to decouple the semantics of the type system from any particular datastore, though our reference implementation is on top of Cassandra, similar to [57].

Besides abstracting over storage systems, ADTs are an ideal place from which to reason about consistency and system-level optimizations. The consistency of a read depends on the write that produced the value. Annotating ADTs with consistency policies ensures the necessary guarantees for all operations are enforced, which we will expand on in the next section.

Custom ADTs can express application-level correctness constraints. IPA's Counter ADT allows reading the current value as well as increment and decrement operations. In our ticket sales example, we must ensure that the ticket count does not go below zero. Rather than forcing all operations on the datatype to be linearizable, this application-level invariant can be expressed with a more specialized ADT, such as a BoundedCounter, giving the implementation more latitude for enforcing it. IPA's library is extensible, allowing custom ADTs to build on common features; see §5.

3.3. Consistency Policies

Previous systems [4, 11, 34, 58, 61] require annotating each read and write operation with a desired consistency level. This per-operation approach complicates reasoning about the safety of code using weak consistency, and hinders global optimizations that can be applied if the system knows the consistency level required for future operations. The IPA programming model provides a set of consistency policies that can be placed on ADT instances to specify consistency properties for the lifetime of the object. Consistency policies come in two flavors: static and dynamic.

Static policies are fixed, such as Consistency(Strong) which states that operations must have strongly consistent behavior. Static annotations provide the same direct control as previous approaches but simplify reasoning about correctness by applying them globally on the ADT.

Dynamic policies specify a consistency level in terms of application requirements, allowing the system to decide at runtime how to meet the requirement for each executed operation. IPA offers two dynamic consistency policies:

Dynamic policies allow the runtime to extract more performance from an application by relaxing the consistency of individual operations, safe in the knowledge that the IPA type system will enforce safety by requiring the developer to consider the effects of weak operations.

Static and dynamic policies can apply to an entire ADT instance or on individual methods. For example, one could declare List[Int] with LatencyBound(50 ms), in which case all read operations on the list are subject to the bound. Alternatively, one may wish to declare a Set with relaxed consistency for its size but strong consistency for its contains predicate. The runtime is responsible for managing the interaction between these policies. In the case of a conflict between two bounds, the system can be conservative and choose stronger policies than specified without affecting correctness.

In the ticket sales application, the Counter for each event's tickets could have a relaxed accuracy policy, ErrorTolerance(5%), allowing the system to quickly read the count of tickets remaining. An accuracy policy is appropriate here because it expresses a domain requirementusers want to see accurate ticket counts. As long as the system meets this requirement, it is free to relax consistency and maximize performance without violating correctness. The List ADT used for events has a latency policy that also expresses a domain requirementthat pages on the website load in reasonable time.

3.4. Consistency Types

The key to consistency safety in IPA is the consistency typesenforcing type safety directly enforces consistency safety. Read operations of ADTs annotated with consistency policies return instances of a consistency type. These consistency types track the consistency of the results and enforce a fundamental non-interference property: results from weakly consistent operations cannot flow into computations with stronger consistency without explicit endorsement. This could be enforced dynamically, as in dynamic information flow control systems, but the static guarantees of a type system allow errors to be caught at compile time.

Figure 2. IPA Type Lattice parameterized by a type T.

The consistency types encapsulate information about the consistency achieved when reading a value. Formally, the consistency types form a lattice parameterized by a primitive type T, shown in Figure 2. Strong read operations return values of type Consistent[T] (the top element), and so (by implicit cast) behave as any other instance of type T. Intuitively, this equivalence is because the results of strong reads are known to be consistent, which corresponds to the control flow in conventional (non-distributed) applications. Weaker read operations return values of some type lower in the lattice (weak consistency types), reflecting their possible inconsistency. The bottom element Inconsistent[T] specifies an object with the weakest possible (or unknown) consistency. The other consistency types follow a subtyping relation $\prec$ as illustrated in Figure 2.

The only possible operation on Inconsistent[T] is to endorse it. Endorsement is an upcast, invoked by Consistent(x), to the top element Consistent[T] from other types in the lattice:

\[\inferrule{\Gamma \vdash e_1 : \tau[T] \\ T \prec \tau[T]}{\Gamma \vdash \operatorname{Consistent}(e_1) : T} \vspace{-6pt} \]

The core type system statically enforces safety by preventing weaker values from flowing into stronger computations. Forcing developers to explicitly endorse inconsistent values prevents them from accidentally using inconsistent data where they did not determine it was acceptable, essentially inverting the behavior of current systems where inconsistent data is always treated as if it was safe to use anywhere. However, endorsing values blindly in this way is not the intended use case; the key productivity benefit of the IPA type system comes from the other consistency types which correspond to the dynamic consistency policies in §3.3 which allow developers to handle dynamic variations in consistency, which we describe next.

3.4.1. Rushed types

The weak consistency type Rushed[T] is the result of read operations performed on an ADT with consistency policy LatencyBound(x). Rushed[T] is a sum (or union) type, with one variant per consistency level available to the implementation of LatencyBound. Each variant is itself a consistency type (though the variants obviously cannot be Rushed[T] itself). The effect is that values returned by a latency-bound object carry with them their actual consistency level. A result of type Rushed[T] therefore requires the developer to consider the possible consistency levels of the value.

For example, a system with geo-distributed replicas may only be able to satisfy a latency bound of 50 ms with a local quorum read (that is, a quorum of replicas within a single datacenter). In this case, Rushed[T] would be the sum of three types Consistent[T], LocalQuorum[T], and Inconsistent[T]. A match statement destructures the result of a latency-bound read operation:

  set.contains() match {
    case Consistent(x) => print(x)
    case LocalQuorum(x) => print(x+", locally")
    case Inconsistent(x) => print(x+"???")

The application may want to react differently to a local quorum as opposed to a strongly or weakly consistent value. Note that because of the subtyping relation on consistency types, omitted cases can be matched by any type lower in the lattice, including the bottom element Inconsistent(x); other cases therefore need only be added if the application should respond differently to them. This subtyping behavior allows applications to be portable between systems supporting different forms of consistency (of which there are many).

3.4.2. Interval types

Tagging values with a consistency level is useful because it helps programmers tell which operation reorderings are possible (e.g. strongly consistent operations will be observed to happen in program order). However, accuracy policies provide a different way of dealing with inconsistency by expressing it in terms of value uncertainty. They require knowing the abstract behavior of operations in order to determine the change in abstract state which results from each reordered operation (e.g., reordering increments on a Counter has a known effect on the value of reads).

The weak consistency type Interval[T] is the result of operations performed on an ADT with consistency policy ErrorTolerance(x%). Interval[T] represents an interval of values within which the true (strongly consistent) result lies. The interval reflects uncertainty in the true value created by relaxed consistency, in the same style as work on approximate computing [15].

The key invariant of the Interval type is that the interval must include the result of some linearizable execution. Consider a Set with 100 elements. With linearizability, if we add a new element and then read the size (or if this ordering is otherwise implied), we must get 101 (provided no other updates are occurring). However, if size is annotated with ErrorTolerance(5%), then it could return any interval that includes 101, such as $[95,105]$ or $[100,107]$, so the client cannot tell if the recent add was included in the size. This frees the system to optimize to improve performance, such as by delaying synchronization. While any partially-ordered domain could be represented as an interval (e.g., a Set with partial knowledge of its members), in this work we consider only numeric types.

In the ticket sales example, the counter ADT's accuracy policy means that reads of the number of tickets return an Interval[Int]. If the entire interval is above zero, then users can be assured that there are sufficient tickets remaining. In fact, because the interval could represent many possible linearizable executions, in the absence of other user actions, a subsequent purchase must succeed. On the other hand, if the interval overlaps with zero, then there is a chance that tickets could already be sold out, so users could be warned. Note that ensuring that tickets are not over-sold is a separate concern requiring a different form of enforcement, which we describe in §5. The relaxed consistency of the interval type allows the system to optimize performance in the common case where there are many tickets available, and dynamically adapt to contention when the ticket count diminishes.

4. Enforcing consistency policies

The consistency policies introduced in the previous section allow programmers to describe application-level correctness properties. Static consistency policies (e.g. Strong) are enforced by the underlying storage system; the annotated ADT methods simply set the desired consistency level when issuing requests to the store. The dynamic policies each require a new runtime mechanism to enforce them: parallel operations with latency monitoring for latency bounds, and reusable reservations for error tolerance. But first, we briefly review consistency in Dynamo-style replicated systems.

To be sure of seeing a particular write, strong reads must coordinate with a majority (quorum) of replicas and compare their responses. For a write and read pair to be strongly consistent (in the CAP sense [17]), the replicas acknowledging the write ($W$) plus the replicas contacted for the read ($R$) must be greater than the total number of replicas ($W + R > N$). This can be achieved, for example, by writing to a quorum ($(N+1)/2$) and reading from a quorum (QUORUM in Cassandra), or writing to $N$ (ALL) and reading from 1 (ONE) [22]. To support the Consistency(Strong) policy, the designer of each ADT must choose consistency levels for its operations which together enforce strong consistency.

4.1. Latency bounds

The time it takes to achieve a particular level of consistency depends on current conditions and can vary over large time scales (minutes or hours) but can also vary significantly for individual operations. During normal operation, strong consistency may have acceptable performance while at peak traffic times the application would fall over. Latency bounds specified by the application allow the system to dynamically adjust to maintain comparable performance under varying conditions.

Our implementation of latency-bound types takes a generic approach: it issues read requests at different consistency levels in parallel. It composes the parallel operations and returns a result either when the strongest operation returns, or with the strongest available result at the specified time limit. If no responses are available at the time limit, it waits for the first to return.

This approach makes no assumptions about the implementation of read operations, making it easily adaptable to different storage systems. Some designs may permit more efficient implementations: for example, in a Dynamo-style storage system we could send read requests to all replicas, then compute the most consistent result from all responses received within the latency limit. However, this requires deeper access to the storage system implementation than is traditionally available.

4.1.1. Monitors

The main problem with our approach is that it wastes work by issuing parallel requests. Furthermore, if the system is responding slower due to a sudden surge in traffic, then it is essential that our efforts not cause additional burden on the system. In these cases, we should back off and only attempt weaker consistency. To do this, the system monitors current traffic and predicts the latency of different consistency levels.

Each client in the system has its own Monitor (though multi-threaded clients can share one). The monitor records the observed latencies of reads, grouped by operation and consistency level. The monitor uses an exponentially decaying reservoir to compute running percentiles weighted toward recent measurements, ensuring that its predictions continually adjust to current conditions.

Whenever a latency-bound operation is issued, it queries the monitor to determine the strongest consistency likely to be achieved within the time bound, then issues one request at that consistency level and a backup at the weakest level, or only weak if none can meet the bound. In §6.2.1 we show empirically that even simple monitors allow clients to adapt to changing conditions.

4.2. Error bounds

We implement error bounds by building on the concepts of escrow and reservations [27, 44, 48, 50]. These techniques have been used in storage systems to enforce hard limits, such as an account balance never going negative, while permitting concurrency. The idea is to set aside a pool of permissions to perform certain update operations (we'll call them reservations or tokens), essentially treating operations as a manageable resource. If we have a counter that should never go below zero, there could be a number of decrement tokens equal to the current value of the counter. When a client wishes to decrement, it must first acquire sufficient tokens before performing the update operation, whereas increments produce new tokens. The insight is that the coordination needed to ensure that there are never too many tokens can be done off the critical path: tokens can be produced lazily if there are enough around already, and most importantly for this work, they can be distributed among replicas. This means that replicas can perform some update operations safely without coordinating with any other replicas.

4.2.1. Reservation Server

Reservations require mediating requests to the datastore to prevent updates from exceeding the available tokens. Furthermore, each server must locally know how many tokens it has without synchronizing. We are not aware of a commercial datastore that supports custom mediation of requests and replica-local state, so we need a custom middleware layer to handle reservation requests, similar to other systems which have built stronger guarantees on top of existing datastores [8, 10, 57].

Any client requests requiring reservations are routed to one of a number of reservation servers. These servers then forward operations when permitted along to the underlying datastore. All persistent data is kept in the backing store; these reservation servers keep only transient state tracking available reservations. The number of reservation servers can theoretically be decoupled from the number of datastore replicas; our implementation simply colocates a reservation server with each datastore server and uses the datastore's node discovery mechanisms to route requests to reservation servers on the same host.

4.2.2. Enforcing error bounds

Reservations have been used previously to enforce hard global invariants in the form of upper or lower bounds on values [10], integrity constraints [9], or logical assertions [37]. However, enforcing error tolerance bounds presents a new design challenge because the bounds are constantly shifting. Consider a Counter with a 10% error bound, shown in Figure 3. If the current value is 100, then 10 increments can be done before anyone must be told about it. However, we have 3 reservation servers, so these 10 reservations are distributed among them, allowing each to do some increments without synchronizing. If only 10 outstanding increments are allowed, reads are guaranteed to maintain the 10% error bound.

Figure 3. Reservations enforcing error bounds.

In order to perform more increments after a server has exhausted its reservations, it must synchronize with the others, sharing its latest increments and receiving any changes of theirs. This is accomplished by doing a strong write (ALL) to the datastore followed by a read. Once that synchronization has completed, those 3 tokens become available again because the reservation servers all temporarily agree on the value (in this case, at least 102).

Read operations for these types go through reservation servers as well: the server does a weak read from any replica, then determines the interval based on how many reservations there are. For the read in Figure 3, there are 10 reservations total, but Server B knows that it has not used its local reservations, so it knows that there cannot be more than 6 and can return the interval $[100,106]$.

4.2.3. Narrowing bounds

Error-tolerance policies give an upper bound on the amount of error; ideally, the interval returned will be more precise than the maximum error when conditions are favorable. The error bound determines the maximum number of reservations that can be allocated per instance. To allow a variable number of tokens, each ADT instance keeps a count of tokens allocated by each server, and when servers receive write requests, they increment their count to allocate tokens to use. Allocating must be done with strong consistency to ensure all servers agree, which can be expensive, so we use long leases (on the order of seconds) to allow servers to cache their allocations. When a lease is about to expire, it preemptively refreshes its lease in the background so that writes do not block.

For each type of update operation there may be a different pool of reservations. Similarly, there could be different error bounds on different read operations. It is up to the designer of the ADT to ensure that all error bounds are enforced with appropriate reservations. Consider a Set with an error tolerance on its size operation. This requires separate pools for add and remove to prevent the overall size from deviating by more than the bound in either direction, so the interval is $[v-\texttt{remove.delta},v+\texttt{add.delta}]$ where $v$ is the size of the set and delta computes the number of outstanding operations from the pool. In some situations, operations may produce and consume tokens in the same pool – e.g., increment producing tokens for decrement – but this is only allowable if updates propagate in a consistent order among replicas, which may not be the case in some eventually consistent systems.

5. Implementation

IPA is implemented mostly as a client-side library to an off-the-shelf distributed storage system, though reservations are handled by a custom middleware layer which mediates accesses to any data with error tolerance policies. Our implementation is built on top of Cassandra, but IPA could work with any replicated storage system that supports fine-grained consistency control, which many commercial and research datastores do, including Riak [11].

IPA's client-side programming interface is written in Scala, using the asynchronous futures-based Phantom [45] library for type-safe access to Cassandra data. Reservation server middleware is also built in Scala using Twitter's Finagle framework [63]. Communication is done between clients and Cassandra via prepared statements, and between clients and reservation servers via Thrift remote-procedure-calls [6]. Due to its type safety features, abstraction capability, and compatibility with Java, Scala has become popular for web service development, including widely-used frameworks such as Akka [35] and Spark [5], and at established companies such as Twitter and LinkedIn [2, 18, 29].

The IPA type system, responsible for consistency safety, is also simply part of our client library, leveraging Scala's sophisticated type system. The IPA type lattice is implemented as a subclass hierarchy of parametric classes, using Scala's support for higher-kinded types to allow them to be destructured in match statements, and implicit conversions to allow Consistent[T] to be treated as type T. We use traits to implement ADT annotations; e.g. when the LatencyBound trait is mixed into an ADT, it wraps each of the methods, redefining them to have the new semantics and return the correct IPA type.

Figure 4. Some of the reusable components provided by IPA and an example implemention of a Counter with error bounds.

IPA comes with a library of reference ADT implementations used in our experiments, but it is intended to be extended with custom ADTs to fit more specific use cases. Our implementation provides a number of primitives for building ADTs, some of which are shown in Figure 4. To support latency bounds, there is a generic LatencyBound trait that provides facilities for executing a specified read operation at multiple consistency levels within a time limit. For implementing error bounds, IPA provides a generic reservation pool which ADTs can use. Figure 4 shows how a Counter with error tolerance bounds is implemented using these pools. The library of reference ADTs includes:

Figure 4 shows Scala code using reservation pools to implement a Counter with error bounds. The actual implementation splits this functionality between the client and the reservation server.

6. Evaluation

The goal of the IPA programming model and runtime system is to build applications that adapt to changing conditions, performing nearly as well as weak consistency but with stronger consistency and safety guarantees. To that end, we evaluate our prototype implementation under a variety of network conditions using both a real-world testbed (Google Compute Engine [28]) and simulated network conditions. We start with simple microbenchmarks to understand the performance of each of the runtime mechanisms independently. We then study two applications in more depth, exploring qualitatively how the programming model helps avoid potential programming mistakes in each and then evaluating their performance against strong and weakly consistent implementations.

6.1. Simulating adverse conditions

To control for variability, we perform our experiments with a number of simulated conditions, and then validate our findings against experiments run on globally distributed machines in Google Compute Engine. We use a local test cluster with nodes linked by standard ethernet and Linux's Network Emulation facility [62] (tc netem) to introduce packet delay and loss at the operating system level. We use Docker containers [24] to enable fine-grained control of the network conditions between processes on the same physical node.

Table 2 shows the set of conditions we use in our experiments to explore the behavior of the system. The uniform 5ms link simulates a well-provisioned datacenter; slow replica models contention or hardware problems that cause one replica to be slower than others, and geo-distributed replicates the latencies between virtual machines in the U.S., Europe, and Asia on Amazon EC2 [3]. These simulated conditions are validated by experiments on Google Compute Engine with virtual machines in four datacenters: the client in us-east, and the storage replicas in us-central, europe-west, and asia-east. We elide the results for Local (same rack in our testbed) except in Figure 11 because the differences between policies are negligible, so strong consistency should be the default there.

Network Condition Latencies (ms)
Simulated Replica 1 Replica 2 Replica 3
Uniform / High load 5 5 5
Slow replica 10 10 100
Geo-distributed (EC2) 1 ± 0.3 80 ± 10 200 ± 50
Actual Replica 1 Replica 2 Replica 3
Local (same rack) <1 <1 <1
Google Compute Engine 30 ± <1 100 ± <1 160 ± <1
Table 2. Network conditions for experiments: latency from client to each replicas, with standard deviation if high.

6.2. Microbenchmark: Counter

We start by measuring the performance of a simple application that randomly increments and reads from a number of counters with different IPA policies. Random operations (incr(1) and read) are uniformly distributed over 100 counters from a single multithreaded client (allowing up to 4000 concurrent operations).

6.2.1. Latency bounds

Figure 5. Counter: latency bounds, mean latency. Beneath each bar is the % of strong reads. Strong consistency is never possible for the 10ms bound, but 50ms bound achieves mostly strong, only resorting to weak when network latency is high.