Mitigating Contention in Distributed Systems
General Exam
Brandon Holt
University of Washington


The online world is a dangerous place for interactive applications. A single post by Justin Bieber can slow Instagram to a crawl; a black and blue dress going viral sends Buzzfeed scrambling to stay afloat; Star Wars movie ticket pre-sales cause service interruptions for Fandango and crash others. Developers of distributed applications must work hard to handle these high-contention situations because though they are not the average case, they affect a large fraction of users.

Techniques for mitigating contention abound, but many require programmers to make tradeoffs between performance and consistency. In order to meet performance requirements such as high availability or latency targets, applications typically use weaker consistency models, shifting the burden of reasoning about replication to programmers. Developers must balance mechanisms that reign in consistency against their effect on performance and make difficult decisions about where precision is most critical.

Abstract data types (ADTs) hide implementation details behind a clean abstract representation of state and behavior. This high-level representation is easy for programmers to build applications with and provides distributed systems with knowledge of application semantics, such as commutativity, which are necessary to apply contention-mitigating optimizations. With inconsistent, probabilistic, and approximate (IPA) types, we can express weaker semantics than with traditional ADTs, enabling techniques for weak consistency to be used and allowing precision to be explicitly traded for performance where applications can tolerate error. In previous work, we have demonstrated that ADT awareness can result in significant performance improvements: 3-50x speedup in transaction throughput for high-contention workloads such as an eBay-like auction service and a Twitter-like social network. IPA types are proposed for future work.

1. Introduction

Imagine a young ticket selling app, embarking on a mission to help people everywhere get a seat to watch their favorite shows. Bright-eyed and ready to face the world, it stores everything in a key/value store so that it will not forget a thing. It uses a distributed key/value store, so when it expands into new cities and reaches its millionth customer, the datastore continues to grow with it. The app uses strong consistency and transactions to ensure that no tickets are sold twice or lost, and its customers praise its reliability. “Five stars. That little app always gets my tickets, fast!” they say.

Then one day, pre-sales for the 7th Star Wars movie come out and suddenly it is inundated under a surge over 7 times the usual load, as a record-breaking number of people try to purchase tickets for this one movie. This concentrated traffic causes hot spots in the datastore; while most nodes are handling typical traffic, the traffic spike ends up being funneled to a handful of nodes which are responsible for this movie. These hotspots overload this app's “scalable” datastore, causing latencies to spike, users' connections to time out, and disappointed users who will not get to see the movie on opening night because the app was not able to sell them a ticket. On this dark night for moviegoers, even major players like Fandango, Regal, and AMC are plagued with service interruptions; some sites even crash or lose data.

This kind of concentrated surge in traffic is a perfect illustration of the contention that occurs all the time in real-world workloads, from ticket sales to social networks. Power law distributions are everywhere, from popularity of pages to number of followers, leading to network effects which can magnify even the slightest signals. Events – such as the FIFA World Cup or the Oscars – happening in realtime drive spikes in traffic. All together, this can create memes that propagate virally through social media and news sites. Interactivity is crucial to all of this — it both fuels the propagation of memes and causes contention.

Paradoxically, though contention is not the average case, it is responsible for many of the most challenging problems: tail latency, failures, and inconsistency bugs. Most of the time, an application may work correctly, with low latency and no noticeable inconsistency. However, in that 99th percentile case, contention results in a hot spot, where latencies spike, failure rates go up, and consistency can degrade, exposing bugs or resulting in user-visible inconsistencies.

1.1. Mitigating contention

Many techniques, over many years, have tackled this problem from different angles, from research on escrow and fragmenting aggregate fields in the 80s [51], to modern research on auto-generated coordination based on annotations [15, 63]. Some require a variety of changes to the programming model, while others improve underlying protocols and mechanisms. All are focused on exposing concurrency and making tasks simpler for programmers, but they can be broken down into three broad approaches, shown in Figure 1.

Figure 1. Overview of approaches for mitigating contention, such as the hotspot in red.
  1. Is there any concurrency within the contended record that can be exposed? If operations on the record are commutative, they can safely run concurrently without changing the semantics; abstract locks (§6.2.1) leverage this to allow transactions to overlap and avoid false conflicts. Escrow (§4.1.3) allows the record to be treated as if it was split into a pool of fragments.

  2. If clients are multithreaded, as is the case with frontend web servers that typically handle many end-user requests concurrently, then some of the synchronization work can be offloaded to them. Combining (§6.1) leverages associativity to allow operations to be merged locally and executed as a single operation remotely, reducing the amount of work done on the overloaded datastore. Other techniques like leases (§4.2.1) let client-side caches avoid costly invalidation messages.

  3. If clients are allowed to interact with weakly-synchronized replicas, the load on the contended shard can be reduced. However, this comes at a significant programmability cost: the illusion of a single copy of data is broken and programmers must now reason about replicated state. Weakly synchronized replicas share updates asynchronously, and clients may communicate with multiple replicas, so they can observe effects out of order, or perform updates which conflict and result in inconsistent states.

To be clear, techniques in the first and second categories can use replicas for fault tolerance and still maintain strong consistency. Strong consistency requires writes to be linearizable [37], which can be accomplished with replicas either by funneling through a single master or by a consensus protocol like Paxos [43] that makes replicas appear like a single machine. Techniques like E-Paxos [48] would fall under (1) because they expose concurrency while maintaining the single-machine view.

The third category requires much more drastic changes to programming models than the first two because it forces programmers to sacrifice consistency guarantees. However, weak consistency unlocks further improvements to performance properties like availability and lower latency that are impossible with strong consistency. In this work I will focus mostly on techniques that fall into this category because these are the most challenging to understand and require the most significant changes for programmers.

1.2. Balancing requirements

Mitigating contention is just one of the many competing requirements placed on distributed applications. They are expected to be scalable, fault tolerant, and highly available. Users demand responsive applications, no matter how many other users are active or where they are. The common wisdom is that companies lose money for every increase in response latency. Many systems have service-level agreements (SLAs) promising responses within a certain latency for all but the 99th percentile of requests. Throughput during peak times must be able to keep up with the load.

These performance constraints compete with the desire for programmability and correctness. Fundamentally, strong consistency cannot be provided with high availability — replication must be exposed in some way. Strict serializability and ACID transactions are among the many useful programming abstractions that must be broken to achieve those performance properties. Applications are still expected to appear mostly consistent, so developers are forced to think carefully about how to build correct systems with weaker guarantees and choose where to focus their effort.

Luckily, not everything requires the same level of precision or consistency. Some actions, such as selling the last ticket to Star Wars' opening night, require precise, consistent execution. Other situations, such as viewing the number of retweets for a popular tweet, do not need to be exactly correct — users may be satisfied as long as the number is the right order of magnitude. These hard and soft constraints can be difficult or impossible to express in current systems. Furthermore, whatever tradeoffs are made to improve programmability or enforce constraints must be balanced against meeting the performance targets, though it is not easy to quantify how changes will affect them. Programmers of these distributed applications must constantly juggle competing correctness and performance constraints.

1.3. Overview

To understand how the various techniques for managing consistency and performance relate to one another, we will explore them in terms of the properties they trade off:

We will also discuss how each technique operates, in terms of:

After exploring the space of existing techniques, we will propose a programming model that incorporates these disparate solutions into a single abstraction – using abstract data types (ADTs) to concisely describe application semantics and hide the details of the underlying consistency and coordination techniques. Our implementations, in a distributed data analytics system, Grappa, and a prototype ADT-store, Claret, show significant performance improvements, especially for high contention workloads. In future work, we propose using inconsistent, probabilistic, and approximate (IPA) types to trade off precision in order to take advantage of weak consistency and replication for high availability and scalability.

2. Trading off consistency for performance

In order to meet scalability, availability, and latency requirements, distributed systems programmers must routinely make tradeoffs between consistency and performance [11, 21, 32]. In the typical case, consistency does not pose a problem, even for geo-replicated systems – most requests are reads, requests for the same user typically go to the same server, and inconsistency windows after updates are usually small [11, 47]. Consistency becomes problematic in exceptional high contention cases and the long tail of disproportionately slow requests [28]. However, these cases are almost pathologically bad – many users together create conflicts leading to inconsistency, but this also means that many users are there to observe the inconsistency, so it is more likely to be noticed. The events most likely to cause problems are also the most news-worthy.

Targeting performance for the average case can lead to catastrophic failures in the contentious cases, but the handling all possible cases damages programmability. This burden will become abundantly clear over the next few sections as we look at the axes which programmers must traverse when making implementation choices.

3. Ordering and visibility constraints

Understanding the behavior of a sequence of operations in a program requires knowing all of the ordering and visibility constraints that govern what each operation returns and when each update actually runs. How these constraints are set is determined by the programming model.

3.1. Consistency models

Analogous to memory models in the field of computer architecture, consistency models refer to the allowable reorderings of operations and their visibility in a distributed system. Consistency models differ from memory models primarily in one way: exposing the existence of replication. Due to the reliability and speed of CPU cache hierarchies, memory models can afford to assume coherence will ensure that there appears to be only one copy of memory. In distributed systems, where failure is a possibility and synchronization is expensive, it is often necessary to expose replication through the consistency model. This makes them significantly more difficult to reason about – as if memory models were not complex enough as it is.

The strongest consistency model, strict serializability (roughly defined as Lamport's sequential consistency [42] combined with Herlihy's linearizability [37]) guarantees that operations appear to occur in a global serial order that all observers agree on and that corresponds to real time. This, and any form of consistency that requires enforcing a global total order is theoretically impossible to enforce with high availability due to the possibility of network partitions (this is the essence of the CAP theorem [21, 32]). In practice, strict serializability may not be wholly impractical for the average case, but ensuring it in all cases is prohibitively expensive.

At the other extreme, eventual consistency, the least common denominator among consistency models, simply guarantees that if update operations stop occurring, all replicas will eventually reflect the same state [70]. Under this model, programmers cannot count on subsequent operations reflecting the same state, because those operations could go to any replica at any time, and those replicas are continuously receiving updates from other nodes.

There are a whole family of models similar to eventual consistency which add various ordering constraints:

There are too many variations on these and other models to enumerate, including combinations of them. Each restricts ordering and visibility differently, making some cases easier for programmers to reason about, while reducing the flexibility and therefore performance of the system. For instance, some require sticky sessions [68], which forces clients to continue communicating with a particular replica, even if it is not the fastest, or lowest latency, or most up-to-date one available.

As a way of trading off consistency for performance, weak consistency models are a poor choice. A particular consistency model must often be chosen at a very coarse grain, possibly at the level of an entire database. Stronger guarantees can be enforced on top of a weaker model using quorums, but these must be chosen carefully, and the code to handle this is not easily adapted to changes in the underlying consistency model. In general, weak consistency models are not modular: adding or changing an operation in one place may require changing assumptions about ordering elsewhere. Understanding when an operation becomes visible to others is yet another source of confusion.

3.2. Transactions

Transactions are a well-established way to provide stronger guarantees among some operations. By choosing the type and granularity of transactions, programmers have some control over the ordering and visibility of their operations. Like strict serializability, full ACID transactions require a global order so are prohibitive to scaling and high availability, leading to the development of weaker transaction models. As the antithesis of the strong guarantees of ACID, some have termed these weaker semantics “BASE” (Basically Available, Soft state, Eventually Consistent) [53]. Luckily, programmers are not restricted to simply choosing between these two extremes; some have proposed ways to bridge the gap.

3.2.1. Transaction chopping and chaining

Some techniques can expose limited extra concurrency between transactions without requiring programmers to sacrifice ACID semantics. Transaction chopping [60] automates a task programmers could do by hand: breaking transactions into minimal-sized atomic pieces, determined by a static analysis of interleavings. A more recent system, Lynx [76] executes split transactions as a chain, hopping from shard to shard, coordinating the order of execution to allow conflicting transactions to safely interleave. Finally, Callas [74] groups transactions that commute with each other to allow them to execute concurrently. Transaction boosting [35] similarly allows transactions to overlap when they commute, but at the level of individual operations.

3.2.2. Salt: Combining ACID and BASE

Just as a small fraction of data items are responsible for the majority of contention, the same is true for transactions. Rather than forcing programmers to give up ACID semantics for their entire application, Salt [73] allows transactions with BASE semantics to coexist safely with ACID transactions. Using new locking schemes, they ensure that transactions executing with weaker BASE semantics cannot violate the strong safety guarantees of the ACID transactions.

Converting an ACID transaction to execute without those guarantees is an error-prone task; it involves considering all the new possible interleavings and establishing how to resolve all the possible conflicts without coordination. Salt's model means that programmers only need to “BASE-ify” the transactions causing performance or scaling problems. This makes it relatively straightforward to trade off consistency where necessary, but does not do much to help programmers deal with the weaker semantics.

3.2.3. RAMP transactions

Even without guarantees of a serializable total order, there are still benefits to supporting atomic updates: preventing foreign key constraint violations, and ensuring that indexes and other derived data are as up-to-date as the backing data. In support of these safety properties, RAMP (Read Atomic Multi-Partition) transactions provide coordination-free atomic visibility for multiple updates [12]. They work by staging updates on all participating shards so they can force the complete set of updates for a transaction to be made visible at one point in time. They dynamically detect racy reads and fix them by either selecting an appropriate staged version or waiting for another round of communication. Because these determinations are made locally, they cannot guarantee global mutual exclusion or serializability.

3.3. Models inspired by distributed version control

One of the troubles with weakly consistent replication is that it is often not possible to construct a serializable history of an execution, making it difficult to figure out which effects could have been visible to a particular client. Distributed version control systems (DVCS) like Git have inspired alternative ways of viewing concurrent execution. DVCSs allow individuals to work concurrently on forks or branches without interference and resolve conflicts at explicit merge points later. These histories are not serializable, but they still allow users to easily understand when effects become visible to different observers.

The Push/Pull Model of Transactions [40] formalized several consistency and transaction models in these terms. Another model called branch consistency proposed for a system called TaRDIS [27] uses the notion of branching and merging for isolation and conflict resolution in geo-replicated systems, delegating conflict resolution to applications.

Concurrent revisions [22] proposes an execution model built around forking and joining state along with concurrent execution to make sharing explicit. In this model, concurrent tasks fork a copy of the state they access. Changes to forked state are only visible to that task and its descendants until the concurrent task is joined back in. On join, changes to forked data items are merged according to their type. For example, as a cumulative type, a forked Counter tracks increments made to it, and when joined, adds those increments to the original value. In this way, multiple concurrent tasks can increment the shared Counter without conflicting and it is clear exactly when their effects are made visible. Follow-on work extended the ideas of concurrent revisions and revision diagrams to reason about eventual consistency: in eventually consistent transactions [23] and mobile/cloud applications [24].

These programming models show that there is hope for reasoning about weakly consistent replicated data. The isolation types and cloud types from concurrent revisions provide useful semantics for working with highly contended data. In these models, trading off consistency for performance is not as clear-cut; revision diagrams and DVCS histories imply strict coordination points. It is also unclear how to enforce global constraints on forked data. Consider again the ticket sales example: in order to ensure tickets are not over-sold, concurrent forks must somehow know how many of the remaining tickets they are allowed to sell, without knowing how many other forks exist, breaking the abstraction.

3.4. Annotating constraints


Figure 2. Annotating application invariants. Annotations are used to determine where coordination is necessary and what consistency is required to enforce it.

Several datastores allow consistency levels to be specified on a per-operation basis: research systems Gemini [44] (RedBlue consistency), and Walter [64], and production systems Cassandra [8] and Riak [18] (per object or namespace). However, they leave programmers to determine where to use stronger consistency in order to achieve their correctness goals, a very error-prone task. Recent work has explored ways of automatically choosing the correct consistency level or coordination strategy based on annotations.

Sieve [45] builds on top of Gemini, automatically determining how to implement the desired semantics with causal consistency and adding additional synchronization wherever strong consistency is needed. It relies on programmer-specified global invariants and annotations on the relational database schema to select the desired merge semantics in case of conflicts. These annotations echo the variants of CRDTs (see §4.1.1).

Quelea [63] has programmers write contracts to describe ordering constraints between operations and then automatically selects the correct consistency level for each operation to satisfy all of the contracts. Contracts are specified in terms of low-level consistency primitives such as visibility and session order. For example, to ensure a non-negative bank account balance, a contract indicates that all withdraw operations must be visible to one another, forcing the operation to be executed with sequential consistency. Because correctness properties are specified independent of a particular consistency model, or set of consistency levels, they are composable with each other and portable to other datastores supporting different consistency options. However, the low-level primitives used in contracts may not be intuitive for programmers and still require reasoning about all the possible anomalies between operations. These primitives are unable to capture other forms of coordination, sometimes leading to more conservative ordering constraints than necessary.

Indigo [15] takes a different approach to expressing application requirements: instead of specifying visibility and ordering constraints, programmers write invariants over abstract state and state transitions, and annotate post-conditions on actions to express their side-effects in terms of the abstract state. They then perform a static analysis to determine where concurrent execution could violate the invariants and add coordination logic to avoid those conflicts. Supported constraints include numeric constraints, such as lower or upper bounds on counts, as well as integrity constraints and general compositions of these.

Figure 2 shows how annotations would be written in these three systems to implement our running ticket sales example. In this case, the desired invariant is that tickets are not over-sold – that is, the count of remaining tickets should be non-negative. Sieve and Indigo can enforce this invariant directly as written. Quelea's visibility-based contracts cannot tightly describe this invariant; instead, they must be conservative and force ticket sales to be strongly consistent.

Indigo's approach provides an excellent way to express application-level semantics and have the system automatically figure out how to enforce them. The primary downside is that abstract state must be modeled separately from the true application state. Additionally, though their invariants can specify hard constraints, they do not have a way to express the soft constraints we discussed in §1.2.

3.5. Review: Constraints


Figure 3. Ordering and visibility constraints.

Figure 3 organizes the systems we have so far discussed along axes of increased ordering and visibility constraints. There is no best point in this space – increased ordering constraints come with restrictions which limit availability and increase latency. The best solutions, therefore, provide the most flexibility and control over the constraints they can enforce.

Weak consistency models provide little in the way of ordering or visibility guarantees. With eventual consistency at the very bottom, Read-Your-Writes additionally constrains operations within a session to be visible to one another, while Monotonic Writes force some order among writes, and Monotonic Reads the visibility of those ordered writes. Of the consistency models, Causal provides the most programmer control – essentially arbitrary constraints can be created by forcing the system to consider them causally linked.

RAMP transactions and DVCS-like techniques allow explicit control over visibility but little control over ordering between transactions (or branches/forks), placing them at the top left of Figure 3. Because Salt allows weaker transactions to interoperate with ACID transactions, it allows programmers to select from range of ordering and visibility levels depending on need. Cassandra, Riak, Gemini, and Walter allow per-operation consistency levels to be set, which also provides a range of constraints, but with significant complexity for programmers.

The annotation-based techniques provide the most control over these constraints and also automate part of the task of choosing them. Quelea, with contracts specifically on these constraints, provides the most control, but at a lower level of abstraction than Indigo or Sieve's application-level invariants. So far, we have been dealing mostly in terms of plain reads and writes; the next section will show how to do better by placing bounds on the allowable values and staleness.

4. Uncertainty

Factoring out the knowledge provided by ordering and visibility constraints, the state observed by operations is uncertain – subject to constraints on what values the piece of data may hold and how out-of-date the replica is.

4.1. Restricted values

One of the problems with using eventual consistency is ending up with conflicting writes that overwrite one another in unpredictable ways. The solution to this without enforcing mutual exclusion somehow is to define commutative deterministic merge functions that resolve conflicts resulting from concurrent updates. The vision of Bayou [67] was that applications would define custom merge functions over all of the application state so that users could work offline and automatically synchronize their changes the next time they connected. Though this has not caught on in any major way due to the difficulty (and non-modularity) of handling all possible combinations of updates in a way that is satisfactory to users, it has led to the development of libraries of data structures with this property called CRDTs.

4.1.1. CRDTs

Convergent (or conflict-free) replicated data types (CRDTs) [59] are data types that have commutative merge functions defined for them. Resolving conflicts deterministically requires making choices about the semantics of concurrent updates, leading to a proliferation of CRDTs for various use cases. Even simple data structures like Sets must have multiple variants such as those in Table 1 that resolve non-commuting operations differently.

G-Set “Grow-only” set, remove is simply disallowed.
PN-Set A counter per item matches adds and removes, the set contains the item whenever there are more adds.
OR-Set “Observed-remove” set where causally-related removes are observed, but add wins over remove when concurrent.
Table 1. Example Set CRDTs. Variations are due to the fact that add and remove do not commute for a sequential Set.

CRDTs can be enormously useful because they allow concurrent updates to accumulate. Riak [18] implements several data types and encourages their use. Like ADTs, CRDTs also provide a well-defined set of possible values that a variable can hold, restricted to changes made by supported operations. This is a clear advantage over simple registers that are completely overwritten on each write, making them unpredictable when the order of updates is uncertain. CRDTs can still suffer from many of the effects of eventual consistency. Updates applied to different replicas mean clients could see divergent changes for some time, and convergence does nothing to change that. Keep in mind that divergence could continue indefinitely thanks to eventual consistency, we will get to techniques that quantify the actual time this takes.

4.1.2. Bloom

The philosophy of Bloom [5, 26] is to find ways to write programs that avoid the need for coordination as much as possible. The CALM Principle (Consistency And Logical Monotonicity) advocated by this work formalizes the requirements for an entire program to be eventually consistent, obviating any need for coordination. It is built around the notion of monotonicity—programs compute sets of facts that grow over time so that information is never lost and convergence can be guaranteed.

In the Bloom model, programmers express applications as statements about monotonically growing sets of facts. These facts can be encoded as sets or other collections with suitable merge functions ensuring values of the type have a well-defined partial order, such as CRDTs [26]. Bloom statically ensures that programs compose these types in ways that are monotonic.

In its pure form, Bloom's programming model requires substantial changes to most applications, so later work on Blazes [6] showed how the monotonicity analysis could be applied to find where coordination is necessary in existing distributed applications by annotating components with Bloom's properties. This style of programming is somewhat different than the other annotation-based approaches of Indigo and Quelea because it focuses on sealing streams at coordination points. Bloom and its variants can ensure eventual consistency, but again this says nothing about how long it will take or what intermediate states will be observable, so in practice, users would still have to worry about observing stale or inconsistent states.

4.1.3. Escrow and Reservations

Escrow is a term from banking and legal proceedings where some amount of money is set aside and held by a third party in order to ensure it will be available for use at a later time after some (typically legal) condition is met. In database systems, this term has been borrowed for use in concurrency control to refer to “setting aside” some part of a record to be later committed.

O'Neil's idea of escrow [51] came from work on Fast Path [31] and Reuter's Transactional Method [54]. The idea was to increase concurrency on aggregate fields, such as fields keeping track of a count or a sum, which could become hot spots because they were updated frequently. The idea of escrow is to split up an aggregate value into a pool of partial values and allocate parts from the pool to transactions when they execute so that when they are ready to commit, they are guaranteed to be able to. For example, if a transaction is going to decrement an account balance provided there are sufficient funds, it will hold the amount it wishes to decrement in escrow. Other transactions can also decrement the balance, as long as combined they will not leave the balance negative. If a transaction aborts, the escrowed values are returned to the pool.

Escrow can be extended to any fragmentable object [71], that is, any data type providing a way to split itself into fragments of the same type, and a way to later merge the fragments back together. This is essentially the inverse of the criteria for monoids used for aggregators in Summingbird [20], and similar to the isolation types in Concurrent Revisions §3.3.

Reservations can be thought of as escrow for replicated data. A reservation pre-allocates permission to do updates so that in the future they can be done without coordination. This moves coordination off the critical path and allows synchronization to be amortized when multiple updates share a reservation. For example, in Mobisnap [52] where they were first introduced, a salesman might reserve a number of tickets or a some quantity of a commodity, then while on the go, without connectivity, make a sale and know that it is safe to do so. Combining reservations with leases (discussed next in §4.2.1) allows them to be reclaimed automatically after a certain time has elapsed. Exo-leasing [62], not quite using lease the same way, further extends escrow and reservations to be decentralized and exchangeable among offline clients.

Reservations are easy to generalize — Indigo [15] uses them to implement its application-specific conflict avoidance logic that includes auto-generated code. A similar implementation of numeric invariant preservation called bounded counters [16] was built on top of Riak.

Imbalance can be a problem when reservations are distributed: if some replicas receive more requests than others, they may use up all of their reserved updates before others do. In these situations, techniques such as the demarcation protocol [14, 17], or handoff [4] can allow replicas to redistribute permissions or re-balance per-replica limits. These techniques operate similar to work-stealing in Cilk [3].

4.2. Bounded staleness

In theory, eventually consistent systems provide absolutely no guarantees during execution because there is no bound on the time it must take for updates to propagate, and there are almost no situations where updates are guaranteed not to occur. In practice, however, programmers typically observe very few actual consistency errors, even at large scale [47]. This is because the propagation time, or inconsistency window is typically very small, on the order of tens of milliseconds [10], so few accesses observe the gap. However, programmers cannot rely on these observations because they do not hold in all cases. High contention situations are particularly problematic because with more concurrent updates and accesses, the chances of observing inconsistencies is much higher, and the value is also likely to be further from the correct value.

4.2.1. Leases

The problem with reads is that by the time the client gets the result, it could already be out of date, even without the additional complexity introduced by eventual consistency. Leases are a way of communicating how old a read is, by associating it with a time in the future when it should be considered stale. First proposed for file system caches to avoid needing to send explicit invalidations [33], they are now used in application caches in modern datacenters, such as in Facebook's Memcache system [50]. In addition to simply bounding staleness without explicit invalidation, leases can be used to indicate a promise that the value will not be updated for some time.

Warranties [46] combine leases with reservations to grant permission for holders to perform specific updates for a fixed amount of time. This has the benefit that permissions are automatically reclaimed after the time has elapsed, even if the holder crashed, and saving a reply message if the recipient does not wish to use the warranty.

4.2.2. Probabilistically bounded staleness

In order to help programmers reason about staleness, Bailis et al. [10] introduced a metric called probabilistically bounded staleness (PBS) which quantifies the staleness of accesses, either in terms of time or versions. By observing the distributions of propagation delays, round trip times, and rate of updates, their implementation builds a model of the system and uses it to predict staleness during execution. They also discussed how, by choosing the number of replicas to send writes to or consult for reads, one can control staleness and suggested how PBS could be used to select the right balance.

4.2.3. Consistency-based SLAs

SLA code example

Figure 4. Example Consistency-based SLA: Shopping cart.

With consistency-based SLAs [69], programmers can explicitly trade off consistency for latency. A consistency SLA specifies a target latency and a consistency level (e.g. 100 ms with read-my-writes). In this programming model, operations specify a set of desired SLAs, each associated with a utility. Using a prediction mechanism similar to PBS, the Pileus system attempts to determine which SLA to target to maximize utility, typically to achieve the best consistency possible within a certain latency.

Allowing users to specify their desired latencies and consistencies directly to the system is powerful. However, because it is so fine-grained, the burden of choosing target latencies and consistency for each operation could be quite high, and it seems difficult to compose a sequence of operations and SLAs to achieve an overall target latency or correctness criteria.

4.3. Review: Uncertainty


Figure 5. Bounding uncertainty in terms of staleness and restricting values.

Figure 5 maps out the techniques we have covered along two axes: how much they bound staleness versus how they restrict the set of allowable values. Again, eventual consistency provides the least guarantees. In fact, most of the techniques covered earlier do not affect staleness hardly at all. Instead, stronger consistency models and the annotation-based techniques restrict possible values by eliminating conflicts. The major differentiator comes with the switch to using CRDTs to restrict operations to those supported for each data type, like with ADTs. Bloom, by restricting programs to monotonic transformations over CRDTs, has the most bounded values.

Escrow and reservations can be very useful for bounding the uncertainty of replicated data. They allow hard bounds to be enforced without preventing parallelism in most cases. Consider again the Star Wars ticket sales example from §1. In order to handle the high load, we could replicate the tickets for this movie and allow clients to purchase tickets from any replica, but then we would not be able to prevent the same ticket from being sold to two different users. Using the concept of escrow, however, we can divide the tickets among the replicas, allowing clients to purchase them in parallel while there are many remaining, yet preventing tickets from over-selling when they begin to run out. We will discuss this more in §7.

Along the axis of staleness, PBS provides additional knowledge, but little control. Leases, on the other hand, allow for a range of information about staleness to be conveyed, and warranties provide additional control over how values change by controlling permission to perform updates. Consistency SLAs allow latencies and consistency levels to be traded off, giving control over both axes of uncertainty to applications.

5. Programming model comparison