Making consistency safe again.
Distributed applications and web services, such as online stores or social networks, are expected to be scalable, available, responsive, and fault-tolerant. To meet these steep requirements in the face of high round-trip latencies, network partitions, server failures, and load spikes, applications use eventually consistent datastores that allow them to weaken the consistency of some data. However, making this transition is highly error-prone because relaxed consistency models are notoriously difficult to understand and test.
In this work, we propose a new programming model for distributed data that makes consistency properties explicit and uses a type system to enforce consistency safety. With the Inconsistent, Performance-bound, Approximate (IPA) storage system, programmers specify performance targets and correctness requirements as constraints on persistent data structures and handle uncertainty about the result of datastore reads using new consistency types. We implement a prototype of this model in Scala on top of an existing datastore, Cassandra, and use it to make performance/correctness tradeoffs in two applications: a ticket sales service and a Twitter clone. Our evaluation shows that IPA prevents consistency-based programming errors and adapts consistency automatically in response to changing network conditions, performing comparably to weak consistency and 2-10x faster than strong consistency.
Using abstract data types to expose application-level semantics in datastores.
Claret brings the clean abstractions of data structures and ADTs to the world of distributed key-value datastores. There is a new data storage model on the rise that bridges the gap between the restrictions of fully relational schemas and SQL, and the completely flat key-value stores. Data structure stores such as Redis, Riak, and Hyperdex allow programmers to build their persistent data out of whatever data structures make sense: lists, sets, maps, or more specialized data types. Programmers like it because it is simple to use and flexible, but allows them to leverage the power and reusability of the provided complex data structures.
Claret builds on this data structure storage, exposing the abstract semantics of these data types to the underlying distributed datastore to enable new optimizations. Crucially, by exposing the concurrency between commutative ADT operations that already exists in these applications, Claret can provide much stronger guarantees — serializable distributed transactions — with much less cost to throughput and latency than traditional systems. Our model allows programmers to either choose from a library of existing data types or extend the system with custom data types that specifically meet their application's needs, in order to expose the maximum amount of information to the system. Our prototype datastore implements optimizations that leverage this information, including abstract locks and transaction boosting, operation combining, and operation reordering or phasing.
Scaling irregular applications on commodity hardware.
Irregular applications are those that do lots of hard-to-predict, data-dependent, fine-grained memory accesses. Examples include graph analytics, molecular dynamics, circuit simulation, neuromorphic computation, and many more. The lack of spatial and temporal locality in these applications makes it difficult to scale them beyond a single node because commodity networks need large packets to get near their peak throughput. The goal of this project is to make it easier to develop and run those kinds of applications on large compute clusters. We see the same few tricks being implemented over and over when irregular applications are tuned for maximum performance, such as rewriting parts to buffer communication, using asynchronous callbacks to overlap communication and disk I/O with computation. In addition to being wasteful, this process can be error-prone.
The core is a runtime system we are developing that automatically aggregates small messages to improve network bandwidth, using massive multithreading to tolerate the increased latency. All we ask is that the programmer expose sufficient parallelism, a quantity that is not lacking in these "Big Data" applications. Our highly-optimized runtime can then manage moving data and computation around the cluster, performing tricks such as issuing hardware pre-fetches and carefully managing the L1 cache, performing extremely lightweight context switches, and coordinating RDMA transfers to get maximum throughput on the network.
Grappa is also now open source! We would love to help people try it out on their own problems. Check it out on Github.
A couple sub-projects related to Grappa that I've worked on are listed below.
One of the primary tricks to Grappa's success is moving computation to where data, which works especially well in low-locality situations. However, explicitly writing these delegate operations breaks the PGAS shared memory abstraction, tying code to one particular memory layout, and making writing performant distributed applications tedious. Alembic is a compiler analysis written for LLVM that automatically transforms threads to do computation migration to improve locality. Using a technique similar to continuation-passing style transformation, Alembic chooses the best places to migrate, reorders memory instructions, and splits threads into a series of messages, providing high-performance migrating threads automatically. This work will be presented at OOPSLA in October:
The general idea is that synchronization on global shared data structures can be massively improved by waiting and combining many operations together, and Grappa's massive multithreading allows us to tolerate this additional latency.
Brandon Myers and I submitted a workshop paper to HotPar '12 exploring whether it is possible to make profitable predictions about when to move a task its the data (migration) rather than moving the data. Our study involved instrumenting the shared memory accesses in a few simple benchmarks, collecting an execution trace, and simulating the cost of data movement under different migration policies, including an optimal migration schedule.
This project was born out of necessity when we wanted to be able to collect output from many experiments in Grappa and related projects. We also needed a way to easily enumerate a large multi-variate parameter space, especially when trying to find the right parameters to maximize performance for Grappa. This script aims to help with generating a large number of experiments, parsing their output, and storing experiment inputs and outcomes to a SQLite database automatically. For now, it simply supports having a static script that runs all the experiments in a single batch and gathers the results automatically. We would like to make the experience more interactive, where a prompt can be used to schedule new parameter sweeps, monitor progress of existing experiments, inspect gathered data points, and visualize preliminary results. A complete re-write to support this goal will probably happen--eventually.