1. What is the problem? Is the problem real?
The problem is building a storage system out of commodity (and hence failure-prone) machines that still meets high availability and performance requirements under a workload with a large share of write requests. The problem is real: most companies that run services similar to Amazon's face it.
2. What is the solution's main idea (nugget)?
Data is partitioned and replicated using consistent hashing (of Chord fame). Because writes make up a significant portion of the workload, guaranteeing availability while maintaining consistency is a big challenge. Dynamo uses a quorum-based consistency protocol in which a minimum number of replicas must respond for a read or write to succeed. These thresholds are configurable per application according to its availability requirements. Object versioning lets writes be accepted even when replicas disagree, which preserves availability; the divergent versions are reconciled during reads.
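To make the partitioning idea concrete, here is a minimal sketch of a consistent-hashing ring. It is not Dynamo's code: the `Ring` class, the node names, and the key are hypothetical, there are no virtual nodes, and MD5 serves only as a convenient uniform hash. A key is replicated on the first N distinct nodes found walking clockwise from its position on the ring.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the ring (MD5 used only as a uniform hash)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring: one position per node, no virtual nodes."""

    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = n_replicas
        # Sorted (position, node) pairs form the ring.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str):
        """Return the N nodes found walking clockwise from the key's position."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        count = min(self.n_replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes, n_replicas=3)
replicas = ring.preference_list("cart:12345")
```

The property that matters is locality of change: removing a node that is not in a key's preference list leaves that key's replicas untouched, so membership changes only move the keys adjacent to the affected node.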
3. Why is the solution different from previous work?
The contribution of this paper is not new protocols so much as the experience of building and managing a system at gigantic scale. That said, there is a key difference: this storage system is tuned for write-intensive workloads, which is a significantly harder problem than read-intensive workloads when availability and consistency must be maintained at the same time. Unlike the Google papers, this one favors the classical decentralized design over a simpler centralized one.
4. Does the paper (or do you) identify any fundamental/hard trade-offs?
They trade off consistency for availability and settle for eventual consistency. The write-intensive workload drives this choice, since writes must be accepted even when some replicas are unreachable. The other trade-off is in how transparent the system is to applications. Dynamo expects applications to be intelligent and to deal with inconsistency in whatever manner they see fit. This shifts complexity into the applications and would have a significant impact on the way they are designed.
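What "deal with inconsistency" might look like on the application side can be sketched as follows. This assumes each version carries a vector clock (a dict from node name to counter) and that the application is a shopping cart that reconciles by set union; the function names, node names, and data are all hypothetical and only illustrate the idea.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def prune_ancestors(versions):
    """Keep only versions that no other version strictly supersedes."""
    siblings = []
    for clock, value in versions:
        strictly_older = any(
            descends(other, clock) and not descends(clock, other)
            for other, _ in versions
        )
        duplicate = any(
            descends(kept, clock) and descends(clock, kept) for kept, _ in siblings
        )
        if not strictly_older and not duplicate:
            siblings.append((clock, value))
    return siblings


def merge_carts(siblings):
    """Application-chosen reconciliation: union the items, take the element-wise max clock."""
    merged_clock, merged_items = {}, set()
    for clock, items in siblings:
        merged_items |= items
        for node, count in clock.items():
            merged_clock[node] = max(merged_clock.get(node, 0), count)
    return merged_clock, merged_items


# Three versions returned by a read: one ancestor and two concurrent descendants.
versions = [
    ({"sx": 1}, {"book"}),
    ({"sx": 2, "sy": 1}, {"book", "pen"}),
    ({"sx": 2, "sz": 1}, {"book", "lamp"}),
]
siblings = prune_ancestors(versions)
merged_clock, merged_items = merge_carts(siblings)
```

The storage layer can discard the ancestor on its own, but only the application knows that a union is the right way to combine the two remaining carts. A different application might pick the latest write or ask the user, which is exactly the added complexity noted above.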
5. Do you think the work will be influential in 10 years? Why or why not?
Sure! Dynamo is a very good implementation of many distributed-systems and database principles: it picks the right trade-offs and works effectively in practice. With internet services growing in importance, its workload assumptions and solutions are all very pertinent.
6. Others:
a. While the idea of a knob for availability (R, W, N) seems good at first, it seems to go against the general spirit of this paper, which is "I will tell you what works!" I think setting these knobs is not an easy task, and maybe for good reason they haven't revealed how their applications set them. But the settings can be crucial to the performance of the system.
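To see why the knobs are hard to set, it helps to spell out what they trade against each other. The sketch below assumes a strict quorum over exactly N replicas, which is a simplification, and uses hypothetical helper names. It checks by brute force whether every read set of size R shares at least one replica with every write set of size W, and reports how many unreachable replicas reads and writes can each tolerate.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every read set of size r intersect every write set of size w?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def tolerated_failures(n: int, r: int, w: int) -> dict:
    """Unreachable replicas that reads and writes can each survive."""
    return {"reads": n - r, "writes": n - w}


# A few illustrative settings for N = 3 (chosen for the example, not taken from the paper).
settings = [(3, 2, 2), (3, 1, 3), (3, 3, 1), (3, 1, 1)]
summary = {
    s: (quorums_overlap(*s), tolerated_failures(*s)) for s in settings
}
```

The brute-force check agrees with the familiar rule that read and write sets are guaranteed to overlap exactly when R + W > N. Lowering W makes writes survive more failures and wait on fewer replicas, but either R has to rise to compensate or reads may miss the latest write, so each application ends up with a different sweet spot.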