https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
Related to the work in paper-amazon-dynamo
p 1: dynamic control over data layout and format
p 1: Cassandra uses a synthesis of well known techniques to achieve scalability and availability
p 1: Bayou[18] is a distributed relational database system that allows disconnected operations and provides eventual data consistency
p 1: Although strong consistency provides the application writer a convenient programming model, these systems are limited in scalability and availability [10] paper-cassandra#strong-consistency1
p 2: Dynamo can be defined as a structured overlay with at most one-hop request routing. paper-cassandra#structured-overlay
p 2: A write operation in Dynamo also requires a read to be performed for managing the vector timestamps. This can be very limiting in environments where systems need to handle a very high write throughput. paper-cassandra#dynamo-write-requires-a-read1
p 2: Every operation under a single row key is atomic per replica no matter how many columns are being read or written into
p 2: Cassandra exposes two kinds of column families, Simple and Super column families paper-cassandra#two-kinds-of-column-families
p 2: The system allows columns to be sorted either by time or by name
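A tiny sketch of that data model as nested maps (my own Python illustration, not Cassandra's API; the class and method names are invented): a simple column family is one level of columns under a row key, a super column family nests columns one level deeper, and columns can be returned sorted by name or by timestamp.

```python
class Row:
    """Hypothetical in-memory picture of one Cassandra row (illustrative only)."""

    def __init__(self, key):
        self.key = key
        # simple CF: {cf_name: {column_name: (value, timestamp)}}
        # super CF:  {cf_name: {super_column: {column_name: (value, timestamp)}}}
        self.column_families = {}

    def insert(self, cf, column, value, timestamp, super_column=None):
        cf_map = self.column_families.setdefault(cf, {})
        target = cf_map.setdefault(super_column, {}) if super_column else cf_map
        target[column] = (value, timestamp)

    def columns_sorted(self, cf, by="name"):
        cols = self.column_families.get(cf, {})
        if by == "name":
            return sorted(cols.items())
        # sort by timestamp, most recent first (useful for inbox-style data)
        return sorted(cols.items(), key=lambda kv: kv[1][1], reverse=True)


row = Row("user42")
row.insert("status", "c2", "hello", timestamp=10)
row.insert("status", "c1", "hi", timestamp=20)
print(row.columns_sorted("status", by="time"))  # c1 first: it has the newer timestamp
```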
p 2: Typically applications use a dedicated Cassandra cluster and manage it as part of their service. Although the system supports the notion of multiple tables, all deployments have only one table in their schema.
p 2: Describing the details of each of the solutions is beyond the scope of this paper, so we will focus on the core distributed systems techniques used in Cassandra: partitioning, replication, membership, failure handling and scaling. paper-cassandra#core-distributed-systems-techniques
p 2: Cassandra partitions data across the cluster using consistent hashing [11] but uses an order preserving hash function to do so
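A minimal consistent-hashing sketch (Python, purely illustrative): node tokens sit on a ring, a key is mapped to a position, and the first node clockwise from that position coordinates the key. The byte-wise "hash" below just stands in for an order-preserving hash function; the node names and tokens are made up.

```python
import bisect

# Stand-in for an order-preserving hash: it preserves the keys' sort order.
# Illustrative only; not Cassandra's actual partitioner.
def order_preserving_hash(key: str) -> int:
    return int.from_bytes(key[:8].ljust(8, "\x00").encode(), "big")

class Ring:
    def __init__(self, node_tokens):            # {node_name: token position}
        self.nodes = sorted(node_tokens.items(), key=lambda kv: kv[1])
        self.positions = [tok for _, tok in self.nodes]

    def coordinator_for(self, key):
        """First node clockwise from the key's ring position owns the key."""
        h = order_preserving_hash(key)
        i = bisect.bisect_left(self.positions, h) % len(self.nodes)
        return self.nodes[i][0]

ring = Ring({"A": 0x40 << 56, "B": 0x68 << 56, "C": 0x70 << 56})
print(ring.coordinator_for("alice"))  # lands between A's and B's tokens -> B
print(ring.coordinator_for("zoe"))    # past the last token -> wraps around to A
```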
p 3: The coordinator is in charge of the replication of the data items that fall within its range. In addition to locally storing each key within its range, the coordinator replicates these keys at the N-1 nodes in the ring paper-cassandra#coordinator
p 3: The metadata about the ranges a node is responsible for is cached locally at each node and in a fault-tolerant manner inside Zookeeper
p 3: We borrow from Dynamo parlance and deem the nodes that are responsible for a given range the "preference list" for the range. paper-cassandra#preference-list
p 3: Cassandra provides durability guarantees in the presence of node failures and network partitions by relaxing the quorum requirements as described in Section 5.2.
p 3: Cassandra is configured such that each row is replicated across multiple data centers. In essence, the preference list of a key is constructed such that the storage nodes are spread across multiple datacenters. paper-cassandra#preference-list-has-nodes-in-multiple-datacenters
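A sketch of how such a preference list could be built so that replicas span datacenters (Python; the nodes, tokens, and datacenter labels are all invented, and the selection rule is an assumption rather than the paper's exact policy): walk the ring clockwise from the coordinator, take nodes in unseen datacenters first, then fill the remaining replica slots.

```python
import bisect

nodes = {  # node -> (ring token, datacenter); values invented for illustration
    "A": (10, "dc1"), "B": (20, "dc2"), "C": (30, "dc1"), "D": (40, "dc2"),
}
ring = sorted(nodes.items(), key=lambda kv: kv[1][0])
positions = [tok for _, (tok, _) in ring]

def preference_list(key_token, n):
    """Coordinator plus N-1 more nodes, preferring nodes in unseen datacenters."""
    i = bisect.bisect_left(positions, key_token) % len(ring)
    walk = [ring[(i + s) % len(ring)] for s in range(len(ring))]  # clockwise order
    picked, seen_dcs = [], set()
    for name, (_, dc) in walk:             # first pass: one replica per datacenter
        if dc not in seen_dcs:
            picked.append(name)
            seen_dcs.add(dc)
    for name, _ in walk:                    # second pass: fill remaining slots
        if name not in picked:
            picked.append(name)
    return picked[:n]

print(preference_list(key_token=20, n=3))  # ['B', 'C', 'D'] spans dc1 and dc2
```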
p 3: Cluster membership in Cassandra is based on Scuttlebutt[19], a very efficient anti-entropy Gossip based mechanism. paper-cassandra#membership-anti-entropy-using-gossip1
p 3: The token information is then gossiped around the cluster. This is how we know about all nodes and their respective positions in the ring. This enables any node to route a request for a key to the correct node in the cluster paper-cassandra#membership-and-gossip-protocol
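A toy anti-entropy gossip round (not Scuttlebutt itself; names invented): every node keeps a versioned view of each node's token, and an exchange between two peers merges their views by keeping the higher-versioned entry, so token information spreads to everyone and any node can route a key in one hop.

```python
import random

class Node:
    def __init__(self, name, token):
        self.name = name
        self.view = {name: (token, 1)}      # node_name -> (token, version)

    def gossip_with(self, peer):
        """Merge the two views, keeping the entry with the higher version."""
        merged = dict(self.view)
        for n, (tok, ver) in peer.view.items():
            if n not in merged or ver > merged[n][1]:
                merged[n] = (tok, ver)
        self.view = dict(merged)
        peer.view = dict(merged)

cluster = [Node("A", 10), Node("B", 20), Node("C", 30)]
for _ in range(3):                           # a few rounds suffice for 3 nodes
    for node in cluster:
        node.gossip_with(random.choice([p for p in cluster if p is not node]))
print(sorted(cluster[0].view))               # ['A', 'B', 'C'] -- A knows the whole ring
```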
p 3: A node outage rarely signifies a permanent departure and therefore should not result in re-balancing of the partition assignment or repair of the unreachable replicas
p 3: For these reasons, it was deemed appropriate to use an explicit mechanism to initiate the addition and removal of nodes from a Cassandra instance paper-cassandra#manual-addition-and-removal-of-nodes-from-cluster
p 4: We are working on improving this by having multiple replicas take part in the bootstrap transfer thereby parallelizing the effort, similar to BitTorrent. paper-cassandra#bootstrap-transfer
p 4: A typical write operation involves a write into a commit log for durability and recoverability and an update into an in-memory data structure paper-cassandra#commit-log-and-in-memory-data-structure
p 4: All writes are sequential to disk and also generate an index for efficient lookup based on row key paper-cassandra#sequential-and-index
p 4: In order to prevent scanning of every column on disk we maintain column indices which allow us to jump to the right chunk on disk for column retrieval paper-cassandra#maintain-column-indicies
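A toy write path stringing the last three notes together (Python, with a made-up JSON-lines file format; none of this is Cassandra's actual on-disk layout): append to a commit log for durability, update the in-memory table, and when the memtable is large enough dump it sequentially to a data file together with a row-key index. A per-row column index would analogously record offsets of column chunks within each row.

```python
import json, os, tempfile

class ToyStore:
    def __init__(self, directory, memtable_limit=3):
        self.dir = directory
        self.commit_log = open(os.path.join(directory, "commit.log"), "a")
        self.memtable = {}                  # row_key -> {column: value}
        self.memtable_limit = memtable_limit
        self.generation = 0

    def write(self, row_key, column, value):
        self.commit_log.write(json.dumps([row_key, column, value]) + "\n")
        self.commit_log.flush()             # "normal mode"; fast sync would buffer
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush_memtable()

    def flush_memtable(self):
        """Dump the memtable sequentially, writing a row-key index alongside."""
        self.generation += 1
        data_path = os.path.join(self.dir, f"data-{self.generation}.db")
        index = {}
        with open(data_path, "w") as f:
            for row_key in sorted(self.memtable):
                index[row_key] = f.tell()   # byte offset of this row in the file
                f.write(json.dumps([row_key, self.memtable[row_key]]) + "\n")
        with open(data_path + ".index", "w") as f:
            json.dump(index, f)
        self.memtable = {}

store = ToyStore(tempfile.mkdtemp())
store.write("user1", "name", "ada")
```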
p 4: Each of these modules relies on an event driven substrate where the message processing pipeline and the task pipeline are split into multiple stages along the line of the SEDA[20] architecture paper-cassandra#task-pipeline-stages
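A minimal SEDA-flavoured sketch (Python threads and queues; the stage names are invented): each stage owns its own queue and a small worker pool, and a message moves through the pipeline by being handed from one stage's queue to the next.

```python
import queue, threading, time

class Stage:
    """One pipeline stage: its own queue plus a small pool of worker threads."""

    def __init__(self, name, handler, next_stage=None, workers=2):
        self.name, self.handler, self.next_stage = name, handler, next_stage
        self.inbox = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg = self.inbox.get()
            out = self.handler(msg)
            if self.next_stage is not None:
                self.next_stage.inbox.put(out)

# Invented three-stage pipeline: parse -> apply -> reply.
reply = Stage("reply", print)
apply_write = Stage("apply", lambda m: f"applied: {m}", next_stage=reply)
parse = Stage("parse", lambda m: m.strip(), next_stage=apply_write)

parse.inbox.put("  write user1 name=ada  ")
time.sleep(0.2)              # give the daemon workers a moment before exiting
```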
p 4: In any journaled system there needs to exist a mechanism for purging commit log entries. paper-cassandra#purging
p 4: Every time the in-memory data structure for a particular column family is dumped to disk we set its bit in the commit log stating that this column family has been successfully persisted to disk. paper-cassandra#set-a-bit-in-commit-log
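A sketch of that purging rule (Python, invented names): each commit log segment tracks, per column family, whether it still holds writes that have not yet been flushed; once every such flag is cleared the segment can be deleted.

```python
class CommitLogSegment:
    def __init__(self, column_families):
        self.dirty = {cf: False for cf in column_families}  # written but not flushed

    def record_write(self, cf):
        self.dirty[cf] = True

    def mark_flushed(self, cf):
        self.dirty[cf] = False     # "set its bit": this column family is safely on disk

    def can_be_purged(self):
        return not any(self.dirty.values())

seg = CommitLogSegment(["Inbox", "Status"])
seg.record_write("Inbox")
seg.mark_flushed("Inbox")
print(seg.can_be_purged())         # True: every write in this segment is persisted
```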
p 4: The write operation into the commit log can either be in normal mode or in fast sync mode. In the fast sync mode the writes to the commit log are buffered. paper-cassandra#normal-or-fast-sync-mode
p 4: A typical read operation always looks up data first in the in-memory data structure paper-cassandra#check-in-memory-data-structure-first
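A small read-path sketch (plain Python dicts standing in for the memtable and data files; the structure is invented): check the in-memory table first, then the on-disk files from newest to oldest, returning the first hit.

```python
def read(row_key, column, memtable, sstables_newest_first):
    row = memtable.get(row_key)
    if row and column in row:
        return row[column]
    for sstable in sstables_newest_first:    # reverse time order: newest wins
        row = sstable.get(row_key)
        if row and column in row:
            return row[column]
    return None

memtable = {"user1": {"name": "ada"}}
sstables = [{"user1": {"name": "old-ada", "city": "london"}}]
print(read("user1", "name", memtable, sstables))   # 'ada' from the memtable
print(read("user1", "city", memtable, sstables))   # 'london' from the older data file
```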
p 5: Periodically a major compaction process is run to compact all related data files into one big file
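A compaction sketch in the same toy representation (dicts instead of sorted files on disk; real compaction merges files by sorted row key): several data files are merged into one, keeping the most recent value for each column.

```python
def compact(sstables_oldest_first):
    """Merge data files into one; later (newer) files overwrite older values."""
    merged = {}
    for sstable in sstables_oldest_first:
        for row_key, columns in sstable.items():
            merged.setdefault(row_key, {}).update(columns)
    return merged

old = {"user1": {"name": "old-ada", "city": "london"}}
new = {"user1": {"name": "ada"}, "user2": {"name": "bob"}}
print(compact([old, new]))
# {'user1': {'name': 'ada', 'city': 'london'}, 'user2': {'name': 'bob'}}
```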