talk-intro-to-cassandra

https://m.youtube.com/watch?v=B_HTdrTgGNs

At 13 best replication. Asynchronous. No need for leader election, etc. There are companies that are running active-active.

At 15:05 visual example of a write to a single node.

At 16 the mutation is first written to the commit log, and that is append-only; then the memtable is updated.

At 17:21 what he likes is the dead-simple write path.

At 18:25 the memory fills up and it eventually creates an SSTable, where it sequentially writes the data in the memtable to disk. Because the writes are sequential it is much faster. At 20:50 SSTable. It is immutable.
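
The write path in these notes (commit log first, then the memtable, then a flush to an SSTable) can be sketched as a toy model. This is not Cassandra's implementation: the class name `TinyNode`, the `memtable_limit` parameter, and the in-memory lists standing in for files on disk are all assumptions made for illustration.

```python
class TinyNode:
    """Toy single-node write path: commit log -> memtable -> SSTable flush."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every mutation
        self.memtable = {}     # in-memory view, key -> latest value
        self.sstables = []     # flushed tables, each an immutable sorted tuple
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the log first
        self.memtable[key] = value            # 2. then update memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out in key order in one sequential pass.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


node = TinyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)  # second distinct key reaches the limit and triggers a flush
node.write("c", 3)
print(node.sstables)  # [(('a', 2), ('b', 1))]
```

Nothing in the write path ever seeks: the log is appended to, and the flush writes one sorted run from start to finish.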

At 22:30 compaction and merging of the SSTable segments. Are they called segments?
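
A minimal sketch of what merging SSTables during compaction amounts to, assuming each table is a list of `(key, timestamp, value)` tuples and that the newest timestamp wins per key. The function name `compact` and the tuple layout are hypothetical, and a real compaction would stream-merge sorted files rather than build a dictionary in memory.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest cell per key.

    Each SSTable is a list of (key, timestamp, value) tuples.
    """
    newest = {}
    for table in sstables:
        for key, ts, value in table:
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # Emit a single table sorted by key.
    return [(key, ts, value) for key, (ts, value) in sorted(newest.items())]


old = [("a", 1, "x"), ("b", 1, "y")]
new = [("a", 2, "z"), ("c", 2, "w")]
merged = compact([old, new])
print(merged)  # [('a', 2, 'z'), ('b', 1, 'y'), ('c', 2, 'w')]
```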

At 27:30 they use MD5 hashing to 128 bits.

27:37 consistent hashing to a 128 bit number

27:46 Token ring
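
A rough sketch of the hashing and token ring idea: hash the key with MD5 to a 128-bit integer, then find the first node token at or after it, wrapping around at the end. The `TokenRing` class, the evenly spaced tokens, and the node names are assumptions for illustration, and the exact token range Cassandra uses may differ from the full 128 bits used here.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a partition key to a 128-bit integer using MD5."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token that node owns}
        self.ring = sorted((tok, name) for name, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping to index 0.
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]


step = 2**128 // 4
ring = TokenRing({f"node{i}": i * step for i in range(4)})
print(ring.owner("user:42"))
```

Because the owner depends only on the key's hash and the sorted token list, any node can compute it locally without asking a coordinator.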

29:40 replication factor

31:34 virtual nodes (or vnodes). Mentions that it is mentioned in paper-amazon-dynamo and also talks about how the vnodes are non-adjacent.
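
Replication factor and vnodes can be sketched together: give each physical node several scattered positions on the ring, then walk clockwise from the key's token until enough distinct physical nodes are collected. Deriving vnode tokens by hashing `"node#i"` is an assumption made here to keep the example deterministic, and `build_ring`, `replicas`, and `vnodes_per_node` are hypothetical names.

```python
import bisect
import hashlib


def token(text: str) -> int:
    """Hash a string to a 128-bit integer using MD5."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def build_ring(nodes, vnodes_per_node=8):
    """Give each physical node several non-adjacent positions on the ring."""
    return sorted(
        (token(f"{node}#{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )


def replicas(ring, key, replication_factor=3):
    """Walk clockwise from the key's token, collecting distinct physical nodes."""
    tokens = [tok for tok, _ in ring]
    start = bisect.bisect_left(tokens, token(key))
    chosen = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in chosen:  # skip vnodes of an already chosen machine
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen


ring = build_ring(["n1", "n2", "n3", "n4"])
print(replicas(ring, "user:42"))
```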

talk-intro-to-cassandra#operation-changes-during-day1At 37:50 he talks about doing downtime during the day, because taking a single node offline does not affect the cluster. talk-intro-to-cassandra#operation-changes-during-day1

talk-intro-to-cassandra#consistency-choice-at-read-write1At 39 the consistency level is set with every read and write so at an application level you can decide what you need for that specific call. talk-intro-to-cassandra#consistency-choice-at-read-write1
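
The per-call trade-off can be made concrete with a little arithmetic: a read is guaranteed to touch at least one replica that acknowledged the latest write when the read count plus the write count exceeds the replication factor. The sketch below uses the level names ONE, QUORUM, and ALL; the function names are hypothetical, and this models only the counting, not a real driver.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read set and write set must share at least one replica."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
print(read_sees_latest_write("QUORUM", "QUORUM", rf))  # True: 2 + 2 > 3
print(read_sees_latest_write("ONE", "ONE", rf))        # False: 1 + 1 <= 3
```

Since the level is chosen per call, an application could write at QUORUM for account data and read at ONE for something like a view counter.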

In YouTube comments:

talk-intro-to-cassandra#ssds-better-at-random-but-not-perfect1 "You are right about SSDs and this is why we recommend them. There is still a bit of overhead on the transport layer when issuing random seeks. If you were to ask for a contiguous set of blocks and compared that with something issuing pure random reads, the random reads will have a much higher 95th percentile." talk-intro-to-cassandra#ssds-better-at-random-but-not-perfect1

At 41:05 a local quorum is a quorum within the data center.

At 43:50 a lot of ZooKeeper bad stories.

At 49:40 inserts always overwrite. You could do an "if exists" check, but that is Paxos.

At 50:30 he has a three-hour data modeling and CQL talk on YouTube.

At maybe 53 reversing the order in the storage engine.

At 59:30 an audience question asks about the difference between Cassandra and Dynamo. Dynamo uses vector clocks and, for semantic conflict resolution, pushes the resolution to the application; Cassandra uses real clocks and last write wins.
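
The two approaches contrasted in that answer can be sketched side by side. Last write wins just compares timestamps, while a vector clock comparison can report that two versions are concurrent, at which point something else (the application) has to decide. The function names and data shapes are assumptions for illustration, and the tie-breaking in `lww` is arbitrary.

```python
def lww(a, b):
    """Last write wins: each version is (timestamp, value); keep the later one."""
    return a if a[0] >= b[0] else b  # on a tie this sketch simply keeps `a`


def compare_clocks(x, y):
    """Compare two vector clocks given as {node: counter} dicts."""
    nodes = set(x) | set(y)
    x_le_y = all(x.get(n, 0) <= y.get(n, 0) for n in nodes)
    y_le_x = all(y.get(n, 0) <= x.get(n, 0) for n in nodes)
    if x_le_y and y_le_x:
        return "equal"
    if x_le_y:
        return "before"
    if y_le_x:
        return "after"
    return "concurrent"  # neither descends from the other


print(lww((100, "blue"), (105, "green")))                  # (105, 'green')
print(compare_clocks({"a": 2, "b": 1}, {"a": 1, "b": 2}))  # concurrent
```

With last write wins the "concurrent" case never surfaces: one value silently replaces the other, which is simpler but relies on the clocks being reasonably in sync.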

At 1:01:30 the questioner proposes that many people do not understand the trade-offs in these systems and choose the wrong setup based on bad assumptions.

At 1:02:20 Patrick mentions a paper called "your bank is not consistent" and explains that banks have a profit model based on eventual consistency, namely overdraft fees.

talk-intro-to-cassandra#vector-clocks-in-place-updates1At 1:03:00 Patrick draws a distinction between logging and in-place updates. He says that vector clocks are especially useful for lots of in-place updates, whereas if you do logging it is a less necessary technique because you are not updating the same values. talk-intro-to-cassandra#vector-clocks-in-place-updates1

At 1:06:10 Cassandra is operationally very straightforward because the nodes are symmetric.

References

paper-amazon-dynamo

paper-big-table

Referring Pages

data-architecture-glossary concept-app-level-consistency-choice concept-operation-changes-during-day

People

person-patrick-mcfadin