talk-deconstructing-the-database

https://youtu.be/Cym4TZwTCNU

Key Takeaways

talk-deconstructing-the-database#key-takeaway-splitting-process-and-perception1 2Splitting process and perception allows you to focus on each individually. For process, you can ensure transactional processing of writes. Indexing of the immutable values can happen after the fact. talk-deconstructing-the-database#key-takeaway-splitting-process-and-perception1 2

talk-deconstructing-the-database#key-takeaway-data-immutability-scales-readers-and-caching1 2Data immutability allows you to scale readers and have aggressive caching. talk-deconstructing-the-database#key-takeaway-data-immutability-scales-readers-and-caching1 2

talk-deconstructing-the-database#key-takeaway-distribute-reacting-to-peers1You can distribute the eventing and reacting to the "peers" talk-deconstructing-the-database#key-takeaway-distribute-reacting-to-peers1

Notes

At 1:20 he references paper-out-of-the-tar-pit, which says complexity comes from state

At 2:20 in the paper they imagine that there was this relational model and somehow it got updated

At 2:45 he calls it process (and draws a distinction from perception)

talk-deconstructing-the-database#sql-is-declarative1At 3:20 SQL is the most declarative thing we encounter. And using declarative things is good. talk-deconstructing-the-database#sql-is-declarative1

talk-deconstructing-the-database#data-basis1At 4:20 what is the basis of the data? talk-deconstructing-the-database#data-basis1

At 4:40 round-trip fears: we cannot ask three questions separately, in separate round trips.

At 5:30 he talks about how we conflate our thinking about reporting and asking questions. Because the data has no basis, we fear it will change out from underneath us: we can't ask one set of questions, get the results, and then ask another set of questions based on those results later, in another round trip. The concern is that the data may have changed between the two questions, leading to inconsistency, which is why people prefer reporting.

talk-deconstructing-the-database#consistency-for-scalingAt 6:40 a good quote about trading off consistency for ability to scale talk-deconstructing-the-database#consistency-for-scaling

At 7:30 hierarchical data is difficult to represent in a traditional relational database

talk-deconstructing-the-database#doing-it-themselvesAt 8:20 time stamps and audits. People are doing it themselves on systems that don't really have an understanding of this talk-deconstructing-the-database#doing-it-themselves

At 9:10 a strong model for reaction and eventing

talk-deconstructing-the-database#reactive-systems1At 9:20 we would like to build reactive systems that don't poll and can get consistent data talk-deconstructing-the-database#reactive-systems1

talk-deconstructing-the-database#indexes-are-sorted-views1At 11:50 indexes are basically sorted views of the data. That is why databases are faster than flat files. talk-deconstructing-the-database#indexes-are-sorted-views1

At 12:30 it was a very special machine (lots of memory, etc.), so you had only one of them.

At 14:04 whose problems are all of the cache complexity problems? Yours.

At 15:30 the update-in-place model.

talk-deconstructing-the-database#information-model1 2At 15:50 "Rich Hickey: By an information model I mean the ability to store facts, to not have things replace other things in place, to have some temporal notion to what's being stored" talk-deconstructing-the-database#information-model1 2

At 16:45 thirty years ago it made sense to build databases the way they were. You couldn't just say "store everything" because they had these tiny little disks.

talk-deconstructing-the-database#coordination-overhead1At 17:10 because they do update-in-place, traditional relational databases have a huge coordination overhead talk-deconstructing-the-database#coordination-overhead1

At 18:02 adopting a value-oriented model has real architectural benefits.

At 18:25 in order to handle process we will have to deal with novelty in memory

At 18:40 a database of facts is about sucking the structure out

talk-deconstructing-the-database#row-is-not-a-fact-but-set-of-factsAt 19:10 a row is not a fact; it is a set of facts talk-deconstructing-the-database#row-is-not-a-fact-but-set-of-facts

talk-deconstructing-the-database#a-row-is-not-a-factAt 19:50 a row is not a fact. The row may be multiple facts, but it is not a single fact. talk-deconstructing-the-database#a-row-is-not-a-fact

talk-deconstructing-the-database#datom-datomsAt 20:00 a Datom is an atomic fact and Datoms are multiple facts. They called it this on purpose because of the whole datum/data pluralization issue. talk-deconstructing-the-database#datom-datoms

talk-deconstructing-the-database#hang-all-metadata-off-transactionAt 20:30 you can hang all the metadata for a transaction off of that transaction, like provenance, etc. talk-deconstructing-the-database#hang-all-metadata-off-transaction

At 20:50 entity-attribute-value-transaction is the smallest unit
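To make the shape concrete, here is a minimal sketch (in Python, not Datomic's actual API; all names and values are illustrative) of datoms as entity/attribute/value/transaction tuples, with a "row" decomposing into several facts and metadata hanging off the transaction entity:

```python
# A hypothetical sketch of datoms as E/A/V/Tx tuples (not Datomic's API).
from collections import namedtuple

Datom = namedtuple("Datom", ["e", "a", "v", "tx"])

# One "row" about a person decomposes into several atomic facts,
# all asserted in transaction 1001.
datoms = [
    Datom(e=42, a="person/name",  v="Rich",             tx=1001),
    Datom(e=42, a="person/email", v="rich@example.com", tx=1001),
]

# The transaction is itself an entity, so provenance and other metadata
# can hang off of it as ordinary datoms.
tx_metadata = [
    Datom(e=1001, a="tx/instant", v="2012-05-02T10:00:00Z", tx=1001),
    Datom(e=1001, a="tx/source",  v="import-job",           tx=1001),
]
```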

At 22:00 the characteristic of a value is that whatever was there before will never change

talk-deconstructing-the-database#fundamental-unit-of-noveltyAt 22:40 what is the fundamental unit of novelty? talk-deconstructing-the-database#fundamental-unit-of-novelty

talk-deconstructing-the-database#represent-novelty1At 23:00 we can represent novelty as assertions or retractions of facts talk-deconstructing-the-database#represent-novelty1
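A small sketch of that idea, with assumed tags rather than Datomic's literal transaction data format: novelty is just a list of additions and retractions of facts.

```python
# Illustrative only: novelty as (op, entity, attribute, value, tx) tuples.
novelty = [
    ("assert",  42, "person/email", "rich@new.example", 1002),
    ("retract", 42, "person/email", "rich@example.com", 1002),
]
```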

talk-deconstructing-the-database#how-did-i-get-here1At 23:50 if you look at a database and attempt to figure out how it got there, you have no resources for doing that except looking at the logs talk-deconstructing-the-database#how-did-i-get-here1

At 25:30 how much change has come over time?

talk-deconstructing-the-database#process-vs-perception1 2At 26:20 process is different from perception. Process is how you get the data in, what the transactions and indexes look like, etc. Perception is the queries you run against the data, etc. talk-deconstructing-the-database#process-vs-perception1 2

talk-deconstructing-the-database#process-vs-perception-how-can-separate1 2At 26:55, we can separate process and perception because we have adopted an information model and immutability. talk-deconstructing-the-database#process-vs-perception-how-can-separate1 2

At 27:00 we want each application using this to be able to perceive change, to be able to get history, etc.

At 29:20 structural sharing; "persistent" in the functional-programming sense

At 30:05 "durable" persistent data structure. Durable because it is on a disk

At 31:10 disk locality does not matter as much as memory locality

At 32:10 they moved away from B-trees on disk to a different way of doing covering indexes on top of storage that just does I/O.

At 33:15 the roots must be modifiable.

At 33:50 this storage service needs a couple of things: the ability to handle key-value storage, and to do a consistent read. That's why Datomic can use DynamoDB, or even a normal relational database. The relational database can store key-values and provide a consistent read.
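A sketch of that minimal storage contract, assuming a store that only needs put/get of immutable segments plus a consistent read of a small mutable root (the class and method names are made up for illustration):

```python
class StorageService:
    """Anything that can do key-value I/O plus one consistent read would do:
    DynamoDB, a relational table, a filesystem, etc."""

    def __init__(self):
        self._segments = {}  # immutable index segments, written once
        self._roots = {}     # tiny mutable roots (see 33:15), swapped atomically

    def put_segment(self, key, blob):
        assert key not in self._segments, "segments are never overwritten"
        self._segments[key] = blob

    def get_segment(self, key):
        return self._segments[key]

    def consistent_read_root(self, name):
        # e.g. a consistent read in DynamoDB, or a single SQL row
        return self._roots.get(name)

    def swap_root(self, name, new_segment_key):
        self._roots[name] = new_segment_key
```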

At 34:20 the sorting of the indexes. If you have entity-attribute-value-transaction, then sorting by them in that order makes sense. Sorting at the attribute level first is a lot like a columnar database.
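A sketch of that point about sort orders: the same datoms sorted entity-first behave like rows, while sorted attribute-first they behave like a columnar store (the orderings below are assumptions mirroring EAVT/AEVT-style indexes):

```python
datoms = [
    (42, "person/name",  "Rich",             1001),
    (42, "person/email", "rich@example.com", 1001),
    (43, "person/name",  "Stu",              1001),
]

# Entity-first (E, A, V, Tx): all facts about one entity are adjacent (row-like).
eavt = sorted(datoms, key=lambda d: (d[0], d[1], str(d[2]), d[3]))

# Attribute-first (A, E, V, Tx): all values of one attribute are adjacent,
# which is what makes it resemble a columnar database.
aevt = sorted(datoms, key=lambda d: (d[1], d[0], str(d[2]), d[3]))
```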

talk-deconstructing-the-database#indexing-is-not-a-durability-problem1 2At 35:50 everything that's new is already being logged, so the indexing of this data is not a durability problem. talk-deconstructing-the-database#indexing-is-not-a-durability-problem1 2

At 36:20 how BigTable works. It accumulates a bunch of new data, then at some increment it updates the index by applying that new data on top of the existing data and creating new flat files. Unlike Datomic, there is no sharing there.

At 37:20 whenever we want to get the current state, we have to do a dynamic merge between memory and the storage

talk-deconstructing-the-database#log-novelty-to-get-durability1 2At 37:50 take novelty and immediately log it. That is where you get durability. But the log is not organized in a leverageable way. talk-deconstructing-the-database#log-novelty-to-get-durability1 2

talk-deconstructing-the-database#merge-novelty-in-bulkAt 38:15 occasionally something takes the new novelty and merges it into the index. There is a lot of efficiency because it does this in bulk. talk-deconstructing-the-database#merge-novelty-in-bulk
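A sketch of the read path as described around 37:20-38:15: the current view is a dynamic merge of the durable index with in-memory novelty, and a background step occasionally folds the accumulated novelty into new index data in bulk (all names here are illustrative, not Datomic's internals):

```python
import heapq

durable_index = [                 # sorted datoms, stored in immutable segments
    (42, "person/name", "Rich", 1001),
]
live_novelty = [                  # logged immediately for durability,
    (42, "person/email", "rich@example.com", 1002),   # also kept in memory
]

def current_view():
    """Readers see the durable index merged with novelty on the fly."""
    return list(heapq.merge(durable_index, sorted(live_novelty)))

def bulk_merge():
    """Occasionally fold accumulated novelty into a new durable index."""
    global durable_index, live_novelty
    durable_index = current_view()
    live_novelty = []
```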

At 39:50 the transactor is the part that processes the transactions. The peers are the applications, and they are very much equals in the processing.

At 40:25 he talks about it needing a redundant store. He suggests one of the newfangled redundant stores.

At 41:50 the high-level architectural diagram

At 43:00 no locality of storage, which means that everyone can read directly

At 43:30 query can live anywhere you want. Each of the machines can run a query; you can put a library into your Java app.

talk-deconstructing-the-database#safe-to-cache1At 41:20 what is safe to cache? Everything! talk-deconstructing-the-database#safe-to-cache1

At 46:00 each of the applications is pulling in its own working set. They don't need all of the data; they can just pull in the stuff they care about. It is not even them pulling in the data, it is Datomic.

talk-deconstructing-the-database#merge-join1At 48:39 - "the query engine is going to do a merge join ... what you'll get is a sum of the information between the two" talk-deconstructing-the-database#merge-join1

talk-deconstructing-the-database#window-of-change1At 49:10 accumulating a window of change. talk-deconstructing-the-database#window-of-change1

At 51:45, as part of the process there are transaction functions. That is because asserting and retracting data is not quite enough.

At 52:05 they expand into new assertions and retractions

Functions can call functions and eventually everything is assertions or retractions.
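A sketch of a transaction function in that spirit (hypothetical names, not Datomic's API): it runs against the current database value and expands, possibly through other functions, into plain assertions and retractions.

```python
def transfer(db, from_acct, to_acct, amount):
    """Expand a 'transfer' request into assertions/retractions against the
    current database value. `db.current_value` is a hypothetical lookup."""
    from_bal = db.current_value(from_acct, "account/balance")
    to_bal = db.current_value(to_acct, "account/balance")
    if from_bal < amount:
        raise ValueError("insufficient funds")   # abort the transaction
    return [
        ("retract", from_acct, "account/balance", from_bal),
        ("assert",  from_acct, "account/balance", from_bal - amount),
        ("retract", to_acct,   "account/balance", to_bal),
        ("assert",  to_acct,   "account/balance", to_bal + amount),
    ]
```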

At 54:30 the older indexes are like garbage in garbage collection. Nobody will care about them anymore after a period of time.

At 55:40 the only mutable thing in the system is updating the pointers after we create a new index.

At 56:30 by separating the two, you get to address scalability separately. Think about read replicas, and about how this might be like CQRS.

At 57:40 you can use your existing database, and just put stuff in this format on top of it.

At 56:00 or around there he shows how immutability leads to no need for transactions in the reader.

At 58:30 it can scale up and down with the load, like autoscaling

At 1:00:45 you can query against the entire database as of a point in time. That way all of the individual queries do not have to have the timestamp in them.
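A sketch of that as-of idea: filter the whole set of datoms to those at or before a transaction once, then run ordinary queries against that stable value (names and data are illustrative):

```python
datoms = [
    (42, "person/email", "rich@example.com", 1001),
    (42, "person/email", "rich@new.example", 1005),
]

def as_of(all_datoms, tx_max):
    """The database value as of a point in time; queries need no timestamps."""
    return [d for d in all_datoms if d[3] <= tx_max]

past_db = as_of(datoms, tx_max=1001)   # sees only the original email fact
```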

talk-deconstructing-the-database#local-data1At 1:01:15 since you have all of the information locally you can lay out what-if? scenarios locally without affecting the real database. talk-deconstructing-the-database#local-data1

At 1:02:00 the peers get all novelty sent to them. They can decide what they want to react to.

At 1:03:00 the state model is epochal

At 1:04:20 the transaction is well defined. It is a function of the database value.

At 1:04:40 maybe there is a communicable recoverable basis.

At 1:05:45 less coordination leads to greater scalability.

Glossary

Referring Pages

context-propagation data-architecture-glossary

People

person-rich-hickey