talk-runaway-complexity-in-big-data

https://www.youtube.com/watch?v=ucHjyb6jv08

At 3:20 he talks about human fault tolerance.

At 5:20 he says the worst mistakes you can make are data loss and data corruption.

At 9:15: having an immutable system means you don't need an index on your data, because you never have to find a record in order to update it.
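A minimal sketch of that idea (the names here are mine, not from the talk): in an append-only store, an "update" is just a new fact appended to the log, so no index has to be maintained to locate and overwrite old records.

```python
import time

# Append-only log of immutable facts: (timestamp, entity, field, value).
log = []

def record(entity, field, value, ts=None):
    """'Updating' is just appending a newer fact -- no lookup required."""
    log.append((ts if ts is not None else time.time(), entity, field, value))

def current_value(entity, field):
    """The latest fact wins; derived by scanning the log, not via an index."""
    facts = [(ts, v) for ts, e, f, v in log if e == entity and f == field]
    return max(facts)[1] if facts else None

record("alice", "city", "NYC", ts=1)
record("alice", "city", "SF", ts=2)    # later fact supersedes; nothing is overwritten
print(current_value("alice", "city"))  # SF
```

A real system would derive indexed views from the log for read performance, but the master data itself never needs one.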

At 12:00: the way you store, model, and query data is complected; they are fundamentally intertwined.

At 12:15 he talks about systems where those parts are disassociated and could be scaled independently.

At 17:25 he uses Apache Thrift to define his schemas.
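A hypothetical Thrift IDL fragment in the spirit of what he describes (these type and field names are illustrative, not the actual schema from the talk):

```thrift
/* A person can be identified in more than one way; a union lets the
   schema evolve without rewriting immutable historical data. */
union PersonID {
  1: string cookie;
  2: i64 user_id;
}

struct PageView {
  1: required PersonID person;
  2: required string url;
  3: required i64 timestamp_millis;
}
```

The rigid, versioned schema is what makes the immutable master dataset safe to keep forever.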

At 27:25 he talks about ElephantDB, which he wrote, and Voldemort, which LinkedIn wrote.

At 28:30: you still have the ideas of transformation and normalization; however, they are disassociated from each other.

At 32:40 he talks about the architecture as facilitating complexity isolation: most of the complexity goes into the realtime, incremental processing, and the batch layer, which is much simpler, eventually overrides it.

At 35:20 he talks about eventual accuracy. Basically, it means that the batch layer will eventually override the realtime system.

At 39:20 he talks about how data storage is separated from querying, and how that helps with normalization and transformation. You get a fully normalized schema in storage, but you also get views (the batch views) that are optimized, potentially heavily denormalized, for your queries.
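One way to picture that separation (illustrative data, not from the talk): the master dataset stays fully normalized, and a batch function joins and aggregates it into a denormalized view shaped for the query.

```python
# Fully normalized master dataset: facts reference entities by id.
people = {1: "alice", 2: "bob"}
pageviews = [(1, "home.html"), (2, "home.html"), (1, "about.html")]

def build_batch_view(people, pageviews):
    """Batch function: join + aggregate into a denormalized, query-shaped
    view (name -> view count). The normalized storage is untouched."""
    view = {}
    for person_id, url in pageviews:
        name = people[person_id]
        view[name] = view.get(name, 0) + 1
    return view

batch_view = build_batch_view(people, pageviews)
print(batch_view)  # {'alice': 2, 'bob': 1}
```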

At 39:45: it gives you flexibility in your batch views. If you realize you need a very different set of batch views, you can change your recompute algorithm or target and create totally new batch views. These batch views could even live on a completely different system.

At 40:05: your needs change over time. As long as you can run a function over all of your data to recompute batch views, you will be able to satisfy your future needs as well.

At 44:38: this is the same idea as event sourcing.
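The event-sourcing connection can be sketched the same way (assumed event names): state is never stored directly, only derived by replaying the immutable event log, so any view can always be recomputed from scratch.

```python
# Immutable event log -- the source of truth, as in event sourcing.
events = [
    {"type": "deposited", "amount": 50},
    {"type": "withdrew", "amount": 20},
    {"type": "deposited", "amount": 5},
]

def replay(events):
    """Current state is a pure function of the full event history."""
    balance = 0
    for e in events:
        balance += e["amount"] if e["type"] == "deposited" else -e["amount"]
    return balance

print(replay(events))  # 35
```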

data-architecture-glossary#glossary
 

Referring Pages

data-architecture-glossary new-data-architecture

People

person-nathan-marz