At 8:40 out-of-order processing
At 9:50 he talks about Dataflow and how it came out of the Flume project, which was basically about creating a plan to execute the pipelines.
At 12:20 they are trying to get MillWheel plus batch processing running in Dataflow.
At 13:20 he describes the Lambda architecture and talks about how the streaming semantics and consistency were not great, which is why they needed the batch part.
podcast-oreilly-data-show-massively#summingbird-unified-api At 13:50 Summingbird gives a single API instead of two separate APIs for the batch and streaming parts of Lambda. podcast-oreilly-data-show-massively#summingbird-unified-api
At 14:20 he talks about moving away from batch and streaming as terms and talking more about bounded and unbounded data
At 15 streaming is a loaded term
At 16:20 the talk-goodbye-to-batch at Strata in London
podcast-oreilly-data-show-massively#event-time-vs-processing-time1 2 At 21:40 the time is event-time based as opposed to processing-time based, meaning they take the time information from the event, not from when it is processed. podcast-oreilly-data-show-massively#event-time-vs-processing-time1 2
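To make the event time versus processing time distinction concrete for myself, here is a minimal sketch in plain Python. It is not Dataflow code: the record fields `event_time` and `processing_time`, the one-minute window size, and the helper names are all my own assumptions for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed fixed one-minute windows


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains timestamp."""
    return timestamp - (timestamp % size)


def count_by_window(records, time_field):
    """Count records per window, keyed on whichever time field is chosen."""
    counts = defaultdict(int)
    for record in records:
        counts[window_start(record[time_field])] += 1
    return dict(counts)


# Hypothetical records: event_time is when the event happened,
# processing_time is when the pipeline got around to seeing it.
records = [
    {"event_time": 10, "processing_time": 12},
    {"event_time": 50, "processing_time": 130},  # delayed in transit
    {"event_time": 70, "processing_time": 75},
]

by_event_time = count_by_window(records, "event_time")
by_processing_time = count_by_window(records, "processing_time")
```

The delayed record lands in the window where it actually happened when grouping by event time, but in a much later window when grouping by processing time.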
At 23:40 Dataflow has the concept of a watermark, which is an indication of the point before which you think the data is stable. Up to that watermark you believe you have seen all of the data you will see, and if any additional data comes in for before the watermark after the fact, it is late and you can decide what to do with it.
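A rough sketch of the watermark idea, again in plain Python rather than Dataflow. The heuristic used here (watermark equals the largest event time seen so far minus a fixed allowed delay) is my own simplifying assumption, not how Dataflow computes its watermark, and the function name is hypothetical.

```python
def split_on_time_and_late(event_times, allowed_delay):
    """Classify each event as on time or late against a running watermark.

    Assumed heuristic: watermark = max event time seen so far - allowed_delay.
    """
    watermark = float("-inf")
    max_seen = float("-inf")
    on_time, late = [], []
    for t in event_times:
        if t < watermark:
            # Arrived after we believed this part of the data was complete.
            late.append(t)
        else:
            on_time.append(t)
        max_seen = max(max_seen, t)
        watermark = max_seen - allowed_delay
    return on_time, late


# Events listed in arrival order; the value is the event time.
on_time, late = split_on_time_and_late([1, 2, 10, 3, 11, 9], allowed_delay=5)
```

Here the event with time 3 arrives after the watermark has already advanced to 5, so it is flagged as late; what to do with it (drop it, or update the earlier result) is a separate decision.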
At 26:05, talking about consistency, he says it is about making sure your ongoing calculation is correct despite machine failures and network issues
At 26:40 you don't acknowledge that something has been processed until it has been durably committed
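The commit-before-acknowledge rule can be sketched as below. Everything here is a hypothetical stand-in that I made up to illustrate the ordering: `InMemorySource` plays the role of a source that redelivers unacknowledged messages, and `FlakySink` plays the role of a durable store whose first write fails.

```python
class InMemorySource:
    """Hypothetical source that keeps redelivering unacknowledged messages."""

    def __init__(self, messages):
        self.pending = list(messages)

    def peek(self):
        return self.pending[0] if self.pending else None

    def ack(self, message):
        self.pending.remove(message)


class FlakySink:
    """Hypothetical durable store whose first commit attempt fails."""

    def __init__(self):
        self.committed = []
        self.failures_left = 1

    def commit(self, value):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise IOError("simulated write failure")
        self.committed.append(value)


def run(source, sink, max_attempts=10):
    """Process messages, acknowledging each one only after its commit succeeds."""
    attempts = 0
    while source.peek() is not None and attempts < max_attempts:
        attempts += 1
        message = source.peek()
        try:
            sink.commit(message * 2)  # durable commit comes first
        except IOError:
            continue  # no ack was sent, so the same message is retried
        source.ack(message)  # ack only once the result is committed


source = InMemorySource([1, 2, 3])
sink = FlakySink()
run(source, sink)
```

Because the first write fails before any acknowledgement is sent, the message is simply retried and nothing is lost. A crash between the commit and the ack would cause a redelivery, so in a real system the commit would also need to be idempotent.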
At 28, when describing Spark, he said that it was exciting for them to review Spark and understand that it was a principled system that cared about consistency
Note, I may have been using the wrong word above. It may have been correctness and not consistency that he was talking about.
At 36:50 he says, roughly, to solve the other problems some other way and to solve this one with windowing and triggers.
Near the end he talks about how Dataflow can run on various execution engines, such as Spark.
article-the-world-beyond-batch-streaming-101
article-the-world-beyond-batch-streaming-102