https://www.youtube.com/watch?v=2STfulBcorA&t=6672s
The talk really gets going at 1:18:40.
At 1:24:30 the batching is time based, not size based. You can control the batch interval.
At 1:24:50 when the batch interval concludes, the batch becomes an RDD and is sent down to Spark Core for processing.
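A minimal sketch of how the batch interval is fixed when the StreamingContext is created (the 10-second interval and the socket source are placeholders, not from the talk):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchIntervalSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-notes").setMaster("local[2]")
    // Every 10 seconds the records buffered so far become one RDD
    // that is handed to Spark Core for processing.
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```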
At 1:27:10 Kafka can level out the back pressure if you have a bursty ingest scenario (high bursts followed by quiet periods).
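The point in the talk is that Kafka buffers the bursts; as a related aside (not from the talk), Spark Streaming also has its own rate-limiting knobs that can be set on the SparkConf, assuming Spark 1.5+ where these settings exist:

```scala
val conf = new SparkConf()
  .setAppName("bursty-ingest")
  // Let Spark adapt the ingestion rate to the current processing rate.
  .set("spark.streaming.backpressure.enabled", "true")
  // Hard cap on records/sec pulled from each Kafka partition (direct stream).
  .set("spark.streaming.kafka.maxRatePerPartition", "10000")
```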
At 1:27:40 since the batches become RDDs, they are distributed. That is what the first D in RDD stands for: distributed.
At 1:28:00 in Spark Streaming you work on something called DStreams, which are discretized streams.
At 1:28:58 each DStream is composed of a sequence of RDDs.
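A small sketch of that structure, continuing the `lines` DStream from the earlier sketch: `foreachRDD` exposes the individual RDD produced for each batch.

```scala
// One RDD per batch interval; batchTime identifies which batch it belongs to.
lines.foreachRDD { (rdd, batchTime) =>
  println(s"Batch at $batchTime contains ${rdd.count()} records")
}
```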
At 1:29:45 the transformations you apply to a DStream basically stamp the DAG onto every RDD in the DStream.
At 1:30:00 in other words, the operations declared on the DStream are used to add those operations to each RDD.
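Sketch of what that looks like in code, again continuing the `lines` DStream (the filter predicate is made up): the operations are declared once and then run against every new RDD as batches arrive.

```scala
// Declared once on the DStream...
val upper  = lines.map(_.toUpperCase)
val errors = upper.filter(_.contains("ERROR"))
// ...and re-applied to every RDD the stream produces, batch after batch.
errors.print()
```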
At 1:30:00 if you want to maintain historical data, then you take the initial stream and filter the data into Cassandra for later analysis.
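A hedged sketch of that pattern using the DataStax spark-cassandra-connector (the keyspace, table, and column names are invented for illustration):

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Persist the filtered records so they can be analyzed later from Cassandra.
errors
  .map(line => (java.util.UUID.randomUUID().toString, line))
  .saveToCassandra("analytics", "raw_errors", SomeColumns("id", "body"))
```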
At 1:34:10 DStream transformations output new DStreams.
At 1:34:15 you can use something that is an escape hatch to pull stuff up into a DataFrame, which would then allow you to get it into Hive, for example.
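A sketch of that escape hatch, assuming Spark 2.x and a Hive-enabled session (the table name is illustrative): inside `foreachRDD`, each batch's RDD can be lifted into a DataFrame and appended to a Hive table.

```scala
import org.apache.spark.sql.SparkSession

errors.foreachRDD { rdd =>
  val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
  import spark.implicits._
  // Lift the batch's RDD[String] into a DataFrame, then write it to Hive.
  rdd.toDF("body").write.mode("append").saveAsTable("staging.stream_errors")
}
```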
At 1:35:10 how do you distribute a batch into partitions within a resilient distributed dataset, or RDD?
At 1:36:45 there is also a block interval. There is a diagram that shows how the blocks are processed; the blocks are the unit of partitioning. By default the block interval is 200 ms.
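For reference, the block interval is a plain Spark config setting; with the default 200 ms, a 10 s batch from one receiver yields roughly 50 blocks and therefore about 50 RDD partitions.

```scala
val conf = new SparkConf()
  .setAppName("block-interval-sketch")
  // Default is 200 ms; raising it gives fewer, larger partitions per batch.
  .set("spark.streaming.blockInterval", "200ms")
```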
At around 1:38:00 there is a visualization of how the data is distributed onto multiple nodes.
At 1:42:20 do as little as possible and jam that stuff into Cassandra as quickly as you can.
At 1:45:00 talking about UDP in Spark.