https://www.youtube.com/watch?v=a4w6MXKv0Cw
At 4:30 the guiding principles for their data architecture: optimize for scalability, for developer productivity, and for correctness.
At 7:30 LinkedIn is using Kafka to handle over 1 trillion events per day.
At 9:50 schema is very, very important.
At 11:40 he talks about whether the schema lives in the message or outside of it; see At 17:10 on in-band and out-of-band schemas.
At 11:55 you need to enforce schemas at produce time.
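A minimal sketch of what produce-time enforcement could look like, assuming fastavro and kafka-python (the talk does not name libraries; the schema, topic, and field names here are made up):

```python
# Sketch only: validate against an Avro schema before the message leaves
# the producer. fastavro/kafka-python and the schema, topic, and field
# names are assumptions, not what the talk uses.
import json

from fastavro import parse_schema
from fastavro.validation import validate
from kafka import KafkaProducer

SCHEMA = parse_schema({
    "type": "record",
    "name": "CallEvent",
    "fields": [
        {"name": "call_sid", "type": "string"},
        {"name": "status", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
})

producer = KafkaProducer(bootstrap_servers="localhost:9092")

def produce(event: dict) -> None:
    # Rejecting bad records here, at produce time, keeps malformed data
    # from ever reaching consumers. validate() raises on a mismatch.
    validate(event, SCHEMA)
    producer.send("call-events", json.dumps(event).encode("utf-8"))
```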
At 13:30 you need to know what uniquely identifies a record
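For example, a sketch of deriving a stable key from whatever uniquely identifies the record, so duplicates can be detected downstream (field names are illustrative, not from the talk):

```python
# Hash the identifying fields into a fixed-width key. The field names
# here are assumptions.
import hashlib

def record_key(event: dict) -> str:
    raw = f"{event['call_sid']}:{event['event_type']}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```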
At 15:05 they have a separate service for fetching schemas and stream metadata.
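A hypothetical client for such a service might look like this; the endpoint and response shape are assumptions, not the talk's actual API:

```python
# Made-up client for a schema/stream-metadata service.
import requests

METADATA_URL = "http://metadata.internal/api/v1"  # hypothetical endpoint

def get_schema(stream: str) -> dict:
    resp = requests.get(f"{METADATA_URL}/streams/{stream}/schema")
    resp.raise_for_status()
    return resp.json()
```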
At 16:50 what are the various things you can do with the streams of data?
At 17:40 Twilio FS is the data lake
At 19:20 Copycat copies all of the Kafka data to S3. This allows reprocessing the data from scratch (even if they made a mistake in Copydog).
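A rough sketch of that archiving idea, assuming boto3 and kafka-python; the bucket, topic, and batch size are placeholders:

```python
# Consume raw Kafka records and write them to S3 unmodified, so anything
# downstream can be recomputed from scratch. Names are assumptions.
import boto3
from kafka import KafkaConsumer

s3 = boto3.client("s3")
consumer = KafkaConsumer("call-events", bootstrap_servers="localhost:9092")

batch, first_offset = [], None
for msg in consumer:
    if first_offset is None:
        first_offset = msg.offset
    batch.append(msg.value)
    if len(batch) >= 10_000:
        # Raw bytes keyed by partition and starting offset: the archive
        # stays a faithful source of truth for replays.
        key = f"raw/call-events/{msg.partition}/{first_offset}"
        s3.put_object(Bucket="kafka-archive", Key=key, Body=b"\n".join(batch))
        batch, first_offset = [], None
```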
At 19:45 Copydog dedups the data and ensures correct ordering, but otherwise leaves it alone. Copydog's output is what the rest of the system consumes.
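A minimal sketch of that dedup-and-reorder step; the field names are assumptions:

```python
# Drop duplicates by unique key, sort by event time, change nothing else.
def dedup_and_order(records: list[dict]) -> list[dict]:
    seen, out = set(), []
    for rec in records:
        # Whatever uniquely identifies a record (13:30); fields assumed.
        key = (rec["call_sid"], rec["event_type"])
        if key in seen:
            continue  # duplicate delivery, drop it
        seen.add(key)
        out.append(rec)
    return sorted(out, key=lambda r: r["timestamp"])
```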
At 20:10 S3 does not have an atomic move operation, so they use versioned file names plus a reference to the latest version. See also At 36:34: Riak and S3 do not provide the consistency or transactional semantics needed here.
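One way to sketch that versioned-write pattern, with illustrative bucket and key names:

```python
# Write each output under a fresh versioned key, then flip a small
# "latest" pointer object. Layout is illustrative.
import time

import boto3

s3 = boto3.client("s3")

def publish(bucket: str, prefix: str, body: bytes) -> str:
    version = int(time.time() * 1000)
    key = f"{prefix}/v{version}/data"
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    # Readers resolve LATEST first, then fetch the versioned object, so a
    # half-written output is never visible as the current version.
    s3.put_object(Bucket=bucket, Key=f"{prefix}/LATEST", Body=key.encode("utf-8"))
    return key
```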
At 21:05 the metadata API helps with this.
At 21:50 there are multiple Redshift clusters, each used for a different purpose.
At 23:40 they use Spark for real-time processing of the streams.
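A minimal PySpark Structured Streaming read from Kafka; the talk doesn't show code, and the topic and server names are placeholders:

```python
# Requires the spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-processing").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "call-events")
    .load()
)

# Kafka records arrive as key/value bytes; cast and process from here.
events = stream.selectExpr("CAST(value AS STRING) AS json")
query = events.writeStream.format("console").start()
query.awaitTermination()
```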
At 25:05 use full-row checksums to see whether values align.
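A sketch of one way to compute a full-row checksum, assuming rows are dicts of JSON-serializable values:

```python
# Canonicalize the row, hash it, and compare digests between systems.
import hashlib
import json

def row_checksum(row: dict) -> str:
    # sort_keys makes the serialization canonical, so the same row hashes
    # identically regardless of column order.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The same logical row in two stores should yield the same checksum.
assert row_checksum({"a": 1, "b": 2}) == row_checksum({"b": 2, "a": 1})
```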
At 26:30 "we try out new data stores all the time"