https://www.youtube.com/watch?v=oOON--g1PyU
Presentation notes: https://github.com/stuarthalloway/presentations/wiki/Simplifying-ETL-with-Clojure-&-Datomic
Related to talk-ability-and-robustness-clojure-spec; Alex also mentioned wiki-design-by-contract.
talk-etl-with-clojure-and-datomic#enumerating-things-etl-might-be At 2:40 he starts talking about all the things that ETL might actually be: more than just the three steps.
At 3:35, the exact steps don't matter; what matters is how you compose them together: a functional pipeline.
At 4:00 a pure function of data to data, followed by another pure function of data to data, etc.
At 4:33 pure functions allow for easy parallelization, and if you decide you need checkpoints for your data because intermediate functions can fail, you are in a good position.
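To make the 4:00 shape concrete, a minimal sketch of the pipeline idea (parse-row, normalize-units, and enrich are hypothetical step names, not from the talk): pure data-to-data steps composed into one function, which pmap can then run in parallel.

```clojure
;; Hypothetical ETL steps: each is a pure function of data -> data.
(defn parse-row       [row] (update row :price #(Double/parseDouble %)))
(defn normalize-units [row] (update row :price * 100.0))  ; dollars -> cents
(defn enrich          [row] (assoc row :source "legacy-erp"))

;; The pipeline is just composition...
(def process-row (comp enrich normalize-units parse-row))

;; ...and pure steps parallelize trivially:
(pmap process-row [{:sku "a-1" :price "9.99"}
                   {:sku "b-2" :price "24.50"}])
```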
At 4:55 one of the advantages of using Clojure is a data-oriented approach to these problems.
At 5:37 systemic generality. A small number of functions that can operate on lots of different kinds of data.
At 5:55 errors and exceptions are treated as information ("errors as information").
At 6:30 most of the other languages encourage "encapsulated specificity."
At 6:30 he downplays static versus dynamic, and plays up systemic generality versus encapsulated specificity.
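As an illustration of systemic generality (with made-up rows, not data from the talk): the same handful of core functions works over any map-shaped data, whether it arrived as CSV, JSON, or query results.

```clojure
;; Made-up rows; a few generic functions cover all such data.
(def rows [{:sku "a-1" :price 999}
           {:sku "b-2" :price 2450}])

(map :sku rows)                     ;=> ("a-1" "b-2")
(filter #(> (:price %) 1000) rows)  ;=> ({:sku "b-2", :price 2450})
(reduce + (map :price rows))        ;=> 3449
```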
At 8:09 there is a spec comparison table
At 8:15 it is all about reach, which is being in multiple places at the same time. With ETL you are, by definition, in three places: the source data, your code, and the output.
At 9:04 "I have no idea what keys this map should have."
At 9:45 he talks about map specs and what needs to be in the data. He basically created characterization specs. He liked that spec is dynamic because it let him add just enough specificity to feel comfortable that he understood what was going on in the system.
At 10:30 qualified keywords.
At 11:20 namespace-qualified keywords would have gotten him more leverage; spec automatically discovers specs registered under them.
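A sketch of what that looks like (the :inv/* names are hypothetical): registering specs under namespace-qualified keywords lets s/keys discover and check them automatically.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical characterization specs, keyed by qualified keywords.
(s/def :inv/sku string?)
(s/def :inv/price pos-int?)

;; s/keys automatically checks registered specs for qualified keys.
(s/def :inv/row (s/keys :req [:inv/sku :inv/price]))

(s/valid? :inv/row {:inv/sku "a-1" :inv/price 999})  ;=> true
```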
At 13:00 transducers do not describe input and output; they describe the transformation, separate from considerations of input and output.
At maybe 14:40, writing this stuff as transducers led to emergent simplicity, because the "stupid stuff" became more obvious once the boilerplate was removed.
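A small sketch of that input/output separation: the transducer below mentions no source or sink, so the identical xform can be run eagerly, lazily, or as a plain reduction.

```clojure
;; The transformation, described without reference to input or output:
(def xform (comp (map inc) (filter even?)))

(into [] xform (range 6))        ;=> [2 4 6]  (eager, into a vector)
(sequence xform (range 6))       ;=> (2 4 6)  (lazy sequence)
(transduce xform + 0 (range 6))  ;=> 12       (plain reduction)
```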
At 17:19 Java has checked and unchecked exceptions, which is a type-system distinction. Java's exceptions are hard to group into useful categories like "this was caused by another system," "this was caused by my code," "this is actionable," etc.
At 17:45 he didn't want to make the Java exception model the centerpiece of "errors as information."
At 20:10 he prefers decisions made by us; there is a list of decisions.
At 22:05 do not change the meaning of a name in a namespace; create a new namespace instead.
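For example (hypothetical :inv names again): if :inv/price meant dollars and you now need integer cents, register a new name rather than silently changing the old one.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical: the old meaning stays intact...
(s/def :inv/price pos?)          ; dollars, possibly fractional

;; ...and the changed meaning gets a new namespace-qualified name.
(s/def :inv.v2/price pos-int?)   ; integer cents
```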
talk-etl-with-clojure-and-datomic#reified-transactions At 23:10 reified transactions: transactions are records of what you did in the system.
At 23:50 they allow you to easily track ETL (or backfill) progress
At 24:50 the section on reified transactions ends.
talk-etl-with-clojure-and-datomic#reified-transactions-establish-provenance At 23:30 the attributes you add to reified transactions establish provenance.
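A sketch of asserting provenance on the transaction entity itself, assuming a live Datomic connection conn and hypothetical :item/* and :etl/* attributes already installed in the schema ("datomic.tx" is Datomic's reserved tempid for the current transaction):

```clojure
(require '[datomic.api :as d])

;; Assumes conn is a live connection and these attributes exist in the
;; schema. The second map asserts facts about the transaction itself.
@(d/transact conn
   [{:db/id "new-item" :item/sku "a-1" :item/price 999}
    {:db/id "datomic.tx"
     :etl/source-file "inventory-2017-03.csv"
     :etl/batch-id    42}])
```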
At 26:01 the benefit of a universal schema is that there are no tables to design or join.
At 26:30 Cassandra best practice is to create a table for each query you are going to make, so you have to think ahead, in your ETL, to create those tables.
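By contrast, with a universal schema a new query needs no new table; any attribute is directly queryable (same hypothetical attributes as the sketch above):

```clojure
;; No table was designed for this query; datoms are queryable directly.
(d/q '[:find ?sku ?price
       :where [?e :item/sku ?sku]
              [?e :item/price ?price]]
     (d/db conn))
```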
At 30:00 the need is to limit my stupidity to small, local scopes. That is why we want boundaries.
At 27:26 spec has regular expressions over data, not just strings.
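A sketch of a regex over data rather than characters, with a made-up ::record shape: s/cat and s/* match the structure of a sequence of values.

```clojure
(require '[clojure.spec.alpha :as s])

;; A regular expression over values in a sequence, not characters:
(s/def ::record (s/* (s/cat :key keyword? :val any?)))

(s/conform ::record [:sku "a-1" :price 999])
;=> [{:key :sku, :val "a-1"} {:key :price, :val 999}]
```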
At 28:45, you will still pass around maps, but when you want to get specific you can talk about your maps.
At 30:30 he talks about function semantics: you can define the semantics of return values, not just their types.
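In spec this is s/fdef, where :ret constrains the return value and :fn relates it to the arguments; this ranged-rand example follows the canonical spec-guide shape.

```clojure
(require '[clojure.spec.alpha :as s])

(defn ranged-rand
  "Returns a random int in [start, end)."
  [start end]
  (+ start (long (rand (- end start)))))

(s/fdef ranged-rand
  :args (s/and (s/cat :start int? :end int?)
               #(< (:start %) (:end %)))
  :ret int?
  ;; :fn states the *semantics* of the return value:
  :fn #(and (>= (:ret %) (-> % :args :start))
            (<  (:ret %) (-> % :args :end))))
```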
At 31:10 "predicates of the behavior of the system."
At 33:50 conform shows exactly how the data conforms to the spec.
At 35:10 exercise creates example data and also shows how it conforms to the spec.
At 35:40 you can eyeball exercised data. The example is about grading: look at the grades it generated to see whether they make sense.
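A sketch of eyeballing exercised data, with a made-up ::grade spec: s/exercise returns [sample conformed] pairs you can read directly.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::grade (s/int-in 0 101))  ; hypothetical 0-100 grading spec

(s/exercise ::grade 5)
;=> ([78 78] [0 0] [93 93] [2 2] [55 55])  ; values will vary
```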
At 37:30 generative testing automatically shrinks to the smallest failing case it can find.
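With the fdef sketch above, this is one line; stest/check drives test.check, which does the shrinking.

```clojure
(require '[clojure.spec.test.alpha :as stest])

;; Generatively tests ranged-rand against its fdef; any failure is
;; shrunk to a minimal failing case before being reported.
(stest/check `ranged-rand)
```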
At 38:00 spec is doubling down on the dynamic.