context-propagation

Think about our retailer tooling that shows the retailer nothing about how we arrived at our decision. No context is propagated to them, so they have to speculate, work through intermediaries, and spend a lot of time communicating.

Most of the systems we build are not performance-bound. It can make sense for a function to return not just the value that you absolutely need, but also some additional information about it: for example, how it reached the conclusion, or some internal details about what happened. That is the context that can be helpful when debugging the system. I use the term debugging loosely here; it can even mean simply understanding.
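
A minimal sketch of what that could look like (all names here are hypothetical):

# Hypothetical sketch: return the value the caller needs, plus context
# about how the conclusion was reached.
def eligible_for_discount?(order)
  rule =
    if order[:total] > 100
      :large_order
    elsif order[:loyalty_years] >= 2
      :loyal_customer
    end

  {
    result:       !rule.nil?,   # the value you absolutely need
    matched_rule: rule,         # how we reached the conclusion
    evaluated_at: Time.now.utc  # internal detail, handy when debugging
  }
end

eligible_for_discount?(total: 120, loyalty_years: 0)
# => { :result => true, :matched_rule => :large_order, :evaluated_at => ... }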

intention revealing

video-nova-crash-of-flight-111

Sections

TODO - intent. Event sourcing captures intent. User stories capture intent. Should the user story be linked? Should an admin interface show all the user stories associated with it?

tools-that-give-guidance

What do I mean?

Why?

Structure of the data

Show causality through sideband propagated context

Decisions

Recording decisions along with the information available at the time lets us evaluate those decisions later, when outcomes are known. (#)
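
A hedged sketch of what recording one might look like (Decision is a hypothetical model):

# Hypothetical: persist the choice together with the inputs available
# at the time, so the decision can be judged once the outcome is known.
Decision.create!(
  choice:  "route_to_manual_review",
  inputs:  { fraud_score: 0.72, account_age_days: 3 },
  made_by: "checkout-service",
  made_at: Time.now.utc
)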

Interfaces with flexibility in the amount of displayed data

Other

One of the reasons why open source has gained popularity is that it is not a black box. Even if it doesn't log, you can ultimately reason about what it's doing by adding your own logging or running it through a debugger. Free software advocates have long known that if you can't inspect a system, you're held prisoner by it. Yet this applies not just to the layers that programmers currently code on, but also to new and more abstract frontiers. A black box that you can't ask to explain itself is a dangerous and probably poorly operating device or system. (#)

Sean's feedback about the continuous deploys system was that it didn't make clear all the thought that went into the system. It was good to provide visibility, but the scenarios should have been part of the code and shown directly in the interface (and tested for).

The context of intent. As in event sourcing, you attempt to capture the intent rather than the individual operations that are performed.

Location: 12,066 For example, storing the event "student cancelled their course enrollment" clearly expresses the intent of a single action in a neutral fashion, whereas the side effects "one entry was deleted from the enrollments table, and one cancellation reason was added to the student feedback table" embed a lot of assumptions about the way the data is later going to be used. If a new application feature is introduced — for example, "the place is offered to the next person on the waiting list" — the event sourcing approach allows that new side effect to easily be chained off the existing event. (#)
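
A rough Ruby sketch of that example (the event shape and consumer names are made up):

# Capture the intent as a neutral event...
event = { type: "enrollment_cancelled", student_id: 7, course_id: 42 }

# ...and let each side effect be derived from it independently.
Enrollments.remove(student_id: event[:student_id], course_id: event[:course_id])
StudentFeedback.record_cancellation(event)

# A new feature chains off the same event without touching the producer:
WaitingList.offer_next_place(course_id: event[:course_id])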

TODO - column maps. Put the mapping with the data that it maps, and use it for the actual processing of the data (like how Avro has the schema built in)
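
Maybe something like this (hypothetical payload shape): the embedded column map drives the processing, the way Avro's embedded schema does.

payload = {
  columns: { 0 => :lookup_code, 1 => :name, 2 => :price },
  rows:    [["00028400055987", "Lay's Classic", 4.29]]
}

# The mapping that travels with the data, not out-of-band knowledge,
# shapes the records.
records = payload[:rows].map do |row|
  payload[:columns].each_with_object({}) do |(index, column), record|
    record[column] = row[index]
  end
end
# => [{ :lookup_code => "00028400055987", :name => "Lay's Classic", :price => 4.29 }]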

Allows you to recover from failure. Expected failures (RDDs and divergent versions in Dynamo), but also unexpected failures (like when X mentioned they had to replay their logs to get back the transactions from the last few years)

TODO - look into quotes in understandability-as-a-design-goal.

Think about how the product sources page did not show empty strings versus nulls.

There is that blog post that talks about what type of blog post you are writing. Read it again to decide what this presentation is.

Also think about the information that was not shown on the product sources page, which made it difficult to reason about how the decisions were made under the covers, which in turn made it difficult for catalog ops to know whether the information being displayed was correct or not.

Think about displaying variable amounts of information, and talk about how a debug flag (a simple boolean) is not good enough.

One thing about the quality log message is that machines should never rely on the context. Notice how the 'cause' is just a long string: there is a lot of context, but it is meant for human consumption, not for a machine to act upon. (#)
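
Roughly (a made-up log call): structured fields for the machine, a prose 'cause' for the human.

# Hypothetical: machine-actionable fields stay structured; the 'cause'
# is a rich string meant only for the person reading the log.
logger.warn(
  event:      "match_rejected",
  product_id: 16848067,
  cause:      "scan code 00028400055987 maps to two classified codes; " \
              "refusing to guess between them"
)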

I've also been thinking about how propagating intermediate results can help, too. For example, if you are presenting a decision to another system, include some of the rationale for the decision, too.

At hand

Think about the laziness of people and how they will not search out data that is not close at hand

context-propagation-at-hand#at-hand
 

We had a page that explained how to reason about continuous deploys, but it was only after putting the information into a nice web interface that the questions about what was going on stopped.

We need to make sure that the context is never programmatically relied upon

Pull the actual quote from the Huber talk and put it in here

Make it very easy to reason about quickly (like that adventure racing thing, and also people not wanting to be bothered about looking into logs and just speculating about CD)

Have a Quip where I have a screenshot of Product::Caches helping to debug a problem because of the context it has: https://instacart.quip.com/FjuSARe0TOo9

At 55:02 Bret Victor: "we need to continuously experience the meaning of our meanings" (#)

At 55:50 context propagation (and the visualization) allows him to go from the final pixel to the original very easily. Layers of insight, too. (#)

It's more data. Yes. It's cached. Yes. It should not be relied upon (critical... do not build systems that rely, long-term, on the data). Yes.

A good example from our own code. Persist the information about the decision that led to the result in a product source match. Note how it's more data... yes.

Catalog::MatchProductService.new(retailer_id: Product::Source.find(34597296).retailer_id, code_set:{lookup_code: "00028400055987"}).perform
{
  :match_details => {
    :code_id      => nil,
    :code_literal => "00028400055987",
    :code_type    => "ScanCode",
    :match_logic  => "matched_by_universal_product_classified_code"
  },
  :product_id    => 16848067,
  :result        => true
}

Are intermediate-results a type of context propagation? They can be, for example, if they are decisions.

ETL-able means you provide enough context in the data to make the reduction process possible... you propagate your decisions, too. Is this right? Or, rather, should your decisions be raw data, like the update we'd make to priority in Ruby code.

Reflection

"Chain of reasoning"

Think about YAGNI ("you aren't gonna need it") and whether that applies to data. How about "you might need it"?

In the same way that you try to anticipate the needs of the future reader of your log messages, imagine putting the same consideration into the context that you propagate.

In the talk he says that we presume the best. What if we presume that things are going to fail? What information would downstream services require to either automatically make better choices, or to enable a human operator to make better choices?

Can be as simple as ensuring that you show nulls differently from empty strings in a view. Doesn't seem like context propagation, but kinda is.
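
A tiny sketch of such a view helper (hypothetical):

# Hypothetical helper: make nil visibly different from the empty string.
def display_value(value)
  case value
  when nil then "(null)"
  when ""  then "(empty string)"
  else value
  end
end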

At hand. Audits in the admin interface versus needing to go looking for logs or using the console.

Having the data at hand helps others collaborate with us. Catalog ops can't go off and find the log messages very easily.

book-designing-data-intensive-applications#record-state-of-what-was-seen

If you can log an event to record the state of the system that the user saw before making a decision, and give that event a unique identifier, then any later events can reference that event identifier in order to record the causal dependency [4]. We will return to this idea in "Reads are events too".
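
A loose sketch of that idea (EventLog is made up):

# Hypothetical: record what the user saw, then let the later decision
# reference it to capture the causal dependency.
seen = EventLog.append(type: "results_shown", product_ids: [16848067])

EventLog.append(
  type:       "product_selected",
  product_id: 16848067,
  caused_by:  seen.id
)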

 

Git commit messages - that "I kill puppies" message that wasn't very helpful...

This is sort of in-flight context propagation. It slows down the calculations and shows intermediate results, highlighting its decision-making process. At 53, the software shows where it would like to take a peek, and where it actually decided to take a peek. (#)

You can glean context after-the-fact by looking at log files, but even in an interface like Loggly with analytics, etc, it's still so limited. Robot experts build trust in their robots by looking through log files, which is incredibly tedious. If a perceived or actual error occurs, they have to debug their robots using the same logs, which is time-consuming and prone to misunderstanding. Robot users that have no access to the logs are rarely given insight into their robot's actions. (#). Druid is like that, but with more context and queryability.

context-propagation-spectrum#spectrum

... some more steps in here ...

 

blog-post-hack-spark-for-data-lineage

To build that data structure, you don't need to know anything that happens to it afterwards, you just need to know what it means (#)

If you've ever tried to use a lexer & parser generator, you've probably gotten an error message like Syntax error and marveled at how it manages to not even have any clue where the syntax error was, much less what the darned problem is. What we're suffering from here is over-abstraction making things harder for ourselves (#)


Already added to presentation

There is discussion in talk-what-i-wish-i-had-known-before-scaling-uber about why they needed this.

Somewhat related to a-quality-log-message because in that message there is so much consideration given to the context provided by the message.

"anything you believe should have a logical chain of reasoning all the way down" (#)

Page 9: A Big Data system must provide the information necessary to debug the system when things go wrong. The key is to be able to trace, for each value in the system, exactly what caused it to have that value. (#)

At 23:50, if you look at a database and attempt to figure out how it got there, you have no resources for doing that except for looking at the logs. (#)

Show the process and algorithms of the automation by revealing intermediate results in a way that is comprehensible to the operators. (#)

TODO - talk-understanding-and-using-reified-transactions

talk-etl-with-clojure-and-datomic#reified-transactions

At 23:10, reified transactions are records of what you did in the system.

At 23:50, they allow you to easily track ETL (or backfill) progress.

At 24:50, the section on reified transactions ends.

 

Referring Pages

talk-what-i-wish-i-had-known-before-scaling-uber
distributed-computing-metrics-and-logging