talk-what-i-wish-i-had-known-before-scaling-uber

https://news.ycombinator.com/item?id=12597232

talk-what-i-wish-i-had-known-before-scaling-uber#stability-of-service-leave-alone1 2 At 5:38, a surprising benefit of microservices is that you never have to touch them. You can leave many of them alone after deploying them. That sounds good, but the flip side, which he talks about at 39:30, is that when someone needs to make a cross-cutting change, they might be forced to update six-month-old code in a service that was never updated to keep up with the rest of the services (since it was working fine). talk-what-i-wish-i-had-known-before-scaling-uber#stability-of-service-leave-alone1 2

Teams "own their own uptime"

talk-what-i-wish-i-had-known-before-scaling-uber#trade-complexity-for-politics1 At 9:01, you might trade complexity for politics. Basically, you build a new service because you don't want to deal with talking to other people about how bad the old code is. talk-what-i-wish-i-had-known-before-scaling-uber#trade-complexity-for-politics1

talk-what-i-wish-i-had-known-before-scaling-uber#keep-your-biases1 At 9:30 with multi-language microservices you get to keep your biases. If you like a specific language, you can use it, and interface with other people using other languages. talk-what-i-wish-i-had-known-before-scaling-uber#keep-your-biases1

talk-what-i-wish-i-had-known-before-scaling-uber#fragment-culture1 Having microservices in multiple languages fragments the culture. People say "oh, I'm a Go programmer" or "oh, I'm a Java programmer." talk-what-i-wish-i-had-known-before-scaling-uber#fragment-culture1

talk-what-i-wish-i-had-known-before-scaling-uber#json-a-mess-at-scale1 At 13:30, he starts talking about HTTP between services, and some of the issues with it. He specifically calls out JSON: because it does not have types, it can become a big mess at scale. talk-what-i-wish-i-had-known-before-scaling-uber#json-a-mess-at-scale1
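One way to picture the typing problem: a minimal sketch, assuming a hypothetical `TripRequest` message (the name and fields are invented here, not from the talk), that checks a JSON payload against an explicit schema at the service boundary so a wrongly typed field fails immediately instead of travelling further.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TripRequest:
    """Hypothetical message shape, invented for illustration."""
    rider_id: str
    seats: int


def decode_trip_request(raw: str) -> TripRequest:
    """Parse JSON and check every field's type before accepting the message."""
    data = json.loads(raw)
    for f in fields(TripRequest):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        if not isinstance(data[f.name], f.type):
            raise TypeError(
                f"{f.name} must be {f.type.__name__}, "
                f"got {type(data[f.name]).__name__}"
            )
    # Unknown extra keys are ignored; only declared fields are copied over.
    return TripRequest(**{f.name: data[f.name] for f in fields(TripRequest)})
```

Plain `json.loads` would happily accept `"seats": "2"`; the check above rejects it at the edge.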

talk-what-i-wish-i-had-known-before-scaling-uber#if-you-own-it-make-it-a-function-call1 At 16:03, he finishes that point by saying that if you own both sides of the interaction, just treat it as a function call; don't treat it as, basically, a web browser talking to a server. talk-what-i-wish-i-had-known-before-scaling-uber#if-you-own-it-make-it-a-function-call1

At 19:30, he asks: is your automation good enough that other teams can deploy to your service, or do they need to wait on you?

talk-what-i-wish-i-had-known-before-scaling-uber#same-dashboard1 At 22, every service should have the same dashboard, and it should be created automatically. talk-what-i-wish-i-had-known-before-scaling-uber#same-dashboard1

talk-what-i-wish-i-had-known-before-scaling-uber#distributed-tracing1 2 At 26:40, using distributed tracing to track down issues and to understand fan-out.

At 31:20, tracing requires cross-language context propagation. talk-what-i-wish-i-had-known-before-scaling-uber#distributed-tracing1 2

talk-what-i-wish-i-had-known-before-scaling-uber#distributed-tracing-requires-context-propagation1 At 31:20, tracing requires cross-language context propagation. talk-what-i-wish-i-had-known-before-scaling-uber#distributed-tracing-requires-context-propagation1
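A minimal sketch of what context propagation means in practice, assuming a hypothetical `x-trace-id` header and two made-up services (none of these names come from the talk): each service extracts the context from the incoming request and injects it into every outgoing call, so the whole fan-out shares one trace ID. Because the context travels as plain headers, the same convention can be followed in any language.

```python
import uuid

TRACE_HEADER = "x-trace-id"  # hypothetical header name, chosen for illustration


def extract_context(headers: dict) -> dict:
    """Reuse the caller's trace ID, or start a new trace at the edge."""
    return {TRACE_HEADER: headers.get(TRACE_HEADER) or uuid.uuid4().hex}


def inject_context(context: dict, headers: dict) -> dict:
    """Return a copy of the outgoing headers with the context added."""
    out = dict(headers)
    out.update(context)
    return out


def service_b(headers: dict) -> str:
    """Downstream service: reports the trace ID it observed."""
    return extract_context(headers)[TRACE_HEADER]


def service_a(headers: dict) -> tuple:
    """Upstream service: forwards its context on the call to service_b."""
    ctx = extract_context(headers)
    outgoing = inject_context(ctx, {"content-type": "application/json"})
    return ctx[TRACE_HEADER], service_b(outgoing)
```

If any hop forgets the inject step, the downstream side starts a fresh trace and the chain is broken, which is why every service in every language has to participate.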

At 33:10, they are starting to put backpressure into logging, so logs are dropped if something starts logging too much.
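A rough sketch of the idea, assuming a simple fixed-window budget (the talk does not describe the actual mechanism, and `DroppingLogger` is a made-up name): once a producer exceeds its budget for the current window, further messages are dropped and counted rather than queued.

```python
import time


class DroppingLogger:
    """Accepts at most `max_per_window` messages per window; drops the rest."""

    def __init__(self, max_per_window, window_seconds=1.0, clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.clock = clock            # injectable so the behaviour can be tested
        self.window_start = clock()
        self.count = 0                # messages accepted in the current window
        self.dropped = 0              # total messages dropped so far
        self.emitted = []             # stand-in for the real log sink

    def log(self, message) -> bool:
        """Return True if the message was accepted, False if it was dropped."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window begins: reset the budget.
            self.window_start = now
            self.count = 0
        if self.count >= self.max_per_window:
            self.dropped += 1
            return False
        self.count += 1
        self.emitted.append(message)
        return True
```

Keeping a `dropped` counter means the loss is at least visible, even though the messages themselves are gone.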

talk-what-i-wish-i-had-known-before-scaling-uber#accounting-in-logs1 2 At 30, the need for some kind of accounting for the logs. talk-what-i-wish-i-had-known-before-scaling-uber#accounting-in-logs1 2

talk-what-i-wish-i-had-known-before-scaling-uber#structured-logging1 At 34:50, zap, their open-source library for structured logging. talk-what-i-wish-i-had-known-before-scaling-uber#structured-logging1
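For reference, structured logging means emitting machine-parseable key/value records instead of free-form strings. The sketch below uses only the Python standard library; it is not zap's API (zap is a Go library), and `structured_line` is a hypothetical helper.

```python
import json


def structured_line(level: str, message: str, **fields) -> str:
    """Render one log record as a single JSON object with named fields."""
    record = {"level": level, "msg": message}
    record.update(fields)
    # sort_keys gives a stable field order, which makes lines easier to diff.
    return json.dumps(record, sort_keys=True)


line = structured_line("info", "trip started", trip_id="t1", seats=2)
```

A consumer can then filter on `trip_id` or `level` directly rather than parsing message text with regular expressions.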

At 35:20, there's no way to create a test environment that's the same as production, and there is no way to simulate the same load as production.

talk-what-i-wish-i-had-known-before-scaling-uber#load-testing-in-production1 At 36:05, they load test on production during slow times, which needs context propagation. The request must tell the system that it is a test request and that, for example, it should not increment the counters. talk-what-i-wish-i-had-known-before-scaling-uber#load-testing-in-production1
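A small sketch of the test-request idea, with hypothetical names throughout (`RequestContext`, `is_test`, `TripCounter` are all invented here): the flag rides along with the request context, and anything that does accounting checks it before touching the real counters. Tallying test traffic separately is a choice made for this sketch, not something the talk specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """Context carried with every request; `is_test` is a hypothetical flag."""
    is_test: bool = False


class TripCounter:
    """Business metric that must not be polluted by load-test traffic."""

    def __init__(self):
        self.real_trips = 0
        self.test_trips = 0

    def record(self, ctx: RequestContext) -> None:
        if ctx.is_test:
            self.test_trips += 1   # tracked separately, never in the real total
        else:
            self.real_trips += 1


def handle_request(ctx: RequestContext, counter: TripCounter) -> str:
    """Pass the context through unchanged so downstream code can see the flag."""
    counter.record(ctx)
    return "ok"
```

The flag only works if every service forwards the context, which is why this depends on the same propagation machinery as tracing.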

talk-what-i-wish-i-had-known-before-scaling-uber#design-systems-with-test-load-in-mind1 2 At 36:50, since many of their bugs show up when they are near peak traffic, they like to use test traffic to keep themselves near peak load. They wish they had designed their system with that as a fundamental part. talk-what-i-wish-i-had-known-before-scaling-uber#design-systems-with-test-load-in-mind1 2

talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-a-prerequisite-part-of-design1 2 At 37:45, failure testing should be built in from the start. Nobody wants to bolt it on after the fact. talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-a-prerequisite-part-of-design1 2

At 39:30, the problem with microservices that have been deployed and not changed for a long time because they're working is that occasionally someone comes along wanting to make a cross-cutting change, and by then the microservice is very far behind, which increases the migration cost.

talk-what-i-wish-i-had-known-before-scaling-uber#migration-mandates1 2 At 40:15, mandates to migrate are bad. Rather, the new system should be so much better that people want to move to it. talk-what-i-wish-i-had-known-before-scaling-uber#migration-mandates1 2

talk-what-i-wish-i-had-known-before-scaling-uber#build-buy-tradeoff At 41:15, the build/buy trade-off. talk-what-i-wish-i-had-known-before-scaling-uber#build-buy-tradeoff

talk-what-i-wish-i-had-known-before-scaling-uber#breaking-up-allows-people-to-play-politics At 42:50, breaking these services up allows people to play politics. talk-what-i-wish-i-had-known-before-scaling-uber#breaking-up-allows-people-to-play-politics

At 44:40, there are trade-offs being made, and sometimes things would just move in a direction without him thinking explicitly about which trade-offs were being made.

talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-find-coupling1 At 45:50, use failure testing to identify unintended service coupling. talk-what-i-wish-i-had-known-before-scaling-uber#failure-testing-find-coupling1
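A toy sketch of the technique, with made-up service names (`pricing`, `receipts`, `eta`, and a `surge` dependency are all hypothetical): force one dependency to fail, run each service against the failing client, and see which ones break. Any service that was not expected to depend on it but breaks anyway has unintended coupling.

```python
def find_coupled_services(handlers: dict, dependency: str) -> list:
    """Call each handler with `dependency` forced to fail; return who breaks."""

    def failing_client(*args, **kwargs):
        raise ConnectionError(f"{dependency} is down (injected failure)")

    broken = []
    for name, handler in handlers.items():
        try:
            handler(failing_client)
        except ConnectionError:
            broken.append(name)
    return broken


def pricing(surge_client):
    # Hard dependency: no fallback, so an outage propagates.
    return 10 * surge_client()


def receipts(surge_client):
    # Never calls the dependency at all.
    return "receipt"


def eta(surge_client):
    # Soft dependency: degrades to a default when the call fails.
    try:
        return surge_client()
    except ConnectionError:
        return 1.0


coupled = find_coupled_services(
    {"pricing": pricing, "receipts": receipts, "eta": eta}, "surge"
)
```

Here only `pricing` is reported, since `eta` has a fallback and `receipts` never makes the call.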

Referring Pages

microservices context-propagation http-vs-rpc distributed-computing-metrics-and-logging data-architecture-glossary

People

person-matt-ranney