He talks about sending a browser a large number of random flexbox cases, recording where the elements ended up, and checking that his implementation put them in the same places.
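A minimal sketch of that idea in Python, under heavy assumptions: `reference_layout` stands in for the browser (a real setup would render each case and read back the element positions), `my_layout` is a hypothetical implementation under test, and the layout model is reduced to a single row with only a basis and a grow factor per item. None of these names come from the talk.

```python
import random

def reference_layout(container_width, items):
    """Stand-in for the browser: returns an (x, width) pair per item.

    Each item is a (basis, grow) pair; free space is shared by grow factor.
    """
    total_basis = sum(basis for basis, _ in items)
    total_grow = sum(grow for _, grow in items)
    free = max(container_width - total_basis, 0)
    positions, x = [], 0.0
    for basis, grow in items:
        width = basis + (free * grow / total_grow if total_grow else 0)
        positions.append((x, width))
        x += width
    return positions

def my_layout(container_width, items):
    """Hypothetical implementation under test, written independently."""
    free = max(container_width - sum(b for b, _ in items), 0)
    grow_sum = sum(g for _, g in items)
    share = free / grow_sum if grow_sum else 0.0
    widths = [b + g * share for b, g in items]
    xs = [sum(widths[:i]) for i in range(len(widths))]
    return list(zip(xs, widths))

def random_case(rng):
    """Generate one random container width and a list of (basis, grow) items."""
    container_width = rng.randint(100, 1000)
    count = rng.randint(1, 6)
    items = [(rng.randint(0, 200), rng.randint(0, 3)) for _ in range(count)]
    return container_width, items

def find_mismatches(runs=1000, seed=0, tolerance=1e-6):
    """Run both layouts on random cases and collect every disagreement."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(runs):
        case = random_case(rng)
        expected = reference_layout(*case)
        actual = my_layout(*case)
        same = len(expected) == len(actual) and all(
            abs(ex - ax) <= tolerance and abs(ew - aw) <= tolerance
            for (ex, ew), (ax, aw) in zip(expected, actual)
        )
        if not same:
            mismatches.append((case, expected, actual))
    return mismatches
```

Seeding the random generator keeps any failing case reproducible, and returning the mismatched cases rather than just a pass/fail flag makes it easier to see which input exposed the difference.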
At 29:57 he talks about how they upgrade Rails using side-by-side testing: https://www.youtube.com/watch?v=naTRzjHaIhE&t=29m57s
He talks about saving the HTTP responses returned by the app on the current version of Rails and diffing them against the responses returned after upgrading Rails. It's a black box test: the externally facing behavior of the current version of the app becomes the gold master, and the new version of the app must have the same external behavior.
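A rough sketch of the gold master approach in Python, not the tooling from the talk. The function names, the JSON file format, and the `fetch` callable (anything that takes a request path and returns the response body as a string) are all assumptions made for illustration; in practice `fetch` would issue a real HTTP request against the running app.

```python
import difflib
import json
from pathlib import Path

def record_gold_master(fetch, paths, master_file):
    """Save the current app's response body for each path as the gold master."""
    responses = {path: fetch(path) for path in paths}
    Path(master_file).write_text(json.dumps(responses, indent=2, sort_keys=True))

def diff_against_gold_master(fetch, master_file):
    """Re-request every recorded path and return a unified diff per changed path."""
    master = json.loads(Path(master_file).read_text())
    diffs = {}
    for path, expected in master.items():
        actual = fetch(path)
        if actual != expected:
            diffs[path] = "\n".join(difflib.unified_diff(
                expected.splitlines(),
                actual.splitlines(),
                fromfile="current",
                tofile="upgraded",
                lineterm="",
            ))
    return diffs
```

The idea is to run `record_gold_master` once against the app on the current version, then run `diff_against_gold_master` against the upgraded app; an empty result means the recorded external behavior is unchanged. Responses that legitimately vary between requests (timestamps, CSRF tokens, session IDs) would need to be normalized before comparing.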
"This kind of aggressive technical debt cleanup and optimization work can only happen as by-product of our engineering ecosystem. Without being able to deploy the main application more than 60 times in a day, and without the tooling to automatically test and benchmark two wildly different implementations in production, this process would have taken months of work and there would be no realistic way to ensure that there were no performance or behavior regressions."
Related Articles:
talk-easy-rewrites-with-ruby-and-science
talk-fearlessly-refactoring-legacy-ruby