Hundreds of services in production, and the number changes all the time
General point: everything is a tradeoff. Try to make tradeoffs intentionally
Upsides
Allows teams to release independently
Teams own their own uptime
Lets you pick the best tool for the job
Downsides
Managing a distributed system
Everything is an RPC
If using HTTP, you need standards for how it’s used, or else there is a lot of incidental complexity in understanding how another API works. Servers are not browsers: just set it up so that a request is a function call
Calls are slow
Static types would be nice
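A minimal sketch of the "just call a function" idea with static types, assuming a JSON-over-some-transport setup. Every name here (UserClient, GetUserRequest, GetUserResponse, fake_transport) is hypothetical and only for illustration; the talk did not prescribe a specific RPC framework.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

# Hypothetical transport: takes (method_name, json_body), returns a JSON reply.
# In a real system this would be the one place that knows about the wire format.
Transport = Callable[[str, str], str]


@dataclass(frozen=True)
class GetUserRequest:
    user_id: int


@dataclass(frozen=True)
class GetUserResponse:
    user_id: int
    name: str


class UserClient:
    """Typed stub: callers see a function, not URLs, verbs, or status codes."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def get_user(self, request: GetUserRequest) -> GetUserResponse:
        reply = self._transport("get_user", json.dumps(asdict(request)))
        return GetUserResponse(**json.loads(reply))


def fake_transport(method: str, body: str) -> str:
    """In-process stand-in for the network, for illustration only."""
    args = json.loads(body)
    if method != "get_user":
        raise ValueError(f"unknown method: {method}")
    return json.dumps({"user_id": args["user_id"], "name": "example"})


client = UserClient(fake_transport)
user = client.get_user(GetUserRequest(user_id=7))
```

The caller never touches paths, headers, or status codes, and a type checker can flag a wrong field name before the call ever goes over the network.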
What if it breaks?
How do you make sure the right people are paged for the root of the failure?
If you own your uptime, you can block other teams if they need a fix in your service
Can they release your service?
Temptation to build around problems instead
Specifically, political/organizational problems
You get to keep your biases instead of learning
Using separate languages means
Hard to share code
Hard to contribute fixes to other codebases
Hard to move between teams
Fragments the org culture into tribes
More difficult to understand a service in the larger context; you can’t see everything in one place
Teams set up separate dashboards for status
Tracking performance
Each programming language has its own tools
Work needed to get them into a consistent format (is that OSS now?)
Fanout: the latency of the slowest step sets the speed of your whole system (I guess this refers to services calling many other services)
A service can be fast, but if another service needs to make many individual calls to it, the overall speed is slow
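A back-of-the-envelope sketch of the two fanout effects above. The numbers are made up for illustration, and the tail calculation assumes slow calls are independent of each other, which real systems may not satisfy.

```python
def parallel_latency_ms(latencies_ms):
    """Calls issued in parallel finish only when the slowest one does."""
    return max(latencies_ms)


def sequential_latency_ms(per_call_ms, calls):
    """Many individual calls made one after another add up."""
    return per_call_ms * calls


def chance_of_slow_request(p_slow_call, fanout):
    """Probability that at least one of `fanout` independent calls is slow."""
    return 1 - (1 - p_slow_call) ** fanout


# Three parallel calls: the 120 ms one sets the pace.
parallel = parallel_latency_ms([5, 8, 120])      # 120

# A "fast" 2 ms service called 500 times in sequence.
sequential = sequential_latency_ms(2, 500)       # 1000

# Each call is slow only 1% of the time, but the request fans out to 100 calls.
tail = chance_of_slow_request(0.01, 100)         # roughly 0.634
```

So a service that is slow on 1 in 100 calls can make well over half of the requests that fan out to it 100 times slow.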
Have to be able to trace through all services
There are tools
Cross-language context propagation. At least pass a common ID and log it. Ideally other fields are passed along transparently
Tracing overhead slows things down, so sample only a portion of the requests
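A minimal sketch of passing a common ID plus a sampling decision between services. The header names, the 1% rate, and the function names are assumptions for illustration, not an existing tracing library's API. Making the sampling decision once at the edge is one common design choice, so a request is either traced everywhere or nowhere.

```python
import random
import uuid

TRACE_HEADER = "x-trace-id"          # hypothetical header name
SAMPLED_HEADER = "x-trace-sampled"   # hypothetical header name
SAMPLE_RATE = 0.01                   # assumed: trace about 1% of requests


def start_context(rng=random.random):
    """At the edge: mint an ID and make the sampling decision once."""
    return {
        TRACE_HEADER: uuid.uuid4().hex,
        SAMPLED_HEADER: "1" if rng() < SAMPLE_RATE else "0",
    }


def outgoing_headers(incoming):
    """In every service: copy the context onto each downstream call unchanged."""
    propagated = (TRACE_HEADER, SAMPLED_HEADER)
    return {k: v for k, v in incoming.items() if k in propagated}


def log_line(incoming, message):
    """Always log the ID, whether or not the request is sampled."""
    return f"trace_id={incoming.get(TRACE_HEADER, '-')} {message}"


def should_record_span(incoming):
    """Only pay the tracing overhead for sampled requests."""
    return incoming.get(SAMPLED_HEADER) == "1"
```

Because the context is just a couple of string headers, any language can forward it without understanding the rest of the request.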
Need consistent logging
Need to give teams tools to log consistently
Logging floods can amplify problems
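One way a shared logging tool could cover both points: fixed fields in every line, and a cap on lines per time window so a failure loop cannot flood the pipeline. This is a sketch under assumed names and limits (BoundedLogger, 100 lines per second), not a description of any specific tool from the talk.

```python
import json
import time


class BoundedLogger:
    """Emits JSON lines with fixed fields; caps lines per window to damp floods."""

    def __init__(self, service, max_per_window=100, window_s=1.0,
                 clock=time.monotonic, sink=print):
        self._service = service
        self._max = max_per_window
        self._window_s = window_s
        self._clock = clock
        self._sink = sink
        self._window_start = clock()
        self._count = 0
        self.dropped = 0  # how many lines were suppressed so far

    def log(self, level, message, trace_id="-"):
        """Return True if the line was emitted, False if it was dropped."""
        now = self._clock()
        if now - self._window_start >= self._window_s:
            # New window: reset the budget.
            self._window_start = now
            self._count = 0
        if self._count >= self._max:
            self.dropped += 1
            return False
        self._count += 1
        self._sink(json.dumps(
            {"service": self._service, "level": level,
             "trace_id": trace_id, "msg": message},
            sort_keys=True,
        ))
        return True
```

The dropped counter can itself be exported as a metric, so a flood shows up as one number instead of millions of lines.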
Load testing
Have to test against production, without breaking metrics, all the time
Requests have to carry context that tells services it’s a test request
Keep systems near their peaks all the time
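A sketch of how a test-request flag might keep load tests out of production metrics. The header name, the metric prefixes, and the handle_charge handler are all hypothetical; the point is only that every service checks the same propagated flag.

```python
from collections import Counter

TEST_HEADER = "x-test-traffic"  # hypothetical header name

# Stand-in for a real metrics client, for illustration only.
metrics = Counter()


def is_test_request(headers):
    return headers.get(TEST_HEADER) == "1"


def record(metric, headers):
    """Keep test traffic in its own namespace so production metrics stay clean."""
    prefix = "test." if is_test_request(headers) else "prod."
    metrics[prefix + metric] += 1


def handle_charge(headers, amount_cents):
    """Hypothetical handler: same code path, but no real side effect for tests."""
    record("charge.requests", headers)
    if is_test_request(headers):
        return "simulated"
    return "charged"
```

The flag only works if it is forwarded on every downstream call, the same way a trace ID would be.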
Have to do failure testing (Chaos Monkey, etc.)
Migrations: old stuff still has to work
Things can never be fully immutable: at some point you’ll have to make a cross-cutting change. A security patch if nothing else
Mandates are bad. Use carrots, not sticks. Unless it’s for security or compliance
Services allow people to put their own or their team’s interests above the company’s
Considerations
Have to decide how many repos: one vs many
Performance doesn’t matter until it does
The build/buy tradeoff is hard: you built it but now it’s a commodity