The Canva Outage: Another Tale Of Saturation And Resilience
- Lorin Hochstein tl;dr: “Today’s public incident writeup comes courtesy of Brendan Humphries, the CTO of Canva. Like so many other incidents that came before, this is another tale of saturation, where the failure mode involves overload. There’s a lot of great detail in Humpries’s write-up, and I recommend you read it directly in addition to this post.”featured in #581
Bottleneck: Resilience And Observability
- Punit Lad Carl Nygard tl;dr: The authors delve into the intricacies of resilience and observability in the context of rapidly scaling systems. As systems expand, their complexity can lead to potential failures. Resilience isn't about averting these failures but adeptly managing them. Observability is pivotal for comprehending system behavior, with its three foundational pillars: Metrics, Logs, and Traces. The authors also highlight challenges posed by the vast data volume in observability and the role of automation.featured in #442