Navigating The Scale: How Design Patterns Power LinkedIn’s Infrastructure
- Saira Khanum tl;dr: “We’ve found the Producer-Consumer pattern to be exceptionally effective in reaching these goals. This pattern has been successfully implemented in several of our core infrastructure systems, including the distributed server query system, server console monitoring, and network security monitoring. In this process, we have identified and built general solutions that are repeatable in similar environments, greatly improving engineering efficiency by leveraging proven methodologies.” Featured in #567.
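The Producer-Consumer pattern decouples the components that generate work from those that process it by placing a shared buffer between them. Below is a minimal single-process sketch in Python using the standard library's bounded `queue.Queue`. The function names, the squaring "work", and the sentinel shutdown scheme are illustrative assumptions, not LinkedIn's implementation.

```python
import queue
import threading

SENTINEL = None  # tells a consumer that no more work is coming


def producer(q, items):
    for item in items:
        q.put(item)  # blocks while the bounded queue is full (backpressure)


def consumer(q, results, lock):
    while True:
        item = q.get()
        if item is SENTINEL:
            q.task_done()
            break
        with lock:
            results.append(item * item)  # stand-in for real processing
        q.task_done()


def run(items, num_consumers=3, maxsize=10):
    q = queue.Queue(maxsize=maxsize)
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consumer, args=(q, results, lock))
        for _ in range(num_consumers)
    ]
    for w in workers:
        w.start()
    producer(q, items)
    for _ in workers:
        q.put(SENTINEL)  # one sentinel per consumer so every thread exits
    for w in workers:
        w.join()
    return sorted(results)  # consumers finish in arbitrary order


if __name__ == "__main__":
    print(run(range(5)))  # [0, 1, 4, 9, 16]
```

The bounded queue is the key design choice: a fast producer is throttled automatically instead of exhausting memory when consumers fall behind.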
Faster Continuous Integration Builds At Canva
tl;dr: In April 2022, the average time for a PR to pass continuous integration and merge into our main branch was around 80 minutes. We’re now getting build times below 30 minutes, and as low as 15 minutes. This post shares what we’ve done to improve CI build times in our main code repository, including: (1) finding the best opportunities, (2) experimentation, (3) delivering fast and incrementally, and (4) the importance of everyone’s contributions. Featured in #537.
Google Zanzibar For The Rest Of Us
- Greg Sarjeant tl;dr: Google Zanzibar powers authorization for hundreds of Google’s apps, so you might think it’s a great model for your authorization service. But do Zanzibar’s promises of scale, high availability, and strong consistency mean that it’s the right solution for the rest of us? Zanzibar’s defining characteristic is actually centralization, a massive tradeoff that’s impractical for most organizations. The Googles of the world can pull it off, but is there a Zanzibar for the rest of us? Featured in #497.
Switching Build Systems, Seamlessly
- Patrick Balestra tl;dr: Patrick chronicles Spotify's shift to Bazel. The move was driven by the need for a scalable build system for their growing codebase. The transition, which began in earnest in 2020, involved running two build systems side by side, adapting existing tools, and extensive testing. By 2023, the iOS Spotify app was fully built with Bazel, resulting in significant improvements in build times and developer experience. Featured in #461.
How GitHub Indexes Code For Blazing Fast Search & Retrieval
- Shivang Sarawagi tl;dr: “The search engine supports global queries across 200 million repos and indexes code changes in repositories within minutes. The code search index is by far the largest cluster that GitHub runs, comprising 5184 vCPUs, 40TB of RAM, and 1.25PB of backing storage, supporting a query load of 200 requests per second on average and indexing over 53 billion source files.” Featured in #458.
Executing Cron Scripts Reliably At Scale
- Claire Adams tl;dr: Claire discusses the challenges of executing cron scripts reliably within large-scale infrastructure. “The Job Queue is an asynchronous compute platform that runs about 9 billion ‘jobs’ or pieces of work per day.” Claire covers techniques such as distributed execution, retries, and monitoring that keep cron jobs dependable at scale, and highlights the need for a systematic approach to handling failures. Featured in #457.
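Retries are one of the techniques mentioned above. Here is a minimal sketch of a generic retry loop with exponential backoff in Python. The `run_with_retries` helper, its parameters, and the `make_flaky` test job are hypothetical illustrations, not the Job Queue's actual API. The `sleep` argument is injectable so the backoff schedule can be checked without waiting.

```python
import time


def run_with_retries(job, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run a zero-argument callable, retrying on any exception.

    Waits base_delay * 2**(attempt - 1) seconds between attempts and
    re-raises the last exception once max_attempts is exhausted.
    (Hypothetical helper for illustration.)
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


def make_flaky(failures):
    """Hypothetical job that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def job():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient failure")
        return "done"

    return job, state


if __name__ == "__main__":
    job, state = make_flaky(2)
    print(run_with_retries(job, sleep=lambda s: None), state["calls"])  # done 3
```

Backing off exponentially keeps a failing dependency from being hammered by immediate retries. A production scheduler would typically also add jitter and cap the total number of attempts per job.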