tl;dr:Gergely covers examples of companies that have carried out large-scale migrations, including: (1) Box: a zero-downtime data migration using a 6-step plan. (2) Pinterest: data migration using double writes. (3) LinkedIn: navigating the migration chaos when 100+ engineers were needed to write code and 600+ use cases needed to be moved. And more.
tl;dr:"Understanding the ins and outs of distributed systems is important for both backend engineers and for anyone working with large-scale systems. Large-scale systems can mean systems with high load and high queries per second (QPS), storing a large amount of data, or ones built with low latency and high reliability. These systems are pretty common across both Big Tech and high-growth startups."
tl;dr:"A series in which I interpret interesting software engineering or engineering management case studies from tech companies. You might learn something new in these articles, as we dive into the concepts they contain." Includes: (1) Resilient payments systems learnings from Shopify. (2) Designing a solution to store and access millions of records by Grab. (3) The challenges of the analytics infrastructure platform team at Yelp. And more.
tl;dr:Gergely dives into: (1) Oncall philosophies across the industry. (2) Companies which pay and those that don’t. (3) How much do companies pay. (4) Companies which don’t pay. (5) Poor oncall cultures.
tl;dr:"More than 100 people would need to be hired across engineering, product and design, to staff these teams. The new teams were stack ranked by importance e.g. teams responsible for growing the supply of drivers were ranked much higher than those generating rider demand." Gergely discusses Uber's biggest engineering organizational change: creating cross-functional program teams and introducing platform teams.
tl;dr:"This article collects some openly available RFC templates and examples, and a list of companies that use such a process. I’d encourage [you] to use these examples for inspiration. Take parts that resonate with you, experiment with them and modify them to your needs."
tl;dr:"In this issue we cover: (1) The extremes of shipping to production. (2) Typical processes at different types of companies. (3) Principles and tools for shipping to production responsibly. (4) Additional verification layers and advanced tools. (5) Taking pragmatic risks to move faster. (6) Deciding which approach to take. (7) Other things to incorporate into the deployment process."
tl;dr:Gergely covers a timeline of events, cause of the outage, what customers are saying, the impact of the outage on Atlassian’s business, learnings from this outage, and more.
tl;dr:"I am covering details from the vantage point of software engineers and engineering managers." Gergely covers how Fast was able to hire engineers while competing with the Big Tech companies, warning signs within the company as seen from an engineering perspective, the current situation within the company, and more.
tl;dr:"If you do some groundwork before starting the migration, you’ll reduce risk, gain confidence and understand the scope of the migration better." Gergely breaks the migration process into the following steps: (1) Preparation for migrations. (2) Pre-migration steps, such as monitoring and validation. (3) The migration itself, covering downtime, strategies & toolset. (4) After the migration. (5) The migration long-tail.