Essential Reading For Engineering Leaders

Real-World Engineering Challenges #6: Migrations

#Management
#Migration

tl;dr: Gergely covers examples of companies that have carried out large-scale migrations, including: (1) Box: a zero-downtime data migration using a 6-step plan. (2) Pinterest: a data migration using double writes. (3) LinkedIn: navigating the migration chaos when 100+ engineers were needed to write code and 600+ use cases needed to be moved. And more.

featured in #359

Resiliency In Distributed Systems

#DistributedSystem

tl;dr: "Understanding the ins and outs of distributed systems is important for both backend engineers and for anyone working with large-scale systems. Large-scale systems can mean systems with high load and high queries per second (QPS), storing a large amount of data, or ones built with low latency and high reliability. These systems are pretty common across both Big Tech and high-growth startups."

featured in #355

Real-World Engineering Challenges #5

#Leadership
#Management

tl;dr: "A series in which I interpret interesting software engineering or engineering management case studies from tech companies. You might learn something new in these articles, as we dive into the concepts they contain." Includes: (1) Learnings on resilient payment systems from Shopify. (2) How Grab designed a solution to store and access millions of records. (3) The challenges faced by Yelp's analytics infrastructure platform team. And more.

featured in #348

Oncall Compensation

#Leadership
#Management

tl;dr: Gergely dives into: (1) Oncall philosophies across the industry. (2) Companies that pay for oncall and those that don't. (3) How much companies pay. (4) Companies that don't pay. (5) Poor oncall cultures.

featured in #340

The Platform And Program Split At Uber: A Milestone Special

tl;dr: "More than 100 people would need to be hired across engineering, product and design, to staff these teams. The new teams were stack ranked by importance e.g. teams responsible for growing the supply of drivers were ranked much higher than those generating rider demand." Gergely discusses Uber's biggest engineering organizational change: creating cross-functional program teams and introducing platform teams.

featured in #330

Software Engineering RFC And Design Doc Examples And Templates

#Management
#Leadership

tl;dr: "This article collects some openly available RFC templates and examples, and a list of companies that use such a process. I'd encourage [you] to use these examples for inspiration. Take parts that resonate with you, experiment with them and modify them to your needs."

featured in #328

Shipping To Production

#Leadership
#Management

tl;dr: "In this issue we cover: (1) The extremes of shipping to production. (2) Typical processes at different types of companies. (3) Principles and tools for shipping to production responsibly. (4) Additional verification layers and advanced tools. (5) Taking pragmatic risks to move faster. (6) Deciding which approach to take. (7) Other things to incorporate into the deployment process."

featured in #320

The Scoop: Inside the Longest Atlassian Outage of All Time

tl;dr: Gergely covers a timeline of events, the cause of the outage, what customers are saying, the impact of the outage on Atlassian's business, learnings from the outage, and more.

featured in #308

The Scoop: Inside Fast’s Rapid Collapse

tl;dr: "I am covering details from the vantage point of software engineers and engineering managers." Gergely covers how Fast was able to hire engineers while competing with Big Tech companies, warning signs within the company as seen from an engineering perspective, the current situation within the company, and more.

featured in #307

Migrations Done Well

tl;dr: "If you do some groundwork before starting the migration, you’ll reduce risk, gain confidence and understand the scope of the migration better." Gergely breaks the migration process into the following steps: (1) Preparation for migrations. (2) Pre-migration steps, such as monitoring and validation. (3) The migration itself, covering downtime, strategies & toolset. (4) After the migration. (5) The migration long-tail.

featured in #302

/Gergely Orosz