Essential Reading For Engineering Leaders

Scaling Up The Prime Video Audio / Video Monitoring Service And Reducing Costs By 90%

- Marcin Kolny

Architecture
Scale

tl;dr: “To ensure that customers seamlessly receive content, Prime Video set up a tool to monitor every stream viewed by customers. This tool allows us to automatically identify perceptual quality issues and trigger a process to fix them.” Marcin discusses the service’s architecture.

featured in #412

How LinkedIn Adopted A GraphQL Architecture For Product Development

- Arun Sethuramalingam

Architecture
Scale

tl;dr: “In this blog post, we will cover how the GraphQL layer is architected for use by our internal engineers to build member and customer facing applications. Specifically, we will dive into some of the architectural choices that are unique to LinkedIn and why we chose each one of them.”

featured in #411

How To Survive Your Project's First 100,000 Lines

- Evan Ovadia

tl;dr: The Vale compiler hit its 100,000th line of code, and this article explains how it was kept from collapsing. “Some of these software engineering techniques came from my time at Google, though ironically most came from my work on the Vale compiler and game development so some of these might be surprising to my engineer comrades out there.” Techniques range from determinism, to testing, to type-system techniques, to general architectural best practices.

featured in #411

How eBay Modernized The Most Important Page On Our Platform

Scale
Architecture

tl;dr: “eBay’s View Item page lives at the center of our e-commerce platform. Our customers load this page over 250 million times each day, and stringent budgets on site speed and availability guarantee the quality of their experience. And yet, this page had its last intentional rewrite ten years ago.”

featured in #410

Tracing Notifications

- Suman Karumuri, George Luong

Scale
Architecture

tl;dr: The engineering team at Slack embarked on a project to make notification issues easier to debug. “Debugging notification issues within our systems was difficult because each system had a different logging pipeline and data format, making it necessary to look at data with different formats and backends. This process required deep technical expertise and took several days to complete.”

featured in #407

Real-time Messaging

- Sameera Thangudu

Scale
Architecture

tl;dr: From the engineering team at Slack, “we’ll describe the architecture that we use to send real-time messages at scale. We’ll take a closer look at the services that send the chat messages and various events to these online users in real time.”

featured in #406

How Lyft Uses Load Testing To Ensure Reliable Service During Peak Events

- Remco Van Bree

Scale
Testing

tl;dr: “We have come to realize that load testing in production is a powerful tool to prepare systems for unexpected bursty traffic and peak events. We’ll explore why Lyft needed a custom performance testing framework that worked in production, how we built a cross-functional solution, and how we’ve continued to improve this testing platform.”

featured in #404

Twitter's Recommendation Algorithm

Scale
ML
Algo

tl;dr: Twitter’s recommendation algorithm distills the roughly 500 million tweets posted daily down to a handful of top tweets that show up on your device, specifically for you. This blog is an introduction to how the algorithm works.

featured in #403

Pull The Andon Cord

- Taylor Pearson

tl;dr: The Andon Cord was a rope that hung in Toyota factories and could instantly stop all work on the assembly line. Workers were encouraged to pull it when they saw an issue. Once it was pulled, a manager came down to look at the issue, but the worker who pulled the rope was the one who came up with the solution. This process had 2 benefits: (1) It made workers feel trusted and part of the company’s output. (2) It dramatically increased quality, as workers had a lot of tacit knowledge that managers didn’t.

featured in #401

Automating Safe, Hands-Off Deployments

- Clare Liguori

tl;dr: “In this article, we walk through the steps a code change goes through in a pipeline at Amazon on its way to production. A typical continuous delivery pipeline has four major phases - source, build, test, and production. We’ll dive into the details of what happens in each of these pipeline phases for a typical AWS service, and provide you with an example of how a typical AWS service team might set up one of their pipelines.”

featured in #401
