Essential Reading For Engineering Leaders

Serving A Billion Web Requests With Boring Code

- Bill Mill

Scale

tl;dr: “I worked on this system for about two and a half years, from the very first commit through two open enrollment periods. The API system served about 5 million requests on a normal weekday, with < 10 millisecond average request latency and a 95th percentile latency of less than 100 milliseconds.”

featured in #585

Database Sharding Explained

- Mahdi Yusuf

Database
Scale

tl;dr: Mahdi discusses why we shard data stores, when to use sharding, how it can be set up, and the various options you have before sharding.

featured in #584

How Discord Reduced Websocket Traffic by 40%

- Austin Whyte

Socket
Scale

tl;dr: “zstandard has gained enough traction to become a viable replacement for zlib. Zstandard offers higher compression ratios and shorter compression times and supports dictionaries: a way to preemptively exchange information about compressed content, further increasing compression ratios and reducing the overall bandwidth usage.”

featured in #552

The Sneaky Costs Of Scaling Serverless

- Zach Leatherman

Scale
Serverless

tl;dr: “I decided to take the plunge and migrate my site elsewhere, mostly to see what it would really cost. I learned a few things along the way (and made a few mistakes) — hopefully writing them up can help you save some money on your hosting bill, too.”

featured in #541

Serving A Billion Web Requests With Boring Code

- Bill Mill

Scale

tl;dr: “I worked on this system for about two and a half years, from the very first commit through two open enrollment periods. The API system served about 5 million requests on a normal weekday, with < 10 millisecond average request latency and a 95th percentile latency of less than 100 milliseconds.”

featured in #528

Personalized Marketing at Scale: Uber’s Out-of-App Recommendation System

Scale
Uber

tl;dr: “Out-of-app (OOA) communication (such as email, push, and SMS) is an important growth lever at Uber. It allows marketers, product owners, and operation teams to connect with users on a plethora of topics, including user promotions, new and favorite restaurants, etc. Building a system to personalize these communications presents unique and exciting challenges. In this blog post, we walk through these challenges and our journey in tackling them.”

featured in #527

How Meta Trains Large Language Models At Scale

Architecture
Scale

tl;dr: “Our AI model training has involved training a massive number of models that required a comparatively smaller number of GPUs. This was the case for our recommendation models that would ingest vast amounts of information to make accurate recommendations that power most of our products. With the advent of generative AI, we’ve seen a shift towards fewer jobs, but incredibly large ones. Supporting GenAI at scale has meant rethinking how our software, hardware, and network infrastructure come together.”

featured in #525

How Zapier Automates Billions Of Tasks

- Neo Kim

Scale
Architecture

tl;dr: Neo takes a look at Zapier's architecture, highlighting its use of Nginx, Python Django, MySQL, Redis, AWS Lambda, RabbitMQ, and Celery for automating billions of tasks. It details Zapier's tech stack, asynchronous processing, scalability strategies, and how they handle task execution and history tracking, using technologies like GraphQL, Next.js, AWS S3, Kafka, and Elasticsearch for efficiency and scalability.

featured in #493

1.5+ Million PDFs In 25 minutes

- Sarat Chandra Karan Sharma

PDF
Scale

tl;dr: “In this blog post, we describe our journey of building an architecture from scratch which now enables us to process, generate, digitally sign, and e-mail out 1.5+ million PDF contract notes in about 25 minutes, incurring only negligible costs. We self-host all elements of this architecture relying on raw EC2 instances for compute and S3 for ephemeral storage. In addition, the concepts used for orchestration of this particular workflow can now be used for orchestrating many different kinds of distributed jobs within our infrastructure.”

featured in #492

Scaling ChatGPT: Five Real-World Engineering Challenges

- Gergely Orosz, Evan Morikawa

Scale
OpenAI

tl;dr: An interview with Evan Morikawa, who led the OpenAI Applied Engineering team as ChatGPT launched and scaled. Evan reveals the five engineering challenges along with lessons learned. Challenges are: (1) KV Cache & GPU RAM. (2) Optimizing batch size. (3) Finding the right metrics to measure. (4) Finding GPUs wherever they are. (5) Inability to autoscale.

featured in #491
