Essential Reading For Engineering Leaders

- Mahdi Yusuf

Database
Scale

tl;dr: Mahdi discusses when to use sharding, how it can be set up, why we shard data stores, and the options you have before sharding.

featured in #401

How Discord Stores Trillions Of Messages

- Bo Ingram

tl;dr: “Our Cassandra cluster exhibited serious performance issues that required increasing amounts of effort to just maintain, not improve.” Bo discusses the troubles with Cassandra and the migration to ScyllaDB, a Cassandra-compatible database written in C++.

featured in #396

From Postgres To Amazon DynamoDB

tl;dr: From the engineering team at Instacart, who have to efficiently store and query hundreds of terabytes of data. Postgres was the primary datastore of choice, but once specific use cases began to outpace the largest Amazon EC2 instance size AWS offers, they chose Amazon DynamoDB. Here they discuss migrating existing tables from Postgres to DynamoDB.

featured in #394

What's Identity-Native Infrastructure Access?

tl;dr: Unlock all Teleport Connect sessions to learn about infrastructure access from DoorDash, Dropbox, Discord, Vonage, and others when you RSVP for the Feb 9th event.

featured in #386

Scaling PostgresML To 1 Million Requests Per Second

- Lev Kokotov

PostgreSQL
Scale

tl;dr: "In this post, we'll discuss how we horizontally scale PostgresML to achieve more than 1 million XGBoost predictions per second on commodity hardware."

featured in #367

Atomic Commitment: The Unscalability Protocol

- Marc Brooker

Database
Scale

tl;dr: Marc describes the classic CS problem of atomic commitment. "The classic solution to this classic problem is Two-phase commit, maybe the most famous of all distributed protocols. There's a lot we could say about atomic commitment, or even just about two-phase commit. In this post, I'm going to focus on just one aspect: Atomic Commitment has weird scaling behavior."

featured in #360

9 Enablement Practices To Achieve DevOps At Enterprise Scale

tl;dr: Christian Oestreich, a senior software engineering leader with experience at multiple Fortune 500 companies, shares how to adopt a well-planned metrics-driven strategy that yields better quality code and lowers support costs.

featured in #353

Supercharging A/B Testing At Uber

tl;dr: "While the statistical underpinnings of A/B testing are a century old, building a correct and reliable A/B testing platform and culture at a large scale is still a massive challenge... Uber went through a similar journey and this blog post describes why and how we rebuilt the A/B testing platform we had at Uber."

featured in #337

What Happens When You Swipe A Credit Card?

- Alex Xu

SystemDesign
Scale

tl;dr: "Visa, Mastercard, and American Express act as card networks for clearing and settling funds. The card acquiring bank and the card issuing bank can be – and often are – different. If banks were to settle transactions one by one without an intermediary, each bank would have to settle the transactions with all the other banks. This is quite inefficient."

featured in #335

Data Teams Are Getting Larger, Faster

tl;dr: "But something happens when a data team grows past 10 people. You no longer know if the data you use is reliable, the lineage is too large to make sense of and end-users start complaining about data issues every other day." Mikkel discusses how to deal with scaling teams.

featured in #334
