/Architecture

Scaling To Count Billions

tl;dr: Canva pays creators based on billions of content usages each month. This usage data includes not only templates but also images, videos, and other content. Building and maintaining a service that tracks this data for payment is challenging: it must be accurate, scalable, and operable. This post introduces the various architectures the team experimented with and the lessons learned along the way.

featured in #508


Overcoming Event-Driven Architecture Complexity With An Event Gateway

- James Higginbotham tl;dr: Event-driven architecture (EDA) offers flexibility and scalability, but complexities arise as your architecture grows. Message receivers struggle with filtering, third-party orchestration consumes developer time, and webhook integration becomes challenging. James explores how event gateways can address the common scenarios that increase EDA complexity.

featured in #506


How Disney+ Scaled To 11 Million Users On Launch Day

- Neo Kim tl;dr: Disney+ scaled to 11M users on launch day by running infrastructure in multiple regions for high availability and low latency, using a CDN for caching, Kinesis for data streaming, DynamoDB for storing video timestamps and watchlists, a document store for the movie catalog, and machine learning for recommendations. They pre-partitioned and autoscaled DynamoDB to handle growing traffic. Neo discusses the architecture.

featured in #506


How LedgerStore Supports Trillions Of Indexes At Uber

- Kaushik Devarajaiah tl;dr: “LedgerStore is an immutable storage solution at Uber that provides verifiable data completeness and correctness guarantees to ensure data integrity for these transactions... This blog covers the significance of LedgerStore indexing and its architecture, which powers trillions of indexes, with a petabyte-scale index storage footprint.”

featured in #503


Reduce, Reuse, Recycle: McDonald’s Reusable Workflows

tl;dr: McDonald's engineering teams have created a fast, reliable CI process using reusable workflows and GitHub Actions. Key steps: (1) Grouping CI workflows by language and centralizing them in reusable workflows to reduce duplication and enforce standards. (2) Creating a "golden path" with required CI stages such as code quality, security, and packaging. (3) Giving developers the flexibility to add custom stages without impacting others. (4) Using CI visibility tools to monitor workflow metrics such as pipeline count, lead times, and success/failure rates.

featured in #501


Architecture Of An Early Stage SAAS

- Giuseppe La Torre tl;dr: In this article I describe a simple architecture for an early stage SAAS. As a solo founder, I report some of the choices I made to launch Feelback, a small-scale SAAS for collecting user signals about any content. Some questions you will find answers to: How to design a low-maintenance architecture? Which hosting and providers to choose, and what configurations to use? How to deploy to production with ease? How to manage a monorepo with all service systems and components?

featured in #499


How Figma’s Databases Team Lived To Tell The Scale

- Sammy Steele tl;dr: “The data revealed that some of our tables, containing several terabytes and billions of rows, were becoming too large for a single database. At this size, we began to see reliability impact during Postgres vacuums, which are essential background operations that keep Postgres from running out of transaction IDs and breaking down. Our highest write tables were growing so quickly that we would soon exceed the maximum IO operations per second supported by Amazon’s Relational Database Service. Vertical partitioning couldn’t save us here because the smallest unit of partitioning is a single table. To keep our databases from toppling, we needed a bigger lever.”

featured in #498


Behind The Draw - How Canva's Drawing Tool Works

- Alex Gemberg tl;dr: An exploration of the evolution of Canva's drawing tool, highlighting the technical challenges of improving application performance and user satisfaction. Alex discusses efforts in optimizing SVG paths, implementing state machines, and introducing native implementations for mobile platforms.

featured in #497


How LinkedIn Serves 5 Million User Profiles Per Second

tl;dr: The author covers: (1) Why LinkedIn switched to Espresso, a document-oriented database. (2) Scalability issues with Espresso. (3) Introduction to Couchbase. (4) How LinkedIn incorporated Couchbase as a caching layer. (5) Caching layer design principles. 

featured in #494


How Zapier Automates Billions Of Tasks

- Neo Kim tl;dr: Neo takes a look at Zapier's architecture for automating billions of tasks, highlighting its use of Nginx, Python Django, MySQL, Redis, AWS Lambda, RabbitMQ, and Celery. It details Zapier's tech stack, asynchronous processing, scalability strategies, and how they handle task execution and history tracking, along with supporting technologies like GraphQL, Next.js, AWS S3, Kafka, and Elasticsearch.

featured in #493