/Scale

Storage Challenges In The Evolution Of Database Architecture

- Sujay Venaik tl;dr: “Sync service has been running since 2014, and we started facing issues related to physical storage on the database layer. For context, sync service runs on an AWS RDS Aurora cluster that has a single primary writer node and 3-4 readers, all of which are r6g.8xlarge. AWS RDS has a physical storage size limit of 128TiB for each RDS cluster… We were hovering around ~95TB, and our rate of ingestion was ~2TB per month. At this rate, we realized we would see ingestion issues in another 6-8 months.” The team devised a three-pronged strategy: eliminating unused tables, revising their append-only tables approach, and methodically freeing up space from sizable tables. This strategy successfully reclaimed about 60TB of space.

featured in #452


How Instagram Scaled To 14 million Users With Only 3 Engineers

- Leonardo Creed tl;dr: Instagram scaled from 0 to 14 million users in just over a year (October 2010 to December 2011) with three engineers. The success is attributed to three guiding principles: simplicity, not reinventing the wheel, and using proven technologies. The article provides a detailed walkthrough of the tech stack. Instagram ran on AWS, using EC2 with Ubuntu Linux, and the frontend was written in Objective-C. The team used Amazon’s Elastic Load Balancer, Django for the backend, PostgreSQL for data storage, Amazon S3 for photo storage, and Redis and Memcached for caching.

featured in #449


Building A ShopifyQL Code Editor

- Trevor Harmon tl;dr: “This approach enabled us to provide ShopifyQL features to CodeMirror while continuing to maintain a grammar that serves both client and server. The custom adapter we created allows us to pass a ShopifyQL query to the language server, adapt the response, and return a Lezer parse tree to CodeMirror, making it possible to provide features like syntax highlighting, code completion, linting, and tooltips. Because our solution utilizes CodeMirror’s internal parse tree, we are able to make better decisions in the code and craft a stronger editing experience. The ShopifyQL code editor helps merchants write ShopifyQL and get access to their data in new and delightful ways.”

featured in #448


Keeping Figma Fast

- Slava Kim, Laurel Woods tl;dr: How Figma evolved its performance testing system as the company scaled. Initially, Figma used a single MacBook for all its in-house performance testing. As the codebase grew more complex and the team expanded, this approach became unsustainable. The article outlines the challenges Figma faced, such as the need for more granular performance tests and the limitations of running tests on a single piece of hardware. To address these issues, Figma adopted a two-system approach: a cloud-based system for mass testing and a hardware system for more targeted, precise tests. Both systems are connected to the same Continuous Integration system and aim to catch performance regressions early in the development cycle.
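As a rough illustration of what "catching a performance regression" in CI can mean, here is a minimal sketch that compares the median of a candidate run against a baseline. The function name, the use of the median, and the 5% tolerance are assumptions made for this example; the summary does not say how Figma's systems decide that a change is a regression.

```python
from statistics import median
from typing import Sequence


def is_regression(baseline_ms: Sequence[float],
                  candidate_ms: Sequence[float],
                  tolerance: float = 0.05) -> bool:
    """Return True when the candidate median is slower than the
    baseline median by more than `tolerance` (a fraction, e.g. 0.05).

    Hypothetical helper: the threshold and statistic are illustrative.
    """
    if not baseline_ms or not candidate_ms:
        raise ValueError("both runs need at least one sample")
    base = median(baseline_ms)
    cand = median(candidate_ms)
    # The median is less sensitive to a single noisy sample than the mean.
    return cand > base * (1.0 + tolerance)
```

A CI job could run the same benchmark on the main branch and on the pull request, then fail the build when `is_regression` returns `True`. Noisy shared cloud machines would typically need a looser tolerance than dedicated hardware.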

featured in #444


8 Reasons Why WhatsApp Was Able To Support 50 Billion Messages A Day With Only 32 Engineers

tl;dr: (1) Single responsibility principle. (2) Tech stack. Erlang provides scale with a tiny footprint. (3) Leveraged robust open source and third party libraries. (4) Heavy emphasis on cross-cutting concerns to improve quality. (5) Diagonal scaling to keep the costs and operational complexity low. (6) Critical aspects were measured so bottlenecks were identified and eliminated quickly. (7) Load testing was performed to identify single points of failure. (8) Communication paths between engineers were kept short.

featured in #443


Bottleneck: Resilience And Observability

- Punit Lad, Carl Nygard tl;dr: The authors delve into the intricacies of resilience and observability in the context of rapidly scaling systems. As systems expand, their complexity can lead to failures. Resilience isn't about averting these failures but about managing them well. Observability is pivotal for understanding system behavior and rests on three foundational pillars: Metrics, Logs, and Traces. The authors also highlight the challenges posed by the vast data volume in observability and the role of automation.

featured in #442


The Perils Of Migrating A Large-Scale Service At Uber

tl;dr: Details of Uber's journey in migrating its invoice generation service, highlighting challenges and lessons learned. The initial service was written in Python and faced scalability issues due to early design choices, accumulated technical debt and a legacy software stack. The new service was developed in Go, chosen for its speed and flexibility. The migration strategy adopted was component-based, focusing on individual system components rather than entire flows. The migration led to a 97% reduction in computing requirements and enhanced self-serve capabilities, reducing engineers' support work from 60% to under 20%.

featured in #442


Optimizing Speed On eBay.com

- Addy Osmani tl;dr: Optimizations include: (1) Search results optimization: By sending the first 10 item images along with the header, eBay lets the browser start downloading search result images sooner. (2) Edge caching for autosuggestion data: Suggestions in the search box are cached and served from a CDN, reducing network latency and server processing time. (3) Edge caching for unrecognized homepage users: Content for unrecognized users is cached on eBay's edge network, so first-time users receive content from a nearby server, again reducing network latency and server processing time.
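The edge caching of autosuggestion data in item (2) can be pictured with a small time-to-live cache. This is only a sketch of the general technique, not eBay's setup: `EdgeSuggestionCache`, `fake_origin`, the catalog, and the TTL value are all hypothetical names and data invented for the example.

```python
import time
from typing import Callable, Dict, List, Tuple


class EdgeSuggestionCache:
    """Toy stand-in for a CDN edge node caching autosuggest responses."""

    def __init__(self, origin: Callable[[str], List[str]],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._origin = origin          # slow call to the origin server
        self._ttl = ttl_seconds        # how long a cached answer stays fresh
        self._clock = clock            # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.origin_calls = 0          # counts cache misses

    def get(self, prefix: str) -> List[str]:
        # Normalize the key so "IPH" and "iph " share one cache entry.
        key = prefix.strip().lower()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]           # hit: no origin round trip
        # Miss or expired: fetch from the origin and store with an expiry.
        self.origin_calls += 1
        suggestions = self._origin(key)
        self._entries[key] = (now + self._ttl, suggestions)
        return suggestions


def fake_origin(prefix: str) -> List[str]:
    """Hypothetical origin lookup over a tiny made-up catalog."""
    catalog = ["iphone 15", "iphone case", "ipad", "lego"]
    return [item for item in catalog if item.startswith(prefix)]
```

Repeated requests for the same prefix within the TTL are answered without calling the origin, which is where the savings in network latency and server processing time come from.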

featured in #439


In Defense Of Simple Architectures

- Dan Luu tl;dr: Dan discusses the effectiveness of simple architectures in software development, using Wave, a $1.7B company, as an example. Wave's architecture is a Python monolith on top of Postgres, allowing engineers to focus on delivering value to users. The article emphasizes that simple architectures can be created more cheaply and easily than complex ones, even for high-traffic apps. Despite the trend towards complex, microservice-based architectures, Dan argues for the "unreasonable effectiveness" of monoliths, detailing Wave's choices, mistakes, and areas of unavoidable complexity. Simplicity in architecture can lead to success, allowing companies to allocate complexity where it benefits the business.

featured in #439


How We Built The Canva Apps SDK

- Martin Cronjé tl;dr: Martin’s article outlines the development of the Canva Apps SDK, transitioning from a plugin model to a more flexible app-building platform. The process involved building a secure sandboxed environment, creating a new build-and-deploy pipeline, and designing APIs with a focus on simplicity, safety, evolvability, and consistency. Iterative development, continuous feedback, and a balance between alignment and empowerment were key technical strategies in the SDK's creation.

featured in #437