/Architecture

Streamlining Financial Precision: Uber’s Advanced Settlement Accounting System

tl;dr: “We process about 1.2 billion settlements each month, handling around $130 billion in cash in transit annually from over 50 different PSPs. This process ensures that funds received from various payment methods are accurately accounted for and matched with the corresponding bank statements. Settlement accounting at Uber involves offsetting receivables booked during revenue accounting, verifying the cash deposited in the bank, and analyzing contracts with PSPs to determine fees, taxes, and other charges associated with transactions.”

featured in #562


How Vercel Adopted Microfrontends

tl;dr: “By rethinking our architecture, we shifted to vertical microfrontends, leading to a simpler development experience and over a 40% improvement in preview build times and local development compilation. Streamlined dependencies by removing code for the other microfrontends also reduced page weight and boosted end-user performance, with gains in Core Web Vitals like Largest Contentful Paint (LCP) and Interaction to Next Paint (INP).”

featured in #561


Introducing Netflix’s TimeSeries Data Abstraction Layer

tl;dr: Netflix developed the TimeSeries Abstraction — a versatile and scalable solution designed to efficiently store and query large volumes of temporal event data with low millisecond latencies, all in a cost-effective manner across various use cases. “In this post, we will delve into the architecture, design principles, and real-world applications of the TimeSeries Abstraction, demonstrating how it enhances our platform’s ability to manage temporal data at scale.”

featured in #559


Genie: Uber’s Gen AI On-Call Copilot

tl;dr: “For building an on-call copilot, we chose between fine-tuning an LLM model or leveraging Retrieval-Augmented Generation (RAG). Fine-tuning requires curated data with high-quality, diverse examples for the LLM to learn from. It also requires compute resources to keep the model updated with new examples.”

featured in #558


Making Uber’s ExperimentEvaluation Engine 100x Faster

tl;dr: “How we made efficiency improvements to Uber’s Experimentation platform to reduce the latencies of experiment evaluations by a factor of 100x (milliseconds to microseconds). We accomplished this by going from a remote evaluation architecture (client to server RPC requests) to a local evaluation architecture (client-side computation). Some of the terminology in this blog post (e.g., parameters, experiments, etc.) is referenced from our previous blog post on Uber Experimentation. To learn more, check out Supercharging A/B Testing at Uber.”
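As a rough illustration of the remote-to-local shift described above, the sketch below evaluates an experiment entirely on the client from a locally held config, with no RPC. It is not Uber's implementation: the config shape, the `bucket` and `evaluate_locally` names, and the hash-based bucketing scheme are all assumptions made for this example.

```python
import hashlib

# Hypothetical experiment config, assumed to be synced to the client ahead of time.
# Each group is (name, percentage of traffic); percentages are out of 100.
CONFIG = {
    "new_checkout": {
        "groups": [("control", 50), ("treatment", 50)],
        "default": "control",
    },
}


def bucket(experiment_key: str, unit_id: str, buckets: int = 100) -> int:
    """Map a unit (e.g. a user ID) to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{experiment_key}:{unit_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def evaluate_locally(config: dict, experiment_key: str, unit_id: str) -> str:
    """Pick a treatment group using only local data and local computation."""
    experiment = config[experiment_key]
    b = bucket(experiment_key, unit_id)
    upper = 0
    for name, percentage in experiment["groups"]:
        upper += percentage
        if b < upper:
            return name
    # Unit falls outside every allocated range: use the default group.
    return experiment["default"]


if __name__ == "__main__":
    print(evaluate_locally(CONFIG, "new_checkout", "user-123"))
```

Because the assignment is a pure function of the experiment key and the unit ID, repeated calls return the same group without a network round trip, which is the kind of saving a move to client-side computation targets.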

featured in #556


Introducing Netflix’s Key-Value Data Abstraction Layer

tl;dr: “In this post, we dive deep into how Netflix’s KV abstraction works, the architectural principles guiding its design, the challenges we faced in scaling diverse use cases, and the technical innovations that have allowed us to achieve the performance and reliability required by Netflix’s global operations.”

featured in #552


Should We Decompose Our Monolith?

- Will Larson tl;dr: “Even as popular sentiment has generally turned away from microservices, many engineering organizations have a bit of both, often the remnants of one or more earlier but incomplete migration efforts. This strategy looks at a theoretical organization stuck with a bit of both approaches, let’s call it Theoretical Compliance Company, which is looking to determine its path forward.”

featured in #550


Real-Time Mouse Pointers

- Anton Egorov, Mark Gurevich tl;dr: “WebSockets and WebRTC technologies are both excellent options for real-time presence features. Using the combination of WebSockets and a message broker or an in-memory database like Redis, it’s possible to implement such features on a global website with hundreds of thousands of simultaneous collaborative users. This approach enables users to interact seamlessly and synchronously, fostering a sense of connection and community in a digital space.”

featured in #542


Meet Chrono, Our Scalable, Consistent, Metadata Caching Solution

tl;dr: From the team at Dropbox, “If we wanted to solve our high-volume read QPS problem while upholding our clients’ expectation of read consistency, traditional caching solutions would not work. We needed to find a scalable, consistent caching solution to solve both problems at once. This article discusses Chrono, a scalable, consistent caching system built on top of Dropbox’s key-value storage system.”

featured in #536


Odin: Uber’s Stateful Platform

- Jesper Borlum, Gianluca Mezzetti tl;dr: “The Odin platform aims to provide a unified operational experience by encompassing all aspects of managing stateful workloads. These aspects include host lifecycle, workload scheduling, cluster management, monitoring, state propagation, operational user interfaces, alerting, auto-scaling, and automation. Uber deploys stateful systems at global, regional, and zonal levels, and Odin is designed to manage these systems consistently and in a technology-agnostic manner.” This post provides an overview of Odin’s origins, the fundamental principles, and the challenges encountered early on.

featured in #534