Database

The Case Of A Curious SQL Query

- Justin Jaffray tl;dr: Justin digs into SQL's foundations and makes the case for a formally defined model of how queries should behave. Using predicate pushdown as his example, he presents a SQL query that behaves differently across DuckDB, SQLite, and CockroachDB. "I think it's a fun little mind bender that gives you some insight into the internals of these databases query engines without having to actually look at any code."

featured in #460
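Predicate pushdown moves a filter below a join (closer to the scan) so fewer rows flow through the rest of the plan. The toy sketch below uses made-up tables and a made-up predicate (it is not the query from the article) and evaluates the same query both ways in plain Python. The rewrite is valid here because the join is an inner join and the predicate is deterministic and reads columns from only one side.

```python
# Toy illustration of predicate pushdown; tables and predicate are hypothetical.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250},
    {"order_id": 2, "customer_id": 11, "amount": 40},
    {"order_id": 3, "customer_id": 10, "amount": 90},
]
customers = [
    {"customer_id": 10, "country": "DE"},
    {"customer_id": 11, "country": "FR"},
]


def inner_join(left, right, key):
    """Nested-loop inner join on a column name shared by both inputs."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def big_order(row):
    # Deterministic predicate that only reads a column from `orders`.
    return row["amount"] > 50


# Plan A: join first, then filter (no pushdown).
no_pushdown = [row for row in inner_join(orders, customers, "customer_id") if big_order(row)]

# Plan B: push the filter below the join so it runs on `orders` alone.
pushed = inner_join([o for o in orders if big_order(o)], customers, "customer_id")

print([row["order_id"] for row in pushed])  # → [1, 3]
```

Plan B feeds fewer rows into the join. With a non-deterministic predicate (for example, a random sample), the two plans would no longer be guaranteed to return the same rows, because the predicate would be evaluated a different number of times.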


Storage Challenges In The Evolution Of Database Architecture

- Sujay Venaik tl;dr: “Sync service has been running since 2014, and we started facing issues related to physical storage on the database layer. For context, sync service runs on an AWS RDS Aurora cluster that has a single primary writer node and 3-4 readers, all of which are r6g.8xlarge. AWS RDS has a physical storage size limit of 128TiB for each RDS cluster… We were hovering around ~95TB, and our rate of ingestion was ~2TB per month. At this rate, we realized we would see ingestion issues in another 6-8 months.” The team devised a three-pronged strategy: dropping unused tables, rethinking their approach to append-only tables, and methodically reclaiming space from the largest tables. Together, these measures freed up about 60TB.

featured in #452


Upsert In SQL

- Anton Zhiyanov tl;dr: Anton explains the "upsert" operation in SQL, which inserts a row if it does not already exist and updates it if it does. The article includes interactive examples and shows how different database management systems, including MySQL, SQLite, and PostgreSQL, handle upserts.

featured in #452
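Anton's examples cover several systems; the sketch below sticks to SQLite through Python's built-in sqlite3 module, with a made-up page_views table. SQLite (3.24.0 and later) and PostgreSQL both spell upsert as INSERT ... ON CONFLICT ... DO UPDATE, where excluded refers to the row that failed to insert; MySQL uses INSERT ... ON DUPLICATE KEY UPDATE instead.

```python
import sqlite3

# In-memory database; the table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER NOT NULL)")

UPSERT = """
INSERT INTO page_views (page, views) VALUES (?, ?)
ON CONFLICT (page) DO UPDATE SET views = views + excluded.views
"""


def record_views(page: str, n: int) -> None:
    # Inserts a new row, or adds `n` to the stored count if `page` already exists.
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (page, n))


record_views("/home", 1)
record_views("/about", 1)
record_views("/home", 3)

page_rows = conn.execute("SELECT page, views FROM page_views ORDER BY page").fetchall()
print(page_rows)  # → [('/about', 1), ('/home', 4)]
```

Unqualified column names in the DO UPDATE clause (views here) refer to the existing row, so the third call adds 3 to the stored count instead of overwriting it.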


Navigating The Stars: How InfluxDB Powers Loft Orbital's Space Innovations

tl;dr: Loft Orbital, a space infrastructure company, simplifies space missions by operating customer payloads on its microsatellites as a service. Using Telegraf, InfluxDB, and Google Cloud, the company collects telemetry from its spacecraft and monitors its mission infrastructure. InfluxDB supports QA and performance monitoring, surfaces long-term data trends, and helps enhance their mission automation.

featured in #451
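Telegraf writes metrics to InfluxDB in line protocol, a text format of the shape `measurement,tag=value field=value timestamp`. The summary does not describe Loft Orbital's actual schema, so the measurement, tag, and field names below are invented, and the helper to_line_protocol is hypothetical. It is a minimal sketch that handles only float, integer, and boolean fields (no string fields).

```python
# Hypothetical helper that formats one telemetry reading as InfluxDB line protocol.

def _escape_key(s: str) -> str:
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(s: str) -> str:
    # Measurement names escape commas and spaces.
    return s.replace(",", r"\,").replace(" ", r"\ ")


def _format_field(value) -> str:
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    if not fields:
        raise ValueError("line protocol requires at least one field")
    line = _escape_measurement(measurement)
    for key in sorted(tags):  # sorting tags by key is recommended for write performance
        line += f",{_escape_key(key)}={_escape_key(tags[key])}"
    line += " " + ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanosecond-precision Unix timestamp
    return line


line = to_line_protocol(
    "battery",
    {"subsystem": "eps", "satellite": "sat-1"},
    {"voltage": 28.1, "charge_cycles": 412},
    1700000000000000000,
)
print(line)
# → battery,satellite=sat-1,subsystem=eps voltage=28.1,charge_cycles=412i 1700000000000000000
```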


How Do Databases Execute Expressions?

- Phil Eaton tl;dr: “Databases are fun. They sit at the confluence of Computer Science topics that might otherwise not seem practical in life as a developer. For example, every database with a query language is also a programming language implementation of some caliber. That doesn't include all databases though of course; see: RocksDB, FoundationDB, TigerBeetle, etc. This post looks at how various databases execute expressions in their query language.”

featured in #451
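One common way to execute expressions is a tree-walking interpreter: the parser builds an expression tree, and the engine recursively evaluates it once per row. Other engines compile expressions to bytecode or native code instead. The sketch below is a minimal tree-walker for a SQL-like WHERE clause; the node classes and row format are hypothetical, and NULL handling is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Any, Union

# Hypothetical expression-tree node types.


@dataclass
class Const:
    value: Any


@dataclass
class Column:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Column, BinOp]

OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    ">": lambda a, b: a > b,
    "=": lambda a, b: a == b,
}


def evaluate(expr: Expr, row: dict) -> Any:
    """Recursively evaluate `expr` against one row; None stands in for SQL NULL."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Column):
        return row[expr.name]
    left = evaluate(expr.left, row)
    right = evaluate(expr.right, row)
    if left is None or right is None:
        return None  # simplification: any NULL operand yields NULL
    return OPS[expr.op](left, right)


# WHERE price * qty > 100
predicate = BinOp(">", BinOp("*", Column("price"), Column("qty")), Const(100))
sample_rows = [{"price": 20, "qty": 6}, {"price": 5, "qty": 3}, {"price": None, "qty": 9}]
results = [evaluate(predicate, r) for r in sample_rows]
print(results)  # → [True, False, None]
```

Real SQL uses three-valued logic, so operators such as AND and OR do not simply propagate NULL (FALSE AND NULL is FALSE); the shortcut above is only safe for the arithmetic and comparison operators it supports.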


Inside New Query Engine Of MongoDB

- Nikita Lapkov tl;dr: MongoDB has announced a significant overhaul of its query execution engine, and this article takes an in-depth look at the technical side of the change. The previous engine, known as "Classic," was built around JSON documents, which led to inefficiencies in complex queries. The new Slot Based Engine (SBE) introduces "slots" as the mechanism for passing data between execution stages, making execution more efficient. Nikita covers the architecture, the data flow, and the challenges faced during the transition.

featured in #449
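To make the idea of slots concrete, here is a toy contrast with passing whole documents between stages. This is not MongoDB's SBE API: the stage functions and the SlotTable class are hypothetical. The scan stage binds only the fields that later stages need into named slots, and the filter and project stages read those slots instead of receiving full documents.

```python
# Toy slot-based pipeline; not MongoDB's SBE API. All names are hypothetical.

docs = [
    {"_id": 1, "name": "ada", "age": 36, "bio": "x" * 1000},
    {"_id": 2, "name": "bob", "age": 17, "bio": "y" * 1000},
]


class SlotTable:
    """Named slots that stages read and write; each slot holds one value at a time."""

    def __init__(self):
        self.values = {}


def scan(collection, fields, slots):
    # Bind only the requested fields into slots, one document at a time.
    for doc in collection:
        for f in fields:
            slots.values[f] = doc.get(f)
        yield  # signal the parent stage that the slots hold a new row


def filter_stage(child, slot, predicate, slots):
    for _ in child:
        if predicate(slots.values[slot]):
            yield


def project(child, out_fields, slots):
    for _ in child:
        yield {f: slots.values[f] for f in out_fields}


slots = SlotTable()
plan = project(
    filter_stage(scan(docs, ["name", "age"], slots), "age", lambda a: a >= 18, slots),
    ["name"],
    slots,
)
result = list(plan)
print(result)  # → [{'name': 'ada'}]
```

The large bio field is never copied between stages, which hints at why passing individual values can be cheaper than handing whole documents from stage to stage.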


Tumblr Shares Database Migration Strategy With 60+ Billion Rows

tl;dr: The article delves into Tumblr's database migration strategy. With a massive MySQL database spanning 21 terabytes and 60+ billion rows, Tumblr sought a migration approach that minimized user impact. Initially considering a brute force method, they later adopted the CQRS pattern, which separates database read and write operations. To combat latency issues, Tumblr introduced a database proxy in the local data center, which maintained persistent connections to the remote leader and allowed for connection pooling. This strategy ensured minimal user disruption during migration.

featured in #447
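The read/write split and the connection-pooling proxy can be sketched together. This is a hypothetical illustration, not Tumblr's proxy: reads go to a pool for a local replica, writes go to a pool for the remote leader, and each pool hands out a fixed set of long-lived connections instead of opening a new one per query. The Connection objects are placeholders rather than a real MySQL driver, and statement classification is deliberately crude.

```python
import queue
from contextlib import contextmanager

# Hypothetical read/write router with per-target connection pools.


class Connection:
    """Placeholder for a persistent database connection."""

    def __init__(self, target: str, n: int):
        self.target = target
        self.name = f"{target}-conn-{n}"


class ConnectionPool:
    def __init__(self, target: str, size: int):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(Connection(target, i))  # opened once, reused many times

    @contextmanager
    def connection(self, timeout: float = 1.0):
        conn = self._idle.get(timeout=timeout)  # waits if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it


class Router:
    READ_PREFIXES = ("SELECT", "SHOW")

    def __init__(self, leader_pool: ConnectionPool, replica_pool: ConnectionPool):
        self.leader_pool = leader_pool
        self.replica_pool = replica_pool

    def pool_for(self, sql: str) -> ConnectionPool:
        # Crude classification by leading keyword; a production proxy would need to
        # handle cases such as SELECT ... FOR UPDATE and reads inside transactions.
        is_read = sql.lstrip().upper().startswith(self.READ_PREFIXES)
        return self.replica_pool if is_read else self.leader_pool


router = Router(ConnectionPool("remote-leader", 2), ConnectionPool("local-replica", 4))

targets = []
for sql in ["SELECT * FROM posts WHERE id = 1", "UPDATE posts SET title = 'x' WHERE id = 1"]:
    with router.pool_for(sql).connection() as conn:
        targets.append(conn.target)  # a real proxy would execute `sql` on `conn` here
print(targets)  # → ['local-replica', 'remote-leader']
```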


Teréga Replaced Its Legacy Data Historian with InfluxDB, AWS, And IO-Base

- Jessica Wachtel tl;dr: Teréga, a French gas company, faced challenges with outdated IT systems. Recognizing a gap in available cloud-native data historians, they turned to InfluxDB. With InfluxDB, they developed Indabox for efficient data collection and IO-Base, hosted on AWS, for robust data storage. This InfluxDB-centric solution significantly modernized Teréga's IT landscape.

featured in #446


This Is How Quora Shards MySQL To Handle 13+ Terabytes

tl;dr: With data storage requirements in the tens of terabytes and 100,000 queries per second, Quora chose MySQL for its improved read performance. To handle rapid data growth and a high volume of write queries, Quora implemented both vertical and horizontal sharding. Vertical sharding moves different tables to different servers, which improves write scalability. Horizontal sharding, on the other hand, splits the rows of a large table across multiple smaller tables. Quora chose to build its own sharding solution instead of using a third-party service, to keep latency low and to reuse existing logic easily.

featured in #445
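The two techniques can be combined in a small routing table. The sketch below is hypothetical and does not describe Quora's implementation: whole tables are assigned to clusters (vertical sharding), and the rows of one large table are split across physical tables by key range (horizontal sharding). Range partitioning is an assumption chosen for illustration; the cluster names, table names, and boundaries are invented.

```python
import bisect
from typing import Optional, Tuple

# Vertical sharding: whole tables are assigned to different (hypothetical) clusters.
TABLE_TO_CLUSTER = {"users": "cluster-a", "questions": "cluster-b"}

# Horizontal sharding: rows of `answers` are split by answer_id range.
# Each boundary is the first id owned by the next shard.
ANSWER_BOUNDARIES = [1_000_000, 2_000_000, 3_000_000]
ANSWER_SHARDS = [
    ("cluster-c", "answers_0"),
    ("cluster-c", "answers_1"),
    ("cluster-d", "answers_2"),
    ("cluster-d", "answers_3"),
]


def locate(table: str, key: Optional[int] = None) -> Tuple[str, str]:
    """Return (cluster, physical table) for a logical table and optional shard key."""
    if table == "answers":
        if key is None:
            raise ValueError("answers is horizontally sharded; a shard key is required")
        return ANSWER_SHARDS[bisect.bisect_right(ANSWER_BOUNDARIES, key)]
    return TABLE_TO_CLUSTER[table], table


print(locate("users"))               # → ('cluster-a', 'users')
print(locate("answers", 42))         # → ('cluster-c', 'answers_0')
print(locate("answers", 2_000_000))  # → ('cluster-d', 'answers_2')
```

Range partitioning keeps neighboring keys on the same shard, so range scans touch fewer shards, but it can concentrate new writes on the last shard; hash partitioning spreads load more evenly at the cost of efficient range queries.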


Fuzz Testing Is the Best Thing To Happen To Our Application Tests

- Andrei Pechkurov tl;dr: The team at QuestDB faced challenges with segfaults, data corruption, and concurrency bugs. To address them, the team adopted fuzz testing, an automated testing technique that feeds a program invalid or unexpected input and watches for crashes, exceptions, and other failures. The article describes how introducing fuzz testing revealed critical issues and made the database more robust. The team also collaborated with SQLancer, a tool for testing SQL database management systems, to uncover issues in their SQL engine.

featured in #441
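A common variant of fuzz testing for storage engines is model-based: apply the same random operations to the system under test and to a trivially correct reference model, then fail as soon as their states diverge. The sketch below is not QuestDB's harness; it uses SQLite as a stand-in system under test and a Python dict as the model. The fuzz function and its parameters are hypothetical.

```python
import random
import sqlite3


def fuzz(seed: int, steps: int = 500) -> int:
    """Run `steps` random operations; raise AssertionError on the first divergence."""
    rng = random.Random(seed)  # seeded so any failure can be replayed exactly
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v INTEGER NOT NULL)")
    model = {}  # reference model: the expected table contents
    for step in range(steps):
        k = rng.randint(0, 20)  # small key space so operations collide often
        if rng.random() < 0.7:
            v = rng.randint(-1000, 1000)
            db.execute(
                "INSERT INTO kv (k, v) VALUES (?, ?) "
                "ON CONFLICT (k) DO UPDATE SET v = excluded.v",
                (k, v),
            )
            model[k] = v
        else:
            db.execute("DELETE FROM kv WHERE k = ?", (k,))
            model.pop(k, None)
        # Compare the full state after every operation.
        actual = dict(db.execute("SELECT k, v FROM kv").fetchall())
        assert actual == model, f"seed={seed} step={step}: {actual} != {model}"
    db.close()
    return steps


for seed in range(20):
    fuzz(seed)
print("no divergence found")
```

Recording the seed in the failure message is what makes a fuzz failure actionable: rerunning with the same seed replays the exact operation sequence.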