/SQL

Is 20M Of Rows Still A Valid Soft Limit Of MySQL Table In 2023?

- Yisheng Gong tl;dr: “There’s rumor around the internet that we should avoid having > 20M rows in a single MySQL table. Otherwise, the table’s performance will be downgraded, you will find SQL query much slower than usual when it’s above the soft limit. These judgements were made on HDD many years ago. I’m wondering if it’s still true for MySQL on SSD in 2023, and if true, why is that?”

featured in #417


Online Gradient Descent Written In SQL

- Max Halford tl;dr: Max implements a ML algorithm within a relational database, using SQL. Some databases allow doing inference with an already trained model. Training the model in the database would remove altogether the need for a separate inference / training service. Max attempts to do this with the Online Gradient Descent algorithm.

featured in #398


SQL Should Be Your Default Choice For Data Engineering Pipelines

- Robin Linacre tl;dr: "SQL should be the first option considered for new data engineering work. It’s robust, fast, future-proof and testable. With a bit of care, it’s clear and readable. A new SQL engine - DuckDB - makes SQL competitive with other high performance dataframe libraries, making SQL a good candidate for data of all sizes."

featured in #387


Writing A Python SQL Engine From Scratch

- Toby Mao tl;dr: "This post will cover why I went through the effort of creating a Python SQL engine and how a simple query goes from a string to actually transforming data." Toby covers tokenizing, parsing, optimizing, planning and executing.

featured in #381


Postgres Auditing In 150 lines Of SQL

- Oliver Rice tl;dr: "Data auditing is a system that tracks changes to tables' contents over time. PostgreSQL has a robust set of features which we can leverage to create a generic auditing solution in 150 lines of SQL."

featured in #303


Working With SQL Server Identity Columns

- Greg Larsen tl;dr: "SQL Server identity columns are easy to add to a table, but you must understand how they work to use them effectively. In this article, Greg Larsen explains the nuances of SQL Server identity columns."

featured in #260


A Future For SQL On The Web

- James Long tl;dr: James developed absurd-sql - "a persistent backend for SQLite on the web." It doesn’t load the whole DB into memory, and writes persist. He explains "the absurdities of the web’s storage APIs (mainly IndexedDB), show how SQLite provides a 10x perf improvement, explain all the cool tricks that make it work, and explain the locking / transactional semantics that make it robust."

featured in #245


Against SQL

- Jamie Brandon tl;dr: Jamie believes there is "a huge amount of value to be unlocked by replacing SQL, and more generally in rethinking where and how we draw the lines between databases, query languages and programming languages.

featured in #236


Some Opinionated Thoughts On SQL Databases

- Nelson Elhage tl;dr: "SQL as an API is in many ways a relic from another era, and while it’s held up remarkably well, it also feels like it shows its age... it feels like SQL engines barely even try to offer predictability performance or to build reliable guard rails against accidentally taking the entire site down."

featured in #228


API Pagination Design

- Alexander Solovyov tl;dr: The common way to paginate in APIs - offset or page number - tends to lead to a "suboptimal" SQL query, requiring databases to find, count and throw away rows. Alexander shows us a better way, using cursor.

featured in #211