Pointer: What Current & Future Engineering Leaders Read

- Maor Kern

SQL

tl;dr: “So you probably wake up every morning asking yourself three of life’s most pertinent questions- how do I build a streaming SQL engine, what even is a streaming SQL engine, and can our Lord drop tables owned by another user. I too found myself asking these questions, sometimes even dreaming about them- often in the form of various SQL operators pointing and laughing at my incompetence as I beg them to answer me.” A streaming SQL engine keeps queries’ results up to date without ever having to recalculate them and Maor discusses topic like data ingestion, query optimization, and stream processing.

featured in #458

SQL Join Flavors

- Anton Zhiyanov

SQL

tl;dr: Anthon provides a look at SQL joins, covering types like "Qualified Join," "Natural Join," "Cross Join," "Partitioned Join," and "Lateral Join." He explains that a "Qualified join" is an umbrella term for the most common types of joins: inner, left, right, and full. "A qualified join connects records from two datasets into one, according to the matching criteria you specify," he states. The article also warns against the use of "Natural Joins," stating they are "almost always a bad idea."

featured in #450

Is 20M Of Rows Still A Valid Soft Limit Of MySQL Table In 2023?

- Yisheng Gong

SQL
Database

tl;dr: “There’s rumor around the internet that we should avoid having > 20M rows in a single MySQL table. Otherwise, the table’s performance will be downgraded, you will find SQL query much slower than usual when it’s above the soft limit. These judgements were made on HDD many years ago. I’m wondering if it’s still true for MySQL on SSD in 2023, and if true, why is that?”

featured in #417

Online Gradient Descent Written In SQL

- Max Halford

ML
Algo
SQL

tl;dr: Max implements a ML algorithm within a relational database, using SQL. Some databases allow doing inference with an already trained model. Training the model in the database would remove altogether the need for a separate inference / training service. Max attempts to do this with the Online Gradient Descent algorithm.

featured in #398

SQL Should Be Your Default Choice For Data Engineering Pipelines

- Robin Linacre

SQL
Data

tl;dr: "SQL should be the first option considered for new data engineering work. It’s robust, fast, future-proof and testable. With a bit of care, it’s clear and readable. A new SQL engine - DuckDB - makes SQL competitive with other high performance dataframe libraries, making SQL a good candidate for data of all sizes."

featured in #387

Writing A Python SQL Engine From Scratch

- Toby Mao

Python
SQL

tl;dr: "This post will cover why I went through the effort of creating a Python SQL engine and how a simple query goes from a string to actually transforming data." Toby covers tokenizing, parsing, optimizing, planning and executing.

featured in #381

Postgres Auditing In 150 lines Of SQL

- Oliver Rice

PostgreSQL
SQL

tl;dr: "Data auditing is a system that tracks changes to tables' contents over time. PostgreSQL has a robust set of features which we can leverage to create a generic auditing solution in 150 lines of SQL."

featured in #303

Working With SQL Server Identity Columns

- Greg Larsen

SQL

tl;dr: "SQL Server identity columns are easy to add to a table, but you must understand how they work to use them effectively. In this article, Greg Larsen explains the nuances of SQL Server identity columns."

featured in #260

A Future For SQL On The Web

- James Long

SQL

tl;dr: James developed absurd-sql - "a persistent backend for SQLite on the web." It doesn’t load the whole DB into memory, and writes persist. He explains "the absurdities of the web’s storage APIs (mainly IndexedDB), show how SQLite provides a 10x perf improvement, explain all the cool tricks that make it work, and explain the locking / transactional semantics that make it robust."

featured in #245

Against SQL

- Jamie Brandon

SQL

tl;dr: Jamie believes there is "a huge amount of value to be unlocked by replacing SQL, and more generally in rethinking where and how we draw the lines between databases, query languages and programming languages.

featured in #236

/SQL