/SQL

Leadership Power Tools: SQL And Statistics

- Matt Blewitt tl;dr: “A common pattern I’ve seen over the years have been folks in engineering leadership positions that are not super comfortable with extracting and interpreting data from stores, be it databases, CSV files in an object store, or even just a spreadsheet. We’re going to cover SQL & DuckDB, then some useful statistical tools: summary stats, distributions, confidence intervals and Bayesian reasoning.”

featured in #578


SQL Tips And Tricks

- Ben Nour tl;dr: “A somewhat opinionated list of SQL tips and tricks that I've picked up over the years in my job as a data analyst. Please note that some of these tips might not be relevant for all RDBMs.”

featured in #568, #554, #553


Sampling With SQL

- Tom Moertel tl;dr: “In this post, we’ll look at some clever algorithms for taking samples. These algorithms are fast and easily translated into SQL.”

featured in #546


How SQL Query Works? SQL Query Execution Order For Tech Interview

tl;dr: “While SQL queries are written in a declarative, human-readable format, there is a complex process that occurs behind the scenes to execute these queries and retrieve the desired results. In this article, we'll delve into the inner workings of SQL queries, breaking down the process step by step.”

featured in #529


How We Built Text-to-SQL At Pinterest

tl;dr: “We took the rise in availability of LLMs as an opportunity to explore whether we could assist our data users with this task by developing a Text-to-SQL feature which transforms these analytical questions directly into code.” The authors describe the tool's evolution and implementation.

featured in #507


GPT In 500 Lines Of SQL

tl;dr: "Before a text can be fed to a neural network, it needs to be converted into a list of numbers. GPT2 uses a variation of the algorithm called Byte pair encoding to do precisely that. Its tokenizer uses a dictionary of 50257 code points - in AI parlance, 'tokens' - that correspond to different byte sequences in UTF-8, plus the 'end of text' as a separate token. This dictionary was built by statistical analysis performed like this: Start with a simple encoding of 256 tokens: one token per byte. Perform the collapse 50000 times over."

featured in #478
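
The dictionary-building procedure quoted in the GPT summary above can be illustrated with a minimal sketch. The quote names only the starting point (256 byte tokens) and the repeated "collapse"; the merge step used here (replace the most frequent adjacent pair with a new token id) is the standard byte pair encoding step and is an assumption about what the article means. The function name `train_bpe` and its parameters are hypothetical, and the sketch is in Python rather than the article's SQL.

```python
from collections import Counter


def train_bpe(data: bytes, num_merges: int):
    """Learn up to num_merges merge rules from raw bytes (illustrative only)."""
    tokens = list(data)   # start with one token id per byte (0..255)
    merges = {}           # (left_id, right_id) -> new token id
    next_id = 256         # first id after the 256 byte tokens

    for _ in range(num_merges):
        # Count every adjacent pair of token ids in the current sequence.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another merge gains nothing

        merges[(left, right)] = next_id

        # "Collapse": rewrite the sequence, replacing each occurrence
        # of the chosen pair with the new token id.
        collapsed = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                collapsed.append(next_id)
                i += 2
            else:
                collapsed.append(tokens[i])
                i += 1
        tokens = collapsed
        next_id += 1

    return merges, tokens


if __name__ == "__main__":
    merges, tokens = train_bpe(b"aaabdaaabac", 3)
    print(merges)
    print(tokens)
```

Running the same loop 50000 times over a large corpus, instead of 3 times over an 11-byte string, would give a dictionary of the size the quote describes: 256 byte tokens plus 50000 merged tokens, with the "end of text" token added separately.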


SQL As API

- Valentin Willscher tl;dr: "I know what you are thinking: Exposing an API that accepts SQL is crazy. It's a terrible idea. Especially if the API is exposed on the internet. Doing that is insecure and will lead to SQL injection attacks, it is a nightmare to maintain and it will lock the backend implementation into a specific technology (some ANSI SQL database). But is that really true? Time to re-evaluate!"

featured in #476


The Case Of A Curious SQL Query

- Justin Jaffray tl;dr: Justin provides a deep dive into SQL's foundational aspects, highlighting the importance of a formalized approach to query behavior. Using the example of "predicate pushdown," Justin presents a SQL query that behaves differently across databases like DuckDB, SQLite, and CockroachDB. "I think it's a fun little mind bender that gives you some insight into the internals of these databases query engines without having to actually look at any code."

featured in #460