/ML

In Search Of The Least Viewed Article On Wikipedia

- Colin Morris tl;dr: "Based on our findings above, the least viewed articles on Wikipedia are not going to be merely about topics with little popular interest - they must also be “unlucky” in the sense of having very small random gaps... Of these 600,000 least lucky articles, all received at least a few views in 2021. The booby prize for least popular article of 2021 is shared by two articles which received exactly 3 probably-human pageviews."

featured in #322


Evolution Of ML Fact Store

- Vivek Kaushal tl;dr: "This post will focus on the large volume of high-quality data stored in Axion — our fact store that is leveraged to compute ML features offline. We built Axion primarily to remove any training-serving skew and make offline experimentation faster. We will share how its design has evolved over the years and the lessons learned while building it."

featured in #321


How DALL-E 2 Actually Works

- Ryan O'Connor tl;dr: Ryan walks through how DALL-E 2 turns a text prompt into an image: a CLIP text encoder embeds the prompt, a prior maps that text embedding to a corresponding image embedding, and a diffusion-based decoder generates the image from it.

featured in #311


Real World Recommendation System - Part 1

- Nikhil Garg tl;dr: "FAANG and other top tech companies have independently converged on a common architecture for production grade recommendation systems." This architecture is domain- and vertical-agnostic and can power all sorts of applications, from e-commerce and feeds to search and notifications. Nikhil starts from the basics, explains the nuances, and describes this universal architecture.

featured in #310


On Owning A Software Problem

- Vicki Boykis tl;dr: What is a low-friction small thing that most will not notice, but that when they do, is a sign of craftsmanship, expertise, and pride in one's work? Vicki has created a list relevant for ML and Data Science: (1) Python code has type annotations. (2) Accurate documentation of a repo and an easy, reproducible way to run the project. (3) Formatted and linted SQL statements. And more.

featured in #293


How We Optimized Python API Server Code 100x

- Vadim Markovtsev tl;dr: "Some of the tricks we used to speed up calls to our analytical API written in Python: played with asyncio, messed with SQLAlchemy, hacked deep in asyncpg, rewrote parts in Cython, found better data structures, replaced some pandas with pure numpy."

featured in #291


Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape

- Matt Turck tl;dr: Matt covers the macro view: making sense of the ecosystem’s complexity, financings, IPOs and M&A, a landscape of the ecosystem, key trends and more.

featured in #258


Machine Learning Is Going Real-time

- Chip Huyen tl;dr: Chip discusses two approaches: (1) Online predictions, where an ML system makes predictions in real time. (2) Online learning, where an ML system incorporates new data and updates its models in real time.

featured in #219


Experimenting With Automatic Video Creation From A Web Page

- Peggy Chi, Irfan Essa tl;dr: "we envision a future where creators focus on making high-level decisions and an ML model interactively suggests detailed temporal and graphical edits for a final video creation on multiple platforms."

featured in #215


The Case For A Learned Sorting Algorithm

- Adrian Colyer tl;dr: On a large dataset (1 billion items), Learned Sort outperforms its competitor by a factor of 1.49, and that includes the time taken to train the model. Adrian explains how it works.

featured in #211