Essential Reading For Engineering Leaders

Improving Instagram Notification Management With Machine Learning And Causal Inference

- Nailong Zhang

ML

tl;dr: "The key to solving this problem is figuring out the incremental value of sending a daily digest notification compared to not sending... For some cohorts, they would be active without receiving the daily digest notifications and thus the incremental values would be small; selecting these cohorts to send the digest notifications is inefficient and may even spam these users."

featured in #366
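
The incremental value mentioned in the summary above can be illustrated with a small sketch. It assumes a hypothetical randomized holdout in which some users in each cohort received the digest and others did not. The cohort names, the counts, and the `incremental_value` and `rank_cohorts` helpers are invented for illustration and are not taken from the Instagram post.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    """Hypothetical holdout counts for one user cohort."""
    treated_users: int    # users who were sent the digest
    treated_active: int   # of those, how many were active afterwards
    control_users: int    # users who were not sent the digest
    control_active: int   # of those, how many were active anyway


def incremental_value(stats: CohortStats) -> float:
    """Difference in activity rate between sending and not sending."""
    treated_rate = stats.treated_active / stats.treated_users
    control_rate = stats.control_active / stats.control_users
    return treated_rate - control_rate


def rank_cohorts(cohorts: dict) -> list:
    """Order cohort names from largest to smallest incremental value."""
    return sorted(
        cohorts,
        key=lambda name: incremental_value(cohorts[name]),
        reverse=True,
    )


# Made-up numbers: the first cohort is active with or without the digest,
# so sending to it adds little; the second cohort responds much more.
cohorts = {
    "already_engaged": CohortStats(1000, 900, 1000, 890),
    "occasional": CohortStats(1000, 400, 1000, 250),
}

ranking = rank_cohorts(cohorts)
```

With these illustrative numbers the "occasional" cohort ranks first, which matches the summary's point that cohorts who would be active anyway are an inefficient target.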

RecSysOps: Best Practices for Operating a Large-Scale Recommender System

- Ehsan Saberian, Justin Basilico

BestPractices
ML

tl;dr: "In this blog post, we introduce RecSysOps a set of best practices and lessons that we learned while operating large-scale recommendation systems at Netflix. These practices helped us to keep our system healthy while: (1) reducing our firefighting time, (2) focusing on innovations and (3) building trust with our stakeholders."

featured in #360

What I Learned Building Platforms At Stitch Fix

Platform
ML

tl;dr: "I was lucky enough to spend the last six years focusing on “engineering for data science” and learning to build great platforms." Stefan guides us through 5 lessons he learned: (1) Focus on adoption, not completeness. (2) Your users are not all equal. (3) Abstract away the internals of your system. (4) Live your users’ life cycle. (5) The two layer API trick.

featured in #359

Machine Learning For Fraud Detection in Streaming Services

ML

tl;dr: "Many users across many platforms make for a uniquely large attack surface that includes content fraud, account fraud, and abuse of terms of service. Detection of fraud and abuse at scale and in real-time is highly challenging."

featured in #355

How The New York Times Uses Machine Learning To Make Its Paywall Smarter

- Rohit Supekar

ML

tl;dr: "When the paywall was launched, the meter limit was the same for all users. However, as The Times has transformed into a data-driven digital company, we are now successfully using a causal machine learning model called the Dynamic Meter to set personalized meter limits and to make the paywall smarter."

featured in #345

Introducing Natural Language Search For Podcast Episodes

- Alexandre Tamborrino

ML
Spotify

tl;dr: "To enable users to find more relevant content with less effort, we started investigating a technique called Natural Language Search, also known as Semantic Search. In a nutshell, Natural Language Search matches a query and a textual document that are semantically correlated instead of needing exact word matches. It matches synonyms, paraphrases, etc., and any variation of natural language that express the same meaning."

featured in #336
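
The matching idea in the summary above, where a query and a document are compared by meaning rather than by exact words, is commonly done by comparing embedding vectors. The sketch below assumes the embeddings have already been produced by some encoder model. The three-dimensional vectors, the episode titles, and the `search` helper are made up for illustration and do not reflect Spotify's actual system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, episode_vectors):
    """Return the episode title whose vector is closest to the query."""
    return max(
        episode_vectors,
        key=lambda title: cosine_similarity(query_vector, episode_vectors[title]),
    )


# Hypothetical embeddings; a real encoder would output hundreds of dimensions.
episode_vectors = {
    "electric cars climate impact": [0.9, 0.1, 0.2],
    "sourdough baking basics": [0.1, 0.9, 0.1],
}

# Hypothetical embedding of a query that shares no exact words
# with the first title but is close to it in meaning.
query_vector = [0.8, 0.2, 0.3]

best_match = search(query_vector, episode_vectors)
```

Because the comparison happens in vector space, the query can match the first episode even though none of its words appear in the title.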

The Berkeley Crossword Solver

ML
AI

tl;dr: "The BCS uses a two-step process to solve crossword puzzles. First, it generates a probability distribution over possible answers to each clue using a question answering (QA) model; second, it uses probabilistic inference, combined with local search and a generative language model, to handle conflicts between proposed intersecting answers."

featured in #331

In Search Of The Least Viewed Article On Wikipedia

- Colin Morris

ML

tl;dr: "Based on our findings above, the least viewed articles on Wikipedia are not going to be merely about topics with little popular interest - they must also be “unlucky” in the sense of having very small random gaps... Of these 600,000 least lucky articles, all received at least a few views in 2021. The booby prize for least popular article of 2021 is shared by two articles which received exactly 3 probably-human pageviews."

featured in #322

Evolution Of ML Fact Store

- Vivek Kaushal

ML
Netflix

tl;dr: "This post will focus on the large volume of high-quality data stored in Axion — our fact store that is leveraged to compute ML features offline. We built Axion primarily to remove any training-serving skew and make offline experimentation faster. We will share how its design has evolved over the years and the lessons learned while building it."

featured in #321

How DALL-E 2 Actually Works

- Ryan O'Connor

AI
ML

tl;dr: A URI is a string that identifies a resource. From a syntactical point of view, a URI string mostly follows the same format as the URL. A URN identifies resources in a permanent way, even after that resource does not exist anymore.

featured in #311
