Essential Reading For Engineering Leaders

Using LLM To Transcribe Restaurant Menu Photos

- Zhe Mai, Zheng Hu, Ying Yang

LLM
ML

tl;dr: DoorDash previously relied on humans to transcribe and update restaurant menus manually, which was costly and time-consuming. The rapid improvement of large language models (LLMs) creates an opportunity for a big stepwise change: letting AI transcribe information from menu photos. However, the diverse menu structures that restaurants use make it hard for an LLM to transcribe accurately at scale. In this blog, we discuss how we built a system that adds a guardrail layer to the LLM using traditional ML techniques.

featured in #604

Foundation Model For Personalized Recommendation

ML

tl;dr: Netflix’s personalized recommender system is complex, with a variety of specialized machine-learned models each catering to distinct needs, such as “Continue Watching” and “Today’s Top Picks for You.” However, maintaining these models became costly, and it was difficult to transfer innovations from one model to another. This underscored the need for a new recommender architecture in which member preference learning is centralized, making it more accessible and useful across different models.

featured in #602

Behind The Scenes Of Canva's DesignDNA Campaign

- Divya Patel

ML

tl;dr: “We challenged ourselves to use this opportunity to showcase Canva's AI capabilities. We wanted to leverage generative AI to create a personalized and engaging experience that users could easily share on social media and spark a sense of accomplishment and connection with our brand. This blog post delves into the campaign design process and highlights how we used generative AI to deliver a personalized and engaging experience to millions of Canva users.”

featured in #593

How DoorDash Leveraged Its Product Knowledge Graph To Enable A High-Velocity Tagging And Badging Experience

- Chuanpin Zhu, Irene Chen

ML
Architecture

tl;dr: “DoorDash launched a number of item badges — user interface (UI) components that highlight key product attributes, such as the number of items in stock. Some badges performed well, while some did not. One thing was clear, though — consumers noticed the badges and changed their behaviors based on their perception of the badge’s value proposition. In this blog post, we explore the issues we encountered trying to ship new badges and the resulting architectural changes that we made.”

featured in #592

Image Replacement In Canva Designs Using Reverse Image Search

- Sven Schindler

ML

tl;dr: “Maintaining a high-quality library is key to creating a seamless design experience for our users. As part of the quality process, swapping an image in a template with another image sometimes becomes necessary. For example, if a third-party media library partnership expires, anywhere we've used their content in the library needs to be replaced. As expected, this is a lengthy process involving extensive manual resources. So naturally, the question arises, can we automate solving it?”

featured in #586

How To Improve Search Without Looking At Queries Or Results

Search
ML

tl;dr: “Canva celebrated the milestone of 200M monthly active users (MAUs). Our customers have over 30 billion designs on Canva and create almost 300 new designs every second. With this growth rate, the ability for Canva Community members to effectively search for and find their designs, as well as those shared to them by team members, is becoming an increasingly challenging and essential problem to solve.”

featured in #569

No GPS Required: Our App Can Now Locate Underground Trains

DeepDive
ML

tl;dr: “Thanks to our clever engineering, we can now predict your location in a subway tunnel using your phone’s vibration signature.” This post dives into how.

featured in #568

Classifying All Of The PDFs On The Internet

- Santiago Pedroza

ML
LLM

tl;dr: “I classified the entirety of SafeDocs using a mixture of LLMs, Embeddings Models, XGBoost and just for fun some LinearRegressors. In the process I too created some really pretty graphs!”

featured in #545

Machine Unlearning In 2024

- Ken Liu

ML

tl;dr: “As our ML models today become larger and their (pre-)training sets grow to inscrutable sizes, people are increasingly interested in the concept of machine unlearning to edit away undesired things like private data, stale knowledge, copyrighted materials, toxic / unsafe content, dangerous capabilities, and misinformation, without retraining models from scratch.” Ken provides an introduction to the field.

featured in #515

Building A Weather Data Warehouse Part I: Loading A Trillion Rows Of Weather Data Into TimescaleDB

- Ali Ramadhan

ML
Data

tl;dr: “I think it would be cool to have historical weather data from around the world to analyze for signals of climate change we’ve already had rather than think about potential future change.” Ali walks through building the warehouse behind this analysis, starting with loading a trillion rows of weather data into TimescaleDB.

featured in #510
