Classifying All Of The PDFs On The Internet
- Santiago Pedroza tl;dr: “I classified the entirety of SafeDocs using a mixture of LLMs, Embeddings Models, XGBoost and just for fun some LinearRegressors. In the process I too created some really pretty graphs!” featured in #545
Building A Weather Data Warehouse Part I: Loading A Trillion Rows Of Weather Data Into TimescaleDB
- Ali Ramadhan tl;dr: “I think it would be cool to have historical weather data from around the world to analyze for signals of climate change we’ve already had rather than think about potential future change.” Ali discusses the implementation of this analysis tool. featured in #510
Personalizing The DoorDash Retail Store Page Experience
tl;dr: "In this post, we show how we built a personalized shopping experience for our new business vertical stores, which include grocery, convenience, pets, and alcohol, among many others. Following a high-level overview of our recommendation framework, we home in on the modeling details, the challenges we have encountered along the way, and how we addressed those challenges." featured in #479
Navigating The Chaos: Why You Don’t Need Another MLOps Tool
tl;dr: AI/ML development lacks systematic processes, leading to errors and biases in deployed models. The MLOps landscape is fragmented, and teams need to glue together a ton of bespoke and third-party tools to meet basic needs. We don’t think you should, so we’re building Openlayer to condense and simplify AI evaluation. featured in #469
Is This A Date? Using ML To Identify Date Formats In File Names
tl;dr: “To make it easier for our users to organize and find their files, Dropbox has an automated feature called naming conventions. With this feature, users can set rules around how files should be named, and files uploaded to a specific folder will automatically be renamed to match the preferred convention. For example, files could be renamed to include a keyword or date… We developed a machine learning model that can accurately identify dates in a file name so that files can be renamed more effectively.” featured in #452
How DoorDash Improves Holiday Predictions Via Cascade ML Approach
- Chad Akkoyun, Zainab Danish tl;dr: DoorDash's engineering team tackled the challenge of accurately forecasting supply and demand during holidays, where traditional tree-based machine learning models like Random Forest and Gradient Boosting faced limitations. The article introduces the "cascade modeling approach" as a solution. This method extends the Gradient Boosting Machine model with a linear model to account for holiday impacts, enhancing forecast accuracy. The cascade approach involves calculating holiday multipliers, preprocessing data, and post-processing forecasts. featured in #446
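For readers who want a feel for the three cascade steps named in the summary above (holiday multipliers, pre-processing, post-processing), here is a minimal sketch. It is not DoorDash's implementation: every function name is hypothetical, the multiplier is a plain ratio of averages rather than the linear model the article describes, and a simple mean stands in for the Gradient Boosting Machine.

```python
from statistics import mean


def holiday_multiplier(history, holiday_days):
    """Step 1: ratio of average holiday demand to average regular-day demand.

    Assumption: a plain ratio stands in for the article's linear model.
    """
    holiday = [v for d, v in history.items() if d in holiday_days]
    regular = [v for d, v in history.items() if d not in holiday_days]
    return mean(holiday) / mean(regular)


def remove_holiday_effect(history, holiday_days, multiplier):
    """Step 2 (pre-processing): rescale holiday observations to a regular-day level."""
    return {
        day: (value / multiplier if day in holiday_days else value)
        for day, value in history.items()
    }


def base_forecast(cleaned_history):
    """Stand-in for the tree-based model: the mean of the cleaned series."""
    return mean(cleaned_history.values())


def cascade_forecast(history, holiday_days, target_is_holiday):
    """Step 3 (post-processing): re-apply the multiplier when the target day is a holiday."""
    multiplier = holiday_multiplier(history, holiday_days)
    cleaned = remove_holiday_effect(history, holiday_days, multiplier)
    baseline = base_forecast(cleaned)
    return baseline * multiplier if target_is_holiday else baseline


# Toy data: demand per day index, with day 3 as a holiday that halves demand.
history = {1: 100, 2: 100, 3: 50, 4: 100}
holiday_days = {3}
holiday_prediction = cascade_forecast(history, holiday_days, target_is_holiday=True)
regular_prediction = cascade_forecast(history, holiday_days, target_is_holiday=False)
```

The point of the split is that the base model only ever sees data with the holiday effect removed, so rare holiday dips do not distort its view of regular days; the holiday adjustment is applied separately at the end.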