SMURF: Beyond The Test Pyramid
- Adam Bender tl;dr: “The test pyramid is the canonical heuristic for guiding test suite evolution. It conveys a simple message - prefer more unit tests than integration tests, and prefer more integration tests than end-to-end tests. While useful, the test pyramid lacks the details you need as your test suite grows and you face challenging trade-offs. To scale your test suite, go beyond the test pyramid. The SMURF mnemonic is an easy way to remember the tradeoffs to consider when balancing your test suite.”
featured in #559
featured in #550
Shifting E2E Testing Left At Uber
- Quess Liu, Daniel Tsui tl;dr: “In this blog, we describe how we built a system that gates every code and configuration change to our core backend systems (1,000+ services). We have several thousand E2E tests that have an average pass rate of 90%+ per attempt. Imagine each of these tests going through a real E2E user flow, like going through an Uber Eats group order. We do all this fast enough to run on every diff before it gets landed.”
featured in #544
6 Hard Lessons We Learned About Automated Testing For GenAI Apps
- John Gluck tl;dr: Testing LLMs is not simple. The probabilistic output makes failures hard to identify, and running the models repeatedly quickly becomes very expensive. In this blog post, QA Wolf engineer John Gluck covers 6 things the team learned about building automated black-box regression tests for genAI applications.
featured in #531
Drata Secured 86% Faster QA Cycles
tl;dr: QA Wolf is delivering QA at DrataSpeed: (1) Regression testing is 90 minutes faster than before, and includes 4x more test cases. (2) Quickly onboarded and gave Drata’s QA resources space to work on new features, saving more than $500,000/year. (3) Went from overnight deploys to multiple times daily.
featured in #530
6 Hard Lessons We Learned About Automated Testing For GenAI Apps
- John Gluck tl;dr: Testing LLMs is not simple. The probabilistic output makes failures hard to identify, and running the models repeatedly quickly becomes very expensive. In this blog post, QA Wolf engineer John Gluck covers 6 things the team learned about building automated black-box regression tests for genAI applications.
featured in #529
Autotrader Saved $620K/YR Trading In Manual Testing For Automation
tl;dr: Automated testing with cruise control allowed Autotrader to: (1) Offset the need to hire six QA engineers, saving $600K+/year. (2) Return more than 1,000 hours per year to the customer support team, saving $20,000/year. (3) Increase release velocity 15–20%. (4) Reduce QA cycles from 3+ days to 15 minutes.
featured in #528
Getting 100% Code Coverage Doesn't Eliminate Bugs
- Kostis Kapelonis tl;dr: “There are many articles already on the net explaining why this is a fallacy, but I recently discovered that sharing an actual code example goes a long way towards proving why 100% code coverage doesn’t mean zero bugs. These people have their “aha” moment when they look at real code, instead of recycling theoretical arguments over and over.”
featured in #527
Debugging With Production Neighbors
tl;dr: SLATE is Uber’s E2E testing tool for microservice architectures that allows testing of services alongside production dependencies. It enables developers to generate test requests mimicking production flows while targeting services under test. This blog explores three debugging options in SLATE: remote debugging of deployed instances, local debugging on developer machines, and debugging through filtered monitoring. These features aim to simplify troubleshooting in production-like environments.
featured in #525
featured in #521