/Tests

SMURF: Beyond The Test Pyramid

- Adam Bender tl;dr: “The test pyramid is the canonical heuristic for guiding test suite evolution. It conveys a simple message - prefer more unit tests than integration tests, and prefer more integration tests than end-to-end tests. While useful, the test pyramid lacks the details you need as your test suite grows and you face challenging trade-offs. To scale your test suite, go beyond the test pyramid. The SMURF mnemonic is an easy way to remember the tradeoffs to consider when balancing your test suite.”

featured in #559


How To Test

- Alex Kladov tl;dr: “This post describes my current approach to testing. When I started programming professionally, I knew how to write good code, but good tests remained a mystery for a long time. This is not due to the lack of advice — on the contrary, there’s abundance of information & terminology about testing.”

featured in #550


Shifting E2E Testing Left At Uber

- Quess Liu, Daniel Tsui tl;dr: “In this blog, we describe how we built a system that gates every code and configuration change to our core backend systems (1,000+ services). We have several thousand E2E tests that have an average pass rate of 90%+ per attempt. Imagine each of these tests going through a real E2E user flow, like going through an Uber Eats group order. We do all this fast enough to run on every diff before it gets landed.”

featured in #544


6 Hard Lessons We Learned About Automated Testing For GenAI Apps

- John Gluck tl;dr: Testing LLMs is not simple. The probabilistic output makes failures hard to identify, and running the models repeatedly becomes expensive quickly. In this blog post, QA Wolf engineer John Gluck covers 6 things the team learned about building automated black-box regression tests for genAI applications.

featured in #531


Drata Secured 86% Faster QA Cycles

tl;dr: QA Wolf is delivering QA at DrataSpeed: (1) Regression testing is 90 minutes faster than before and includes 4x more test cases. (2) Onboarding was quick and gave Drata’s QA staff space to work on new features, saving more than $500,000/year. (3) Deploys went from overnight to multiple times daily.

featured in #530


6 Hard Lessons We Learned About Automated Testing For GenAI Apps

- John Gluck tl;dr: Testing LLMs is not simple. The probabilistic output makes failures hard to identify, and running the models repeatedly becomes expensive quickly. In this blog post, QA Wolf engineer John Gluck covers 6 things the team learned about building automated black-box regression tests for genAI applications.

featured in #529


Autotrader Saved $620K/YR Trading In Manual Testing For Automation

tl;dr: Automated testing with cruise control allowed Autotrader to: (1) Offset the need to hire six QA engineers, saving $600K+/year. (2) Return more than 1,000 hours per year to the customer support team, saving $20,000/year. (3) Increase release velocity 15–20%. (4) Reduce QA cycles from 3+ days to 15 minutes.

featured in #528


Getting 100% Code Coverage Doesn't Eliminate Bugs

- Kostis Kapelonis tl;dr: “There are many articles already on the net explaining why this is a fallacy, but I recently discovered that sharing an actual code example goes a long way towards proving why 100% code coverage doesn’t mean zero bugs. These people have their “aha” moment when they look at real code, instead of recycling theoretical arguments over and over.”

featured in #527
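
A minimal sketch of the idea in the entry above, not taken from the linked article: the `average` function and its test are hypothetical, chosen only to show how one test can execute every line of a function while a bug remains.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers (hypothetical helper)."""
    return sum(values) / len(values)


def test_average():
    # This single test executes every line of average(),
    # so a line-coverage tool would report 100% for the function.
    assert average([2, 4, 6]) == 4


test_average()

# The empty-list input was never exercised, so the bug survives full coverage.
try:
    average([])
    crashed = False
except ZeroDivisionError:
    crashed = True
```

Coverage measures which lines ran, not which inputs were tried, so the empty-list case goes unnoticed here until a test supplies it.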


Debugging With Production Neighbors

tl;dr: SLATE is Uber’s E2E testing tool for microservice architectures that allows testing of services alongside production dependencies. It enables developers to generate test requests mimicking production flows while targeting services under test. This blog explores three debugging options in SLATE: remote debugging of deployed instances, local debugging on developer machines, and debugging through filtered monitoring. These features aim to simplify troubleshooting in production-like environments.

featured in #525


Flaky Tests Overhaul At Uber

tl;dr: “A few years ago, we started tackling flaky tests in an effort to stabilize CI experience across our monorepos. The project first debuted in our Java monorepo and received good results in driving down frictions in developers’ workflow. However, as we evolved our CI infrastructure and started onboarding it to our largest repository with the most users, Go Monorepo, the stop-gap solution became increasingly challenging to scale to the scope.” The authors discuss a centralized system to track all tests. 

featured in #521