/Tests

You Make Your Evals, Then Your Evals Make You.

- Tongfei Chen, Yury Zemlyanskiy tl;dr: The post introduces AugmentQA, a benchmark for evaluating code retrieval systems using real-world software development scenarios rather than synthetic problems. AugmentQA is built from codebases, developer questions, and keyword-based evaluation, and the post reports results that outperform open-source models, which excel on synthetic benchmarks but struggle with realistic tasks.

featured in #603


Making Uber’s Experiment Evaluation Engine 100x Faster

tl;dr: This blog post describes how we made efficiency improvements to Uber’s Experimentation platform to reduce the latencies of experiment evaluations by a factor of 100, from milliseconds to microseconds. We accomplished this by going from a remote evaluation architecture to a local evaluation architecture.

featured in #603


The QA Wolf Advantage: Vertical Integration For Superior QA

- Jon Perl tl;dr: Traditional outsourced QA relies on inefficient, costly tech stacks that fall short of QA engineers' needs. QA Wolf took a smarter approach. They built proprietary technology that aligns with customers’ needs, enabling their QA engineers to deliver 80%+ automated test coverage for their clients in just 4 months. In this free webinar, CEO Jon Perl reveals how QA Wolf is redefining QA automation.

featured in #602


Accelerating Large-Scale Test Migration With LLMs

- Charles Covey-Brandt tl;dr: “In this blog post, we’ll highlight the unique challenges we faced migrating from Enzyme to RTL, how LLMs excel at solving this particular type of challenge, and how we structured our migration tooling to run an LLM-driven migration at scale.”

featured in #601


How 40 Lines Of Code Sped Up iOS End To End Tests By Over 50%

- Jordan Wood tl;dr: “With roughly 30k unit tests and nearly 1k iOS end-to-end tests, speeding up our E2E suite has become a top priority as it takes a substantial amount of time to run. In this post, we’ll highlight how we sped up our tests by 50% with a small, targeted change.”

featured in #601


The QA Wolf Advantage: Vertical Integration For Superior QA

- Jon Perl tl;dr: Traditional outsourced QA relies on inefficient, costly tech stacks that fall short of QA engineers' needs. QA Wolf took a smarter approach. They built proprietary technology that aligns with customers’ needs, enabling their QA engineers to deliver 80%+ automated test coverage for their clients in just 4 months. In this free webinar, CEO Jon Perl reveals how QA Wolf is redefining QA automation.

featured in #598


Arrange Your Code To Communicate Data Flow

- Sebastian Dörner tl;dr: “We often read code linearly, from one line to the next. To make code easier to understand and to reduce cognitive load for your readers, make sure that adjacent lines of code are coherent. One way to achieve this is to order your lines of code to match the data flow inside your method.”

featured in #580


SMURF: Beyond The Test Pyramid

- Adam Bender tl;dr: “The test pyramid is the canonical heuristic for guiding test suite evolution. It conveys a simple message - prefer more unit tests than integration tests, and prefer more integration tests than end-to-end tests. While useful, the test pyramid lacks the details you need as your test suite grows and you face challenging trade-offs. To scale your test suite, go beyond the test pyramid. The SMURF mnemonic is an easy way to remember the tradeoffs to consider when balancing your test suite.”

featured in #559


How To Test

- Alex Kladov tl;dr: “This post describes my current approach to testing. When I started programming professionally, I knew how to write good code, but good tests remained a mystery for a long time. This is not due to the lack of advice — on the contrary, there’s abundance of information & terminology about testing.”

featured in #550


Shifting E2E Testing Left At Uber

- Quess Liu, Daniel Tsui tl;dr: “In this blog, we describe how we built a system that gates every code and configuration change to our core backend systems (1,000+ services). We have several thousand E2E tests that have an average pass rate of 90%+ per attempt. Imagine each of these tests going through a real E2E user flow, like going through an Uber Eats group order. We do all this fast enough to run on every diff before it gets landed.”

featured in #544