/Tests

Increase Test Fidelity By Avoiding Mocks

- Andrew Trenk & Dillon Bly tl;dr: “Aim for as much fidelity as you can achieve without increasing the size of a test. At Google, tests are classified by size. Most tests should be small: they must run in a single process and must not wait on a system or event outside of their process. Increasing the fidelity of a small test is often a good choice if the test stays within these constraints. A healthy test suite also includes medium and large tests, which have higher fidelity since they can use heavyweight dependencies that aren’t feasible to use in small tests, e.g., dependencies that increase execution times or call other processes.”
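As a rough illustration of the idea (the types below are invented, not from the post), an in-memory fake raises fidelity over a mock while still fitting in a small test: it has real read-after-write behavior instead of echoing stubbed answers.

```python
# Hypothetical sketch: an in-memory fake, usable in a small single-process test.
class InMemoryUserStore:
    """A fake with real storage semantics, no network or disk I/O."""
    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)

def test_rename_user():
    store = InMemoryUserStore()
    store.save(1, "Ada")
    store.save(1, "Ada Lovelace")  # exercises real overwrite behavior,
    assert store.get(1) == "Ada Lovelace"  # which a mock would only stub
```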

featured in #493


Meta's New LLM-Based Test Generator Is A Sneak Peek To The Future Of Development

- Leonardo Creed tl;dr: “Meta claims that this is “the first paper to report on LLM-generated code that has been developed independent of human intervention (other than final review sign off), and landed into large scale industrial production systems with guaranteed assurances for improvement over the existing code base.” Furthermore, there are solid principles that developers can take away in order to use AI effectively themselves.”

featured in #492


Too Much Of A Good Thing: The Trade-Off We Make With Tests

- Nicole Tietz-Sokolskaya tl;dr: “If you aim for 100% code coverage, you're saying that any risk of bug is a risk you want to avoid. And if you have no tests, you're saying that it's okay if you have severe bugs with maximum cost.” Nicole presents us with a way to think about how much code coverage is enough. You need two numbers: (1) The cost of writing tests. To get this, you have to measure how much time is spent on testing. (2) The cost of bugs. Getting this number is more complicated. You can measure the time your team spends on triaging and fixing bugs. The rest of it, you'll estimate with management and product. The idea here is just to get close enough to understand the trade-off, not to be exact.
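As a back-of-the-envelope illustration (the numbers below are invented), the comparison Nicole describes needs nothing more than the two measurements side by side:

```python
# Crude, made-up numbers; the goal is a rough feel for the trade-off,
# not a precise model.
hours_writing_tests = 40  # measured: writing and maintaining tests per month
hours_fixing_bugs = 25    # measured + estimated: triage, fixes, customer impact

ratio = hours_writing_tests / hours_fixing_bugs
print(f"Testing currently costs {ratio:.1f}x what bugs cost us.")
# A very high ratio hints at over-testing; a very low one, at under-testing.
```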

featured in #487


Feature Flags Spaghetti // FFs Missing Features

- Eliran Turgeman tl;dr: “I feel like there are some key features missing that would make me switch vendors. I mainly have two problems with current solutions: (1) It can get tedious and messy to turn on/off a feature when multiple FFs were placed for it. (2) Your codebase becomes a FF graveyard if you don’t remember cleaning it, and you probably don’t…” Eliran provides suggestions on how to address these.
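As one hypothetical sketch of problem (1), not necessarily Eliran's proposal: grouping related flags behind a single switch keeps a multi-flag rollout atomic. All names below are invented.

```python
# Invented example: one feature guarded by several flags.
FLAG_GROUPS = {
    "new_checkout": ["checkout_ui_v2", "checkout_api_v2", "checkout_emails_v2"],
}

flags = {name: False for group in FLAG_GROUPS.values() for name in group}

def set_feature(feature, enabled):
    """Flip every flag belonging to one feature in a single call."""
    for flag in FLAG_GROUPS[feature]:
        flags[flag] = enabled

set_feature("new_checkout", True)
assert all(flags[f] for f in FLAG_GROUPS["new_checkout"])
```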

featured in #486


The Day I Started Believing In Unit Tests

- Benjamin Richner tl;dr: “The test ran hundreds if not thousands of times successfully. What a waste of time... But then, one day, we started observing test failures. Not many, maybe three over the course of a few weeks. The test actually crashed with a Segmentation Fault, so it was clear that it was a severe error. Interestingly, none of the code under test had actually changed. Well, that's definitely something we had to investigate! I spare you the details of the search for the error, but eventually, I was able to reproduce the problem while a debugger was attached, so the entire context of the problem was handed to me on a silver platter.”

featured in #475


Canon TDD

- Kent Beck tl;dr: Test-driven development (TDD) is a programming workflow in which new features are added without breaking existing functionality. It ensures new and old features work correctly, readies the system for future updates, and builds programmer confidence. The flow is as follows: (1) Write a list of the test scenarios you want to cover. (2) Turn exactly one item on the list into an actual, concrete, runnable test. (3) Change the code to make the test (& all previous tests) pass (adding items to the list as you discover them). (4) Optionally refactor to improve the implementation design. (5) Until the list is empty, go back to #2.
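One turn of that loop, sketched in pytest-style Python with an invented scenario:

```python
# Step 1: list scenarios, e.g. ["empty cart totals to 0",
#                               "one item totals to its price", ...]

# Step 2: turn exactly one scenario into a concrete, runnable test.
def test_empty_cart_totals_to_zero():
    assert cart_total([]) == 0

# Step 3: change the code until this test (and all previous ones) passes.
def cart_total(items):
    return sum(items)

# Step 4: refactor if it improves the design.
# Step 5: pick the next scenario from the list and repeat.
```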

featured in #473


Pytest Daemon: 10X Local Test Iteration Speed

- Ruby Feinstein tl;dr: Discord utilizes a Python monolith to power its API, from sending messages to managing subscriptions. To support this, they use pytest to write and run unit tests. Over the last 8 years, the time it takes to run a single test has continuously grown until it reached the point where it takes 13 seconds, even if the test ends up doing absolutely nothing. This post discusses how tests were sped up.
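The post covers Discord's internal daemon; the sketch below is only a rough approximation of the underlying idea, not their implementation: keep one warm Python process so the slow import cost is paid once, then run individual tests on request.

```python
# Not Discord's tool; a minimal sketch assuming a trivial socket protocol.
import socket

import pytest  # the monolith's heavy modules get imported once, here

def serve(host="127.0.0.1", port=8765):  # hypothetical address
    with socket.create_server((host, port)) as server:
        while True:
            conn, _ = server.accept()
            with conn:
                # e.g. b"tests/test_api.py::test_ping" sent by a thin client
                test_id = conn.recv(4096).decode().strip()
                exit_code = pytest.main([test_id])  # imports are already warm
                conn.sendall(str(int(exit_code)).encode())

if __name__ == "__main__":
    serve()
```

Repeated pytest.main() calls share module state, so a real daemon would also need to reload source files as they change.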

featured in #472


How Much Testing Is Enough?

- George Pirocanac tl;dr: George addresses the complexity of determining adequate testing for software releases. He suggests a multi-faceted approach, emphasizing the importance of documenting the testing process, having a solid base of unit tests, not overlooking integration testing, and performing end-to-end testing for critical user journeys. The article also highlights the need to understand various testing tiers, such as performance, load, fault-tolerance, security, and usability testing. Additionally, it stresses the significance of understanding code and functionality coverage and using field feedback for process improvement.

featured in #467


Else Nuances

- Sam Lee & Stan Chan tl;dr: "If your function exits early in an if statement, using or not using an else clause is equivalent in terms of behavior. However, the proper use of else clauses and guard clauses (lack of else) can help emphasize the intent of the code to the reader." The authors discuss this with examples.
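A minimal illustration of the distinction (example invented; behavior is identical either way):

```python
from collections import namedtuple

User = namedtuple("User", "name")

def describe_guard(user):
    if user is None:        # guard clause: dispatch the edge case up front
        return "anonymous"
    return user.name        # the main path reads flush left

def describe_else(user):
    if user is None:
        return "anonymous"
    else:                   # same behavior, but reads as two peer branches
        return user.name

assert describe_guard(None) == describe_else(None) == "anonymous"
assert describe_guard(User("ada")) == describe_else(User("ada")) == "ada"
```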

featured in #465


Tests Too DRY? Make Them DAMP!

- Derek Snyder & Erik Kuefler tl;dr: The authors discuss the balance between the DRY (Don't Repeat Yourself) and DAMP (Descriptive and Meaningful Phrases) principles in unit testing. While DRY promotes code reuse and minimizes duplication, it may not always suit unit tests, as it can make them less readable and harder to manually inspect for correctness. The authors argue for prioritizing DAMP in tests to enhance readability, even if it leads to some code redundancy. They illustrate this with an example where creating users and assertions directly in the test, rather than using helper methods or loops, makes the test clearer. They acknowledge the relevance of DRY in tests for certain aspects but suggest leaning towards DAMP for better clarity and understanding in unit tests.
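A rough Python rendition of that trade-off (the original post is written in Java; the Forum and User types here are invented):

```python
class User:
    def __init__(self, name, banned=False):
        self.name, self.banned = name, banned

class Forum:
    def __init__(self):
        self.users = []
    def register(self, user):
        self.users.append(user)
    def can_post(self, user):
        return user in self.users and not user.banned

# Too DRY: the loop hides which user matters and why.
def test_banned_user_cannot_post_dry():
    forum = Forum()
    users = [User("alice"), User("trevor", banned=True)]
    for u in users:
        forum.register(u)
    assert not forum.can_post(users[1])  # reader must decode users[1]

# DAMP: mild duplication, but every relevant fact is stated in the test.
def test_banned_user_cannot_post_damp():
    forum = Forum()
    trevor = User("trevor", banned=True)
    forum.register(trevor)
    assert not forum.can_post(trevor)
```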

featured in #464