/John Gluck

6 Hard Lessons We Learned About Automated Testing For GenAI Apps
tl;dr: Testing LLMs is not simple. Their probabilistic output makes failures hard to identify, and running the models repeatedly gets expensive quickly. In this blog post, QA Wolf engineer John Gluck covers 6 things the team learned about building automated black-box regression tests for GenAI applications.

featured in #531


featured in #529