/AI

Autonomous Coding: Are We There Yet?

tl;dr: BlueOptima's study of 110k developers and 82 million code changes provides a coding automation framework inspired by the SAE Driving Automation levels. It shows that, whilst we are currently very far from full automation, the rapid rise of LLMs since 2022 requires preparation for new coding standards, hybrid expertise to mitigate quality and security issues, and tailoring of automation to specific needs to maximise impact.

featured in #574


Introducing The Prompt Engineering Toolkit

tl;dr: “To facilitate rapid iteration and experimentation of LLMs at Uber, there was a need for centralization to seamlessly construct prompt templates, manage them, and execute them against various underlying LLMs to take advantage of LLM support tasks. To meet these needs, we built a prompt engineering toolkit that offers standard strategies that encourage prompt engineers to develop well-crafted prompt templates.”

featured in #572


Are You Shipping More With GitHub Copilot? Is More Roadmap Work Being Done?

tl;dr: Engineering leaders are trying to figure out if their team is using GitHub Copilot, how much they're using it, and its impact on their work. Download this slide deck from Jellyfish, analyzing data from 4,200+ developers at 200+ companies, and start understanding whether you're getting adequate return on your AI investments.

featured in #571


Shrinking A Postgres Table

- John Nunemaker tl;dr: John discovered his Postgres database was using 87% of its disk space, mainly due to unprocessed downloads in a podcast hosting app. Rather than batch-deleting millions of old records, he used a table-swapping technique: create a new table containing only recent data, then swap it in for the original, freeing up significant space quickly and efficiently.

featured in #570
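
The table-swapping idea from the entry above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's actual procedure: it uses Python's built-in SQLite module as a stand-in for Postgres so that it runs anywhere, and the `downloads` table, its `created_at` column, and the helper names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical `downloads` table."""
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are managed explicitly with BEGIN/COMMIT below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE downloads (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)"
    )
    rows = [
        (1, "2023-06-01"),
        (2, "2023-07-01"),
        (3, "2023-08-01"),
        (4, "2024-02-01"),
        (5, "2024-03-01"),
    ]
    conn.executemany("INSERT INTO downloads VALUES (?, ?)", rows)
    return conn


def shrink_downloads(conn, cutoff):
    """Keep only rows with created_at >= cutoff by swapping in a new table."""
    conn.execute("BEGIN")
    try:
        # 1. Create an empty table with the same (assumed) schema.
        conn.execute(
            "CREATE TABLE downloads_new "
            "(id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)"
        )
        # 2. Copy across only the rows worth keeping.
        conn.execute(
            "INSERT INTO downloads_new SELECT * FROM downloads WHERE created_at >= ?",
            (cutoff,),
        )
        # 3. Swap the names, then drop the old table instead of deleting rows.
        conn.execute("ALTER TABLE downloads RENAME TO downloads_old")
        conn.execute("ALTER TABLE downloads_new RENAME TO downloads")
        conn.execute("DROP TABLE downloads_old")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    db = make_demo_db()
    shrink_downloads(db, "2024-01-01")
    print(db.execute("SELECT id, created_at FROM downloads ORDER BY id").fetchall())
```

Dropping a whole table typically releases its storage in one step, whereas deleting rows in batches leaves the space to be reclaimed later. In a real Postgres setup, indexes, constraints, and writes arriving during the copy would also need handling, which this sketch leaves out.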


Mirror: An LLM-powered Programming-By-Example Programming Language

- Austin Henley tl;dr: “Programming by example is a technique where users provide examples of the outcome they want, and the system generates code that can perform it. For example, in Excel, you can demonstrate how you want a column formatted through an example or two, and Excel will learn a pattern and apply it to the rest. But what if there was a programming language that only allows programming by example? Can we integrate AI into traditional programming languages?”

featured in #569


Expanding The Solution Size With Multi-File Editing

- Birgitta Böckeler tl;dr: “A very powerful new coding assistance feature made its way into GitHub Copilot at the end of October. This new “multi-file editing” capability expands the scope of AI assistance from small, localized suggestions to larger implementations across multiple files. Previously, developers could rely on Copilot for minor assistance, such as generating a few lines of code within a single method. Now, the tool can tackle larger tasks, simultaneously editing multiple files and implementing several steps of a larger plan. This represents a step change for coding assistance workflows.”

featured in #568


Innovations In Evaluating AI Agent Performance

tl;dr: Just like athletes need more than one drill to win a competition, AI agents require consistent training based on real-world performance metrics to excel in their role. At QA Wolf, we’ve developed weighted “gym scenarios” to simulate real-world challenges and track their progress over time. How does our AI use these metrics to continuously improve our accuracy? Visit our website to learn more.

featured in #564


Innovations In Evaluating AI Agent Performance

tl;dr: Just like athletes need more than one drill to win a competition, AI agents require consistent training based on real-world performance metrics to excel in their role. At QA Wolf, we’ve developed weighted “gym scenarios” to simulate real-world challenges and track their progress over time. How does our AI use these metrics to continuously improve our accuracy? Watch our latest webinar to learn more.

featured in #563


Everything I Built With Claude Artifacts This Week

- Simon Willison tl;dr: “I’m a huge fan of Claude’s Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code.” Simon shares what he built in the last 7 days. 

featured in #562


Innovations In Evaluating AI Agent Performance

tl;dr: Just like athletes need more than one drill to win a competition, AI agents require consistent training based on real-world performance metrics to excel in their role. At QA Wolf, we’ve developed weighted “gym scenarios” to simulate real-world challenges and track their progress over time. How does our AI use these metrics to continuously improve our accuracy? Watch our latest webinar to learn more.

featured in #561