Evals

Notes on evals

Title	Description
Frequently Asked Questions (And Answers) About AI Evals	FAQ from our course on AI Evals.
Intro To Error Analysis With Just Spreadsheets	In this lesson, Shreya Shankar and Hamel Husain walk through the process of error analysis from…
Your AI Product Needs Evals	How to construct domain-specific LLM evaluation systems.
Creating a LLM-as-a-Judge That Drives Business Results	A step-by-step guide with lessons I learned from 30+ AI implementations.
A Field Guide to Rapidly Improving AI Products	Evaluation methods, data-driven improvement, and experimentation techniques from 30+ production…
How to Construct Domain Specific LLM Evaluation Systems	AI Engineering World’s Fair talk on building evaluation systems for LLMs.
How Engineers and PMs should collaborate on Evals	How to align AI evaluations with business metrics, communicate value to stakeholders, and build a…
Inspect AI, An OSS Python Library For LLM Evals	A look at Inspect AI with its creator, JJ Allaire.
The Revenge of the Data Scientist	Five eval pitfalls AI engineers keep falling into, and why data science fundamentals are the fix.
Evals Flashcards	I created these flashcards to help students learn about evals in our AI Evals course.
Evals Memes	Evals can be a dry subject: data pipelines, metrics, LLM-as-a-judge calibration. But that doesn’t…