Evals

Notes on evals
| Title | Description |
| --- | --- |
| Frequently Asked Questions (And Answers) About AI Evals | FAQ from our course on AI Evals. |
| How Engineers and PMs should collaborate on Evals | How to align AI evaluations with business metrics, communicate value to stakeholders, and build a… |
| Modern IR Evals For RAG | Nandan Thakur on why traditional IR evals are insufficient for RAG and how new benchmarks like… |
| Inspect AI, An OSS Python Library For LLM Evals | A look at Inspect AI with its creator, JJ Allaire. |
| A Field Guide to Rapidly Improving AI Products | Evaluation methods, data-driven improvement, and experimentation techniques from 30+ production… |
| Intro To Error Analysis: Creating Custom Data Annotation Apps (4k version) | In this lesson, Shreya Shankar and Hamel Husain walk through the process of error analysis from… |
| Creating a LLM-as-a-Judge That Drives Business Results | A step-by-step guide with my learnings from 30+ AI implementations. |
| How to Construct Domain Specific LLM Evaluation Systems | AI Engineering World’s Fair talk on building evaluation systems for LLMs. |
| Your AI Product Needs Evals | How to construct domain-specific LLM evaluation systems. |
| Evaluating and Productionizing LLMs | Vanishing Data Podcast discussion on evaluating and putting LLMs into production. |