Evals
Notes on evals
Title | Description |
---|---|
Frequently Asked Questions (And Answers) About AI Evals | FAQ from our course on AI Evals. |
How Engineers and PMs should collaborate on Evals | How to align AI evaluations with business metrics, communicate value to stakeholders, and build a… |
Modern IR Evals For RAG | Nandan Thakur on why traditional IR evals are insufficient for RAG and how new benchmarks like… |
Inspect AI, An OSS Python Library For LLM Evals | A look at Inspect AI with its creator, JJ Allaire. |
A Field Guide to Rapidly Improving AI Products | Evaluation methods, data-driven improvement, and experimentation techniques from 30+ production… |
Intro To Error Analysis: Creating Custom Data Annotation Apps (4k version) | In this lesson, Shreya Shankar and Hamel Husain walk through the process of error analysis from… |
Creating a LLM-as-a-Judge That Drives Business Results | A step-by-step guide with my learnings from 30+ AI implementations. |
How to Construct Domain Specific LLM Evaluation Systems | AI Engineering World’s Fair talk on building evaluation systems for LLMs. |
Your AI Product Needs Evals | How to construct domain-specific LLM evaluation systems. |
Evaluating and Productionizing LLMs | Vanishing Data Podcast discussion on evaluating and putting LLMs into production. |