Evals Flashcards

I created these flashcards to help students learn about evals in our AI Evals course.

Click on any image below to open a full-size version in a new tab, or download the image by clicking the download icon on the bottom right of each card.

[Flashcard images 1 through 12]

 