Inference

Notes on inference for LLMs
| Title | Description |
|---|---|
| Max Inference Engine | Attempting to load Mistral-7b in Modular's new Max Inference Engine |
| Optimizing latency | An exploration of ways to optimize latency |
| vLLM & large models | Using tensor parallelism with vLLM and Modal to run Llama 70b |