Inference

Notes on inference for LLMs
| Title | Description |
|---|---|
| Max Inference Engine | Attempting to load Mistral-7b in Modular's new Max Inference Engine |
| Optimizing latency | An exploration of ways to optimize latency |
| vLLM & large models | Using tensor parallelism with vLLM and Modal to run Llama 70b |