Inference

Notes on inference for LLMs
| Title | Description |
| --- | --- |
| Max Inference Engine | Attempting to load Mistral-7b in Modular’s new… |
| Optimizing latency | An exploration of ways to optimize latency. |
| vLLM & large models | Using tensor parallelism with vLLM & Modal to run… |