# Inference

Notes on inference for LLMs.
| Title | Description |
|---|---|
| Max Inference Engine | Attempting to load Mistral-7B in Modular's new Max Inference Engine |
| Optimizing latency | An exploration of ways to optimize latency |
| vLLM & large models | Using tensor parallelism with vLLM and Modal to run Llama 70B |