# Inference

Notes on inference for LLMs.
| Title | Description |
|---|---|
| Max Inference Engine | Attempting to load Mistral 7B in Modular’s new Max Inference Engine. |
| Optimizing latency | An exploration of ways to optimize inference latency. |
| vLLM & large models | Using tensor parallelism with vLLM and Modal to run Llama 70B; a minimal sketch follows the table. |
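As a quick illustration of the tensor-parallel setup mentioned in the last row, here is a minimal vLLM sketch. The model name, GPU count, and prompt are assumptions for illustration, and the Modal deployment wrapper is omitted; this is not the exact code from the post.

```python
# Minimal tensor-parallel inference sketch with vLLM.
# Assumes a multi-GPU host and access to a Llama 70B checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # assumed checkpoint name
    tensor_parallel_size=4,  # shard each weight matrix across 4 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Setting `tensor_parallel_size` splits each layer's weights across the GPUs, so a 70B model that cannot fit in a single card's memory can still be served as one logical model.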