TorchServe

Notes on TorchServe

Experiments With TorchServe

Title	Description
Basics	TorchServe basics
Serving Your Own Model	Serving a HuggingFace Model

Why Use TorchServe

- Automatic batching of requests (optional).
- Model versioning.
- Out-of-the-box logging and metrics.

Impressions

TorchServe is an absolute PITA to debug, because it's part Java and part Python: the Java frontend spawns separate Python worker processes that run your handler, so, for example, you can't use an interactive debugger. If you do not need auto-batching, I would rather use FastAPI and scale it with Kubernetes. In many practical scenarios (like serving HF models) you end up writing custom handlers anyway, so at the end of the day you are writing a bunch of glue code.
Compared to TF Serving, TorchServe is harder to get started with but much easier to customize once you learn the API. The learning curve is steeper because you have to study the BaseHandler class (by reading its source code) to understand how things work: it is not clear how the various artifacts you save with torch-model-archiver work together unless you study BaseHandler. For example:
- The help docs of torch-model-archiver state that --model-file is mandatory for eager mode models. This is not entirely true: instead of implementing an interface that TorchServe knows how to load (e.g. a class with a load_state_dict method, like this example), you can load the model yourself in the handler. Furthermore, this model file can only contain one class definition that extends torch.nn.Module, which is an odd constraint. If you search the internet for TorchServe examples, you'll find that many people ignore the prescribed interface and load models inside a custom handler instead. Here is an example of doing this with fastai and HuggingFace. I believe using a custom handler is a better idea because it is more transparent and easier to understand. The default handlers in the getting started guides are a bit too magical, and I think they confuse newcomers. I would argue that the default handlers should only be used after you understand BaseHandler and have written a few custom handlers.
- If you want to use TorchScript, your serialized model file's extension must be .pt (not .pth). This is not documented anywhere, and is an example of something you can only learn from the BaseHandler source code (that particular file extension is hardcoded!).
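The .pt check described above is easier to see as code. Below is a simplified, hypothetical sketch of the decision BaseHandler makes when it initializes, not its actual source: the choose_loader helper and its return values are made up for illustration, while modelFile and serializedFile are the keys of the "model" section in a .mar archive's MANIFEST.json. Real versions of BaseHandler also check other formats, so treat this as a mental model only.

```python
def choose_loader(manifest: dict) -> str:
    """Simplified sketch of how BaseHandler picks a loader (hypothetical helper).

    `manifest` mirrors the "model" section of a .mar's MANIFEST.json.
    """
    model_info = manifest.get("model", {})
    model_file = model_info.get("modelFile", "")
    serialized_file = model_info.get("serializedFile", "")

    if model_file:
        # Eager mode: roughly, import model_file, instantiate its single
        # nn.Module subclass, then load_state_dict() with the serialized weights.
        return "eager"
    if serialized_file.endswith(".pt"):
        # TorchScript is only chosen when the extension is exactly ".pt".
        return "torchscript"
    # In this sketch, a TorchScript file saved as "model.pth" is not recognized.
    raise ValueError(f"no loader for serialized file {serialized_file!r}")


if __name__ == "__main__":
    print(choose_loader({"model": {"serializedFile": "model.pt"}}))  # torchscript
```

In other words, an archive built with --serialized-file model.pth and no --model-file slips past the extension check, which is the trap the note warns about.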
Pre- and post-processing in TorchServe is significantly easier to understand than in TF Serving. Custom handlers let you do whatever you want in pure Python, whereas with TF Serving you have to modify the model's signature, which involves pushing the pre/post-processing into the model's graph and other confusing steps that use DSLs. This is not surprising, as it is the same reason people like PyTorch in general.
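To show what "pure Python" pre/post-processing looks like, here is a minimal sketch of a custom handler using TorchServe's class-level entry point: a class that defines initialize(context) and handle(data, context). handle receives a list of requests (each a dict whose payload sits under "data" or "body") and returns one response per request. Assumptions: the class name WordCountHandler is hypothetical, request bodies are plain text, and the "model" is a stand-in word counter so the sketch runs without torch. A real handler would load its model in initialize, for example from context.system_properties["model_dir"].

```python
import json
from types import SimpleNamespace


class WordCountHandler:
    """Class-level custom handler: TorchServe calls initialize() once, then handle().

    Hypothetical example. The "model" is a stand-in so the sketch runs without
    torch; a real handler would load weights from model_dir in initialize().
    """

    def __init__(self):
        self.model = None
        self.model_dir = None
        self.initialized = False

    def initialize(self, context):
        # The context's system_properties include the unpacked archive directory.
        self.model_dir = context.system_properties.get("model_dir")
        # Real handler: e.g. AutoModel.from_pretrained(self.model_dir) for an HF model.
        # Stand-in "model": counts whitespace-separated tokens per input text.
        self.model = lambda texts: [len(t.split()) for t in texts]
        self.initialized = True

    def preprocess(self, data):
        # One dict per request; the payload is under "data" or "body".
        texts = []
        for row in data:
            payload = row.get("data") or row.get("body")
            if isinstance(payload, (bytes, bytearray)):
                payload = payload.decode("utf-8")
            texts.append(payload)
        return texts

    def inference(self, texts):
        return self.model(texts)

    def postprocess(self, outputs):
        # Exactly one response per request, in the same order.
        return [json.dumps({"num_words": n}) for n in outputs]

    def handle(self, data, context):
        return self.postprocess(self.inference(self.preprocess(data)))


if __name__ == "__main__":
    # Outside TorchServe, a fake context is enough to exercise the handler.
    ctx = SimpleNamespace(system_properties={"model_dir": "."})
    handler = WordCountHandler()
    handler.initialize(ctx)
    print(handler.handle([{"body": b"hello torch serve"}, {"data": "hi"}], ctx))
    # ['{"num_words": 3}', '{"num_words": 1}']
```

Splitting handle into preprocess, inference and postprocess mirrors how BaseHandler.handle is organized, so the same methods carry over if you later subclass BaseHandler. Because the class is plain Python, you can also call it directly with a fake context like this and step through it in a regular debugger.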
I like how you can make REST API requests to manage models. In contrast, TF Serving requires you to update config files, which it periodically polls for changes.
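A minimal sketch of driving the management API from Python with only the standard library. It assumes TorchServe's default management address (http://localhost:8081) and a hypothetical archive my_model.mar already in the model store. The routes used (POST /models, PUT /models/{name}, PUT /models/{name}/{version}/set-default) come from the TorchServe management API, but check the query parameters against the docs for your version. The helper names are made up, and they only build requests rather than send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: TorchServe's default management address.
MANAGEMENT_URL = "http://localhost:8081"


def register_model(mar_file: str, initial_workers: int = 1,
                   batch_size: int = 1, max_batch_delay_ms: int = 100) -> Request:
    """Build a request that registers a .mar archive from the model store."""
    query = urlencode({
        "url": mar_file,
        "initial_workers": initial_workers,
        "batch_size": batch_size,              # > 1 enables server-side batching
        "max_batch_delay": max_batch_delay_ms,  # max wait (ms) to fill a batch
    })
    return Request(f"{MANAGEMENT_URL}/models?{query}", method="POST")


def scale_workers(model_name: str, min_worker: int) -> Request:
    """Build a request that changes the minimum number of workers for a model."""
    query = urlencode({"min_worker": min_worker})
    return Request(f"{MANAGEMENT_URL}/models/{model_name}?{query}", method="PUT")


def set_default_version(model_name: str, version: str) -> Request:
    """Build a request that makes `version` the one /predictions/{model_name} serves."""
    return Request(
        f"{MANAGEMENT_URL}/models/{model_name}/{version}/set-default", method="PUT"
    )
```

With TorchServe running, sending one is a single call, e.g. urllib.request.urlopen(register_model("my_model.mar", batch_size=8)).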
I like model versioning in TF Serving a bit more, as it allows you to alias your endpoints with labels, for example “staging” or “production”. It doesn't appear you can alias your model endpoints in TorchServe without a reverse proxy.
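The aliasing a reverse proxy would add boils down to rewriting a stable path into a versioned one before forwarding the request. A minimal sketch of that rewrite, assuming TorchServe's versioned inference route /predictions/{model_name}/{version}. The ALIASES table and resolve_alias helper are hypothetical; in practice the mapping would live in the proxy's own configuration.

```python
# Hypothetical alias table: (model, alias) -> concrete TorchServe model version.
ALIASES = {
    ("my_model", "staging"): "2.0",
    ("my_model", "production"): "1.0",
}


def resolve_alias(path: str) -> str:
    """Rewrite /predictions/<model>/<alias> to /predictions/<model>/<version>.

    Paths without a known alias (including explicit versions) pass through unchanged.
    """
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[0] == "predictions":
        _, model, alias = parts
        version = ALIASES.get((model, alias))
        if version is not None:
            return f"/predictions/{model}/{version}"
    return path


if __name__ == "__main__":
    print(resolve_alias("/predictions/my_model/staging"))  # /predictions/my_model/2.0
```

With this design, promoting staging to production means editing one table entry while clients keep calling the same URL.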