TorchServe basics

Model archiver

The key to understanding TorchServe is to first understand torch-model-archiver, which packages model artifacts into a single model archive (.mar) file. torch-model-archiver needs the following inputs, depending on how the model was saved:


TorchScript

Need a model checkpoint file.

Eager Mode (more common)

Need a model definition file and a state_dict file.


The CLI produces a .mar file. Below is an example of archiving an eager mode model.

!torch-model-archiver --model-name densenet161 \
    --version 1.0 \
    --model-file ./_serve/examples/image_classifier/densenet_161/ \
    --serialized-file densenet161-8d451a50.pth \
    --export-path model_store \
    --extra-files ./_serve/examples/image_classifier/index_to_name.json \
    --handler image_classifier \
    --force
WARNING - Overwriting model_store/densenet161.mar ...

This is the model file:


Options for model archiver:

! torch-model-archiver --help
usage: torch-model-archiver [-h] --model-name MODEL_NAME
                            [--serialized-file SERIALIZED_FILE]
                            [--model-file MODEL_FILE] --handler HANDLER
                            [--extra-files EXTRA_FILES]
                            [--runtime {python,python2,python3}]
                            [--export-path EXPORT_PATH]
                            [--archive-format {tgz,no-archive,default}] [-f]
                            -v VERSION [-r REQUIREMENTS_FILE]

Torch Model Archiver Tool

optional arguments:
  -h, --help            show this help message and exit
  --model-name MODEL_NAME
                        Exported model name. Exported file will be named as
                        model-name.mar and saved in current working directory if no --export-path is
                        specified, else it will be saved under the export path
  --serialized-file SERIALIZED_FILE
                        Path to .pt or .pth file containing state_dict in case of eager mode
                        or an executable ScriptModule in case of TorchScript or TensorRT
                        or a .onnx file in the case of ORT.
  --model-file MODEL_FILE
                        Path to python file containing model architecture.
                        This parameter is mandatory for eager mode models.
                        The model architecture file must contain only one
                        class definition extended from torch.nn.modules.
  --handler HANDLER     TorchServe's default handler name
                         or Handler path to handle custom inference logic.
  --extra-files EXTRA_FILES
                        Comma separated path to extra dependency files.
  --runtime {python,python2,python3}
                        The runtime specifies which language to run your inference code on.
                        The default runtime is "python".
  --export-path EXPORT_PATH
                        Path where the exported .mar file will be saved. This is an optional
                        parameter. If --export-path is not specified, the file will be saved in the
                        current working directory. 
  --archive-format {tgz,no-archive,default}
                        The format in which the model artifacts are archived.
                        "tgz": This creates the model-archive in <model-name>.tar.gz format.
                        If platform hosting TorchServe requires model-artifacts to be in ".tar.gz"
                        use this option.
                        "no-archive": This option creates an non-archived version of model artifacts
                        at "export-path/{model-name}" location. As a result of this choice, 
                        MANIFEST file will be created at "export-path/{model-name}" location
                        without archiving these model files
                        "default": This creates the model-archive in <model-name>.mar format.
                        This is the default archiving format. Models archived in this format
                        will be readily hostable on native TorchServe.
  -f, --force           When the -f or --force flag is specified, an existing .mar file with same
                        name as that provided in --model-name in the path specified by --export-path
                        will overwritten
  -v VERSION, --version VERSION
                        Model's version
  -r REQUIREMENTS_FILE, --requirements-file REQUIREMENTS_FILE
                        Path to a requirements.txt containing model specific python dependency
                        packages.
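
If you archive models from a script rather than a notebook cell, the same invocation can be assembled in Python. This is only a sketch: the helper name `archiver_command` and the file paths passed to it are placeholders I made up, and it uses only flags that appear in the help output above.

```python
import shlex


def archiver_command(model_name, version, model_file, serialized_file,
                     handler, export_path, extra_files=(), force=False):
    """Build the torch-model-archiver argument list for an eager mode model."""
    cmd = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--model-file", model_file,
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--export-path", export_path,
    ]
    if extra_files:
        # --extra-files expects one comma-separated value, not repeated flags.
        cmd += ["--extra-files", ",".join(extra_files)]
    if force:
        # Overwrite an existing .mar with the same name in the export path.
        cmd.append("--force")
    return cmd


cmd = archiver_command(
    model_name="densenet161",
    version="1.0",
    model_file="model.py",  # placeholder path to the model definition file
    serialized_file="densenet161-8d451a50.pth",
    handler="image_classifier",
    export_path="model_store",
    extra_files=["index_to_name.json"],  # placeholder path
    force=True,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems when a path contains spaces.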


TorchServe has the following built-in handlers, which take care of pre- and post-processing:

  • image_classifier
  • object_detector
  • text_classifier
  • image_segmenter

You can implement your own custom handler by following these docs. Most of the time you only need to subclass BaseHandler and override preprocess and/or postprocess.
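
To make the subclassing idea concrete, here is a rough sketch of a handler that keeps BaseHandler's model loading and inference and only changes postprocess. The class name `TopKHandler` and the `top_k` attribute are hypothetical, and I am assuming the model returns a 2-D tensor of scores (one row per request). The fallback stub exists only so the snippet can be run without TorchServe installed.

```python
try:
    from ts.torch_handler.base_handler import BaseHandler
except ImportError:
    # Fallback stub: lets the sketch run without TorchServe installed.
    class BaseHandler:
        pass


class TopKHandler(BaseHandler):
    """Hypothetical handler that returns only the top-k scores per request."""

    top_k = 3  # illustrative default, not a TorchServe setting

    def postprocess(self, data):
        # `data` is the batched model output. TorchServe expects a list
        # with exactly one entry per request in the batch.
        results = []
        for scores in data.tolist():
            ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
            results.append({str(idx): score for idx, score in ranked[: self.top_k]})
        return results
```

You would then pass the path of the file containing this class to --handler instead of a built-in handler name.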

--extra-files ... index_to_name.json:

From the docs:

image_classifier, text_classifier and object_detector can all automatically map from numeric classes (0,1,2…) to friendly strings. To do this, simply include in your model archive a file, index_to_name.json, that contains a mapping of class number (as a string) to friendly name (also as a string).
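
As a small illustration of the format described in that quote, the mapping can be generated from an ordered list of class names. The class names and the helper `write_index_to_name` are made up for this example; the only thing taken from the docs is that both keys and values are strings.

```python
import json
import tempfile
from pathlib import Path


def write_index_to_name(class_names, path):
    """Write a {"0": "name0", "1": "name1", ...} mapping to `path`."""
    # Keys are class numbers as strings; values are the friendly names.
    mapping = {str(i): name for i, name in enumerate(class_names)}
    Path(path).write_text(json.dumps(mapping, indent=2))
    return mapping


# Hypothetical class names, listed in the order of the model's output indices.
out_dir = Path(tempfile.mkdtemp())
mapping = write_index_to_name(
    ["tabby", "tiger_cat", "Egyptian_cat"],
    out_dir / "index_to_name.json",
)
print(json.dumps(mapping, indent=2))
```

The resulting file is what you would hand to --extra-files when archiving.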


After archiving, you can start the model server:

torchserve --start --ncs \
    --model-store model_store \
    --models densenet161.mar

By default, TorchServe serves the REST inference, management, and metrics APIs on ports 8080, 8081, and 8082, and the gRPC APIs on ports 7070 and 7071.
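
A quick way to confirm the server is up is to hit the health check on the inference port and list the registered models on the management port. The sketch below assumes the server runs on the local machine with the default ports; the helper names `api_url` and `get_json` are my own.

```python
import json
import urllib.request

# Default REST ports, as listed above.
DEFAULT_PORTS = {"inference": 8080, "management": 8081, "metrics": 8082}


def api_url(api, path, host="127.0.0.1"):
    """Build a URL for one of the REST APIs on its default port."""
    return f"http://{host}:{DEFAULT_PORTS[api]}/{path.lstrip('/')}"


def get_json(url, timeout=5):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


health_url = api_url("inference", "ping")      # health check
models_url = api_url("management", "models")   # list registered models
print(health_url)
print(models_url)
# With a server running: get_json(health_url), get_json(models_url)
```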

!torchserve --help
usage: torchserve [-h] [-v | --start | --stop] [--ts-config TS_CONFIG]
                  [--model-store MODEL_STORE]
                  [--workflow-store WORKFLOW_STORE]
                  [--models MODEL_PATH1 MODEL_NAME=MODEL_PATH2... [MODEL_PATH1 MODEL_NAME=MODEL_PATH2... ...]]
                  [--log-config LOG_CONFIG] [--foreground]
                  [--no-config-snapshots] [--plugins-path PLUGINS_PATH]


optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Return TorchServe Version
  --start               Start the model-server
  --stop                Stop the model-server
  --ts-config TS_CONFIG
                        Configuration file for model server
  --model-store MODEL_STORE
                        Model store location from where local or default
                        models can be loaded
  --workflow-store WORKFLOW_STORE
                        Workflow store location from where local or default
                        workflows can be loaded
  --models MODEL_PATH1 MODEL_NAME=MODEL_PATH2... [MODEL_PATH1 MODEL_NAME=MODEL_PATH2... ...]
                        Models to be loaded using [model_name=]model_location
                        format. Location can be a HTTP URL or a model archive
                        file in MODEL_STORE.
  --log-config LOG_CONFIG
                        Log4j configuration file for model server
  --foreground          Run the model server in foreground. If this option is
                        disabled, the model server will run in the background.
  --no-config-snapshots, --ncs
                        Prevents to server from storing config snapshot files.
  --plugins-path PLUGINS_PATH, --ppath PLUGINS_PATH
                        plugin jars to be included in torchserve class path

!curl -O
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7341  100  7341    0     0   108k      0 --:--:-- --:--:-- --:--:--  108k

!curl -T kitten_small.jpg
{
  "tabby": 0.4783327877521515,
  "lynx": 0.19989627599716187,
  "tiger_cat": 0.1682717651128769,
  "tiger": 0.061949197202920914,
  "Egyptian_cat": 0.05116736516356468
}
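
The same request can be made from Python. This is a sketch that assumes the server is running locally on the default inference port and that the model is registered as densenet161; the function names are my own. It mirrors curl -T by sending the raw image bytes with PUT.

```python
import json
import urllib.request


def build_prediction_request(image_bytes, model_name="densenet161",
                             host="127.0.0.1", port=8080):
    """Build a PUT request against the inference API's predictions endpoint."""
    url = f"http://{host}:{port}/predictions/{model_name}"
    return urllib.request.Request(url, data=image_bytes, method="PUT")


def predict(image_path, **kwargs):
    """Send an image file to the server and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        req = build_prediction_request(f.read(), **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With a server running and the image downloaded:
# print(predict("kitten_small.jpg"))
```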

I wouldn’t recommend installing TorchServe directly on a VM and running it there. It’s probably easier to use Docker.

docker pull pytorch/torchserve


See these docs. We have to mount the necessary files and run the same commands. We also have to expose all the ports, etc.


Note that you have to supply the torchserve command, which implies you can run other things (but I don’t know what those are).

docker run --rm -it --gpus '"device=0"' \
    -p 8080:8080 \
    -p 8081:8081 \
    -p 8082:8082 \
    -p 7070:7070 \
    -p 7071:7071 \
    --mount type=bind,source=/home/hamel/hamel/notes/serving/torchserve/model_store,target=/tmp/models \
    pytorch/torchserve:latest-gpu \
    torchserve \
    --model-store /tmp/models \
    --models densenet161.mar

!curl -T kitten_small.jpg
{
  "tabby": 0.4783327877521515,
  "lynx": 0.19989627599716187,
  "tiger_cat": 0.1682717651128769,
  "tiger": 0.061949197202920914,
  "Egyptian_cat": 0.05116736516356468
}

Other Notes

I found these articles to be very important:

  1. Source code for BaseHandler.
  2. Performance guide: Concurrency and number of workers.
  3. example 1 and example 2 of how you can pass configuration files