PyTorch is one of the most popular open source libraries for deep learning. Developers and researchers particularly enjoy the flexibility it gives them in building and training models. Yet, this is only half the story, and deploying and managing models in production is often the most difficult part of the machine learning process: building bespoke prediction APIs, scaling them, securing them, etc.
One way to simplify the model deployment process is to use a model server, i.e. an off-the-shelf web application specially designed to serve machine learning predictions in production. Model servers make it easy to load one or several models, automatically creating a prediction API backed by a scalable web server. They’re also able to run preprocessing and postprocessing code on prediction requests. Last but not least, model servers also provide production-critical features like logging, monitoring, and security. Popular model servers include TensorFlow Serving and the Multi Model Server.
Today, I’m extremely happy to announce TorchServe, a PyTorch model serving library that makes it easy to deploy trained PyTorch models at scale without having to write custom code.
TorchServe is a collaboration between AWS and Facebook, and it’s available as part of the PyTorch open source project. If you’re interested in how the project was initiated, you can read the initial RFC on Github.
With TorchServe, PyTorch users can now bring their models to production quicker, without having to write custom code: on top of providing a low latency prediction API, TorchServe embeds default handlers for the most common applications such as object detection and text classification. In addition, TorchServe includes multi-model serving, model versioning for A/B testing, monitoring metrics, and RESTful endpoints for application integration. As you would expect, TorchServe supports any machine learning environment, including Amazon SageMaker, container services, and Amazon Elastic Compute Cloud (EC2).
Several customers are already enjoying the benefits of TorchServe.
Toyota Research Institute Advanced Development, Inc. (TRI-AD) is developing software for automated driving at Toyota Motor Corporation. Says Yusuke Yachide, Lead of ML Tools at TRI-AD: “we continuously optimize and improve our computer vision models, which are critical to TRI-AD’s mission of achieving safe mobility for all with autonomous driving. Our models are trained with PyTorch on AWS, but until now PyTorch lacked a model serving framework. As a result, we spent significant engineering effort in creating and maintaining software for deploying PyTorch models to our fleet of vehicles and cloud servers. With TorchServe, we now have a performant and lightweight model server that is officially supported and maintained by AWS and the PyTorch community”.
Matroid is a maker of computer vision software that detects objects and events in video footage. Says Reza Zadeh, Founder and CEO at Matroid Inc.: “we develop a rapidly growing number of machine learning models using PyTorch on AWS and on-premise environments. The models are deployed using a custom model server that requires converting the models to a different format, which is time-consuming and burdensome. TorchServe allows us to simplify model deployment using a single servable file that also serves as the single source of truth, and is easy to share and manage”.
First, I fire up a CPU-based Amazon Elastic Compute Cloud (EC2) instance running the Deep Learning AMI (Ubuntu edition). This AMI comes preinstalled with several dependencies that I’ll need, which will speed up setup. Of course you could use any AMI instead.
TorchServe is implemented in Java, and I need the latest OpenJDK to run it.
sudo apt install openjdk-11-jdk
conda create -n torchserve
source activate torchserve
Next, I install dependencies for TorchServe.
pip install sentencepiece # not available as a Conda package
conda install psutil pytorch torchvision torchtext -c pytorch
If you’re using a GPU instance, you’ll need an extra package.
conda install cudatoolkit=10.1
git clone https://github.com/pytorch/serve.git
pip install .
pip install .
Setup is complete, let’s deploy a model!
Deploying a Model
For the sake of this demo, I’ll simply download a pretrained model from the PyTorch model zoo. In real life, you would probably use your own model.
Next, I need to package the model into a model archive. A model archive is a ZIP file storing all model artefacts, i.e. the model itself (densenet161-8d451a50.pth), a Python script to load the state dictionary (matching tensors to layers), and any extra file you may need. Here, I include a file named index_to_name.json, which maps class identifiers to class names. This will be used by the built-in image_classifier handler, which is in charge of the prediction logic. Other built-in handlers are available (object_detector, text_classifier, image_segmenter), and you can implement your own.
torch-model-archiver --model-name densenet161 --version 1.0 \
--model-file examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--extra-files examples/image_classifier/index_to_name.json \
Next, I create a directory to store model archives, and I move the one I just created there.
mv densenet161.mar model_store/
Now, I can start TorchServe, pointing it at the model store and at the model I want to load. Of course, I could load several models if needed.
torchserve --start --model-store model_store --models densenet161=densenet161.mar
Still on the same machine, I grab an image and easily send it to TorchServe for local serving using an HTTP POST request. Note the format of the URL, which includes the name of the model I want to use.
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg
The result appears immediately. Note that class names are visible, thanks to the built-in handler.
I then stop TorchServe with the ‘stop‘ command.
As you can see, it’s easy to get started with TorchServe using the default configuration. Now let me show you how to set it up for remote serving.
Configuring TorchServe for Remote Serving
Let’s create a configuration file for TorchServe, named config.properties (the default name). This files defines which model to load, and sets up remote serving. Here, I’m binding the server to all public IP addresses, but you can restrict it to a specific address if you want to. As this is running on an EC2 instance, I need to make sure that ports 8080 and 8081 are open in the Security Group.
Now I can start TorchServe in the same directory, without having to pass any command line arguments.
Moving back to my local machine, I can now invoke TorchServe remotely, and get the same result.
curl -X POST http://ec2-54-85-61-250.compute-1.amazonaws.com:8080/predictions/densenet161 -T kitten.jpg
You probably noticed that I used HTTP. I’m guessing a lot of you will require HTTPS in production, so let me show you how to set it up.
Configuring TorchServe for HTTPS
TorchServe can use either the Java keystore or a certificate. I’ll go with the latter.
First, I create a certificate and a private key with openssl.
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
Then, I update the configuration file to define the location of the certificate and key, and I bind TorchServe to its default secure ports (don’t forget to update the Security Group).
I restart TorchServe, and I can now invoke it with HTTPS. As I use a self-signed certificate, I need to pass the ‘–insecure’ flag to curl.
curl --insecure -X POST https://ec2-54-85-61-250.compute-1.amazonaws.com:8443/predictions/densenet161 -T kitten.jpg
Give it a try, and please send us feedback on Github.