Load balancer check results in "[ERROR] epollEventLoopGroup-3-1 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer"
#2,201 opened on Mar 27, 2023
Description
🐛 Describe the bug
We are using TorchServe to serve a yolox_x model trained with mmdet. We created a custom TorchServe Docker image and wrote a simple docker-compose.yml file, which runs on a Debian 11 host with:
Docker version 23.0.1, build a5ee5b1
Docker Compose version v2.15.1
In our current deployment, we are using a load balancer based on HAProxy that communicates with the TorchServe hosts.
Every second, HAProxy checks whether the TorchServe hosts are up and running by calling the route GET $HOSTNAME:8080/ping. A host is considered healthy if the response has status code 200 and the response body contains the word Healthy.
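For reference, the check logic can be approximated outside HAProxy with a few lines of Python. This is only a sketch of the rule described above, not HAProxy's actual implementation: the function names is_healthy and check_ping and the 1-second timeout are our own assumptions, and http.client speaks HTTP/1.1 while the access log shows HAProxy using HTTP/1.0.

```python
import http.client


def is_healthy(status: int, body: str) -> bool:
    """Apply the rule from the description: status 200 and 'Healthy' in the body."""
    return status == 200 and "Healthy" in body


def check_ping(host: str, port: int = 8080, timeout: float = 1.0) -> bool:
    """Call GET /ping once and report whether the host passes the check.

    Hypothetical helper: any network or protocol failure counts as unhealthy.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/ping")
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return is_healthy(resp.status, body)
    except (OSError, http.client.HTTPException):
        return False
    finally:
        # Graceful close, which may differ from how a load balancer ends its probe.
        conn.close()
```

Calling check_ping with the TorchServe host name returns True when the server answers the probe as expected. Because this client closes the connection gracefully, it may not trigger the error shown below.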
Unfortunately, looking at the TorchServe logs with the command docker logs -f torchserve (where torchserve is the container name), we noticed a set of errors each time HAProxy checks that the TorchServe host is up and running.
Here is an example of the errors.
2023-03-27T07:53:33,604 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55612 "GET /ping HTTP/1.0" 200 0
2023-03-27T07:53:33,604 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:33,606 [ERROR] epollEventLoopGroup-3-7 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
where LOADBLANCER_IP is the anonymized IP of the load balancer host.
When we stop the HAProxy service, the TorchServe instance stops logging the error. It looks like a non-blocking error, but it is not healthy behavior.
We also observe the same issue with the default TorchServe Docker image and another custom model.
A similar issue was opened in 2021, however closing/avoiding the health check is not a valid option in our scenario.
Error logs
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-03-27T07:53:16,338 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-03-27T07:53:16,425 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.7.1
TS Home: /usr/local/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /usr/local/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 960 M
Python executable: /usr/local/bin/python
Config file: model-store/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: yoloxx-coco=yoloxx-coco.mar
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/model-store
Model config: {"yoloxx-coco": {"1.0": {"defaultVersion": true,"marName": "yoloxx-coco.mar","minWorkers": 1,"maxWorkers": 10,"batchSize": 1,"maxBatchDelay": 100,"responseTimeout": 100}}}
2023-03-27T07:53:16,432 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2023-03-27T07:53:16,435 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: yoloxx-coco.mar
2023-03-27T07:53:20,703 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model yoloxx-coco
2023-03-27T07:53:20,703 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model yoloxx-coco
2023-03-27T07:53:20,703 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model yoloxx-coco loaded.
2023-03-27T07:53:20,703 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: yoloxx-coco, count: 1
2023-03-27T07:53:20,722 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-03-27T07:53:20,722 [DEBUG] W-9000-yoloxx-coco_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/local/bin/python, /usr/local/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /usr/local/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-03-27T07:53:20,785 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2023-03-27T07:53:20,786 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-03-27T07:53:20,787 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2023-03-27T07:53:20,787 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-03-27T07:53:20,790 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2023-03-27T07:53:20,989 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-03-27T07:53:21,044 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,044 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:23.057697296142578|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,044 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:5.087863922119141|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,044 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:18.1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,044 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:3169.96875|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,044 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:418.5859375|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,044 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:17.3|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,541 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55582 "GET /ping HTTP/1.0" 200 9
2023-03-27T07:53:21,542 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:21,549 [ERROR] epollEventLoopGroup-3-1 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2023-03-27T07:53:21,916 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-03-27T07:53:21,921 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-03-27T07:53:21,921 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - [PID]32
2023-03-27T07:53:21,922 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - Torch worker started.
2023-03-27T07:53:21,922 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - Python runtime: 3.9.16
2023-03-27T07:53:21,922 [DEBUG] W-9000-yoloxx-coco_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-yoloxx-coco_1.0 State change null -> WORKER_STARTED
2023-03-27T07:53:21,924 [INFO ] W-9000-yoloxx-coco_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-03-27T07:53:21,931 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-03-27T07:53:21,933 [INFO ] W-9000-yoloxx-coco_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1679903601933
2023-03-27T07:53:21,942 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - model_name: yoloxx-coco, batchSize: 1
2023-03-27T07:53:22,582 [WARN ] W-9000-yoloxx-coco_1.0-stderr MODEL_LOG - /usr/local/lib/python3.9/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
2023-03-27T07:53:22,582 [WARN ] W-9000-yoloxx-coco_1.0-stderr MODEL_LOG - warnings.warn(
2023-03-27T07:53:23,554 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55586 "GET /ping HTTP/1.0" 200 0
2023-03-27T07:53:23,554 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:23,556 [ERROR] epollEventLoopGroup-3-2 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2023-03-27T07:53:23,672 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - generated new fontManager
2023-03-27T07:53:25,246 [INFO ] W-9000-yoloxx-coco_1.0-stdout MODEL_LOG - load checkpoint from local path: /home/model-server/tmp/models/dc79daf5690e45868dec208935c74c17/yolox_x_8x8_300e_coco_20211126_140254-1ef88d67.pth
2023-03-27T07:53:25,540 [INFO ] W-9000-yoloxx-coco_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 3595
2023-03-27T07:53:25,540 [DEBUG] W-9000-yoloxx-coco_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-yoloxx-coco_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-03-27T07:53:25,540 [INFO ] W-9000-yoloxx-coco_1.0 TS_METRICS - W-9000-yoloxx-coco_1.0.ms:4831|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903605
2023-03-27T07:53:25,540 [INFO ] W-9000-yoloxx-coco_1.0 TS_METRICS - WorkerThreadTime.ms:12|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903605
2023-03-27T07:53:25,564 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55596 "GET /ping HTTP/1.0" 200 0
2023-03-27T07:53:25,564 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:25,565 [ERROR] epollEventLoopGroup-3-3 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2023-03-27T07:53:27,573 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55600 "GET /ping HTTP/1.0" 200 1
2023-03-27T07:53:27,573 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:27,574 [ERROR] epollEventLoopGroup-3-4 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2023-03-27T07:53:29,584 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55604 "GET /ping HTTP/1.0" 200 0
2023-03-27T07:53:29,584 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:29,586 [ERROR] epollEventLoopGroup-3-5 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2023-03-27T07:53:31,594 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55608 "GET /ping HTTP/1.0" 200 0
2023-03-27T07:53:31,594 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:31,596 [ERROR] epollEventLoopGroup-3-6 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2023-03-27T07:53:33,604 [INFO ] pool-2-thread-2 ACCESS_LOG - /LOADBLANCER_IP:55612 "GET /ping HTTP/1.0" 200 0
2023-03-27T07:53:33,604 [INFO ] pool-2-thread-2 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:efa77cf4af16,timestamp:1679903601
2023-03-27T07:53:33,606 [ERROR] epollEventLoopGroup-3-7 org.pytorch.serve.http.HttpRequestHandler -
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
Where LOADBLANCER_IP is the anonymized IP of the load balancer host.
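To check that each error really follows a health check, the log can be scanned with a small script. The sketch below is an illustration only: the function name count_resets_after_ping and the 50 ms pairing window are assumptions of ours, and the regular expression is written against the line layout seen in the log above.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as: 2023-03-27T07:53:33,604 [INFO ] pool-2-thread-2 ACCESS_LOG - ...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}) \[(\w+)\s*\] (.*)$"
)


def parse_ts(text):
    """Parse a timestamp like 2023-03-27T07:53:33,604."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S,%f")


def count_resets_after_ping(lines, window_ms=50):
    """Return (pings, resets): /ping access-log entries, and reset errors
    logged within window_ms after the most recent /ping entry."""
    window = timedelta(milliseconds=window_ms)
    pings = resets = 0
    last_ping = None      # timestamp of the most recent /ping access-log entry
    pending_error = None  # timestamp of an ERROR header awaiting its message line
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            ts = parse_ts(match.group(1))
            level, rest = match.group(2), match.group(3)
            pending_error = None
            if "ACCESS_LOG" in rest and '"GET /ping' in rest:
                pings += 1
                last_ping = ts
            elif level == "ERROR" and "HttpRequestHandler" in rest:
                pending_error = ts
        elif pending_error is not None and "Connection reset by peer" in line:
            # The exception message sits on the line after the ERROR header.
            if last_ping is not None and pending_error - last_ping <= window:
                resets += 1
            pending_error = None
    return pings, resets
```

Feeding it the output of docker logs torchserve, split into lines, gives the two counters. Equal values would support the observation that every health check is followed by one reset error.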
Installation instructions
I am using Docker with docker-compose and a custom image for running mmdetection object detection models (according to the official docs).
In the following, the code I defined for:
- Dockerfile with entrypoint.sh and config.properties
- docker-compose.yml
The Dockerfile is defined as follows.
FROM python:3.9-slim-buster
ARG PYTORCH="1.13.1"
ARG TORCHVISION="0.14.1"
ARG TORCHAUDIO="0.13.1"
RUN pip install torch==${PYTORCH}+cpu torchvision==${TORCHVISION}+cpu torchaudio==${TORCHAUDIO}+cpu --extra-index-url https://download.pytorch.org/whl/cpu
ARG MMCV="1.7.0"
ARG MMDET="2.28.1"
ENV PYTHONUNBUFFERED TRUE
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
ca-certificates \
g++ \
openjdk-11-jre-headless \
# MMDet Requirements
ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \
&& rm -rf /var/lib/apt/lists/*
ENV PATH="/opt/conda/bin:$PATH"
RUN export FORCE_CUDA=1
# TORCHSEVER
RUN pip install torchserve torch-model-archiver
# MMLAB
RUN ["/bin/bash", "-c", "pip install mmcv-full==${MMCV} -f https://download.openmmlab.com/mmcv/dist/cpu/torch${PYTORCH}/index.html"]
RUN pip install mmdet==${MMDET}
RUN useradd -m model-server \
&& mkdir -p /home/model-server/tmp
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh \
&& chown -R model-server /home/model-server
COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store
EXPOSE 8080 8081 8082
USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["serve"]
The config.properties is available here
The entrypoint.sh is available here
To build the image:
docker build --pull -t mmdet-torchserve-cpu:2.28.1 .
Finally, the docker-compose.yml is defined as follows
version: '3.8'
services:
  torchserve:
    image: 'mmdet-torchserve-cpu:2.28.1'
    ports:
      - '8080:8080'
      - '8081:8081'
      - '8082:8082'
    container_name: 'torchserve'
    volumes:
      - '/home/torchserve/model-store:/home/model-server/model-store'
    command:
      - 'torchserve --start'
      - '--ncs'
      - '--model-store model-store'
      - '--models yoloxx-coco=yoloxx-coco.mar'
      - '--ts-config model-store/config.properties'
    networks:
      - torchserve_net
networks:
  torchserve_net:
Model Packaging
The handler.py is based on the one proposed by OpenMMLab. We only changed the input/output and added some basic error propagation.
# Copyright (c) OpenMMLab. All rights reserved.
import base64
import os
import mmcv
import torch
from ts.torch_handler.base_handler import BaseHandler
from mmdet.apis import inference_detector, init_detector
from ts.utils.util import PredictionException
import time


class MMdetHandler(BaseHandler):
    threshold = 0.4

    def initialize(self, context):
        properties = context.system_properties
        self.map_location = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.device = torch.device(self.map_location + ':' +
                                   str(properties.get('gpu_id')) if torch.cuda.
                                   is_available() else self.map_location)
        self.manifest = context.manifest
        model_dir = properties.get('model_dir')
        serialized_file = self.manifest['model']['serializedFile']
        checkpoint = os.path.join(model_dir, serialized_file)
        self.config_file = os.path.join(model_dir, 'config.py')
        self.model = init_detector(self.config_file, checkpoint, self.device)
        self.initialized = True

    def preprocess(self, data):
        images_batches = []
        for req in data:
            images_batch=[]
            data_loaded = req.get("data") if req.get("data") is not None else req.get("body", {})
            if len(data_loaded['instances'])<1:
                raise ValueError("empty instances list")
            for image in data_loaded['instances']:
                image = base64.urlsafe_b64decode(image)
                image = mmcv.imfrombytes(image)
                images_batch.append(image)
            images_batches.append(images_batch)
        return images_batches

    def inference(self, images_batches, *args, **kwargs):
        model_results=[]
        shapes=[]
        for image_batch in images_batches:
            model_results.append(inference_detector(self.model, image_batch))
            shapes.append([image.shape for image in image_batch])
        results=(model_results, shapes)
        return results

    def postprocess(self, results):
        output_batches = []
        model_results, shapes = results
        for i in range(len(model_results)):
            post_processed_batch=[]
            for j in range(len(model_results[i])):
                image_result=model_results[i][j]
                shape=shapes[i][j]
                h, w, c= shape
                image_result_clean=[]
                if isinstance(image_result, tuple):
                    bbox_result, segm_result = image_result
                    if isinstance(segm_result, tuple):
                        segm_result = segm_result[0] # ms rcnn
                else:
                    bbox_result, segm_result = image_result, None
                for class_index, class_result in enumerate(bbox_result):
                    class_name = self.model.CLASSES[class_index]
                    for bbox in class_result:
                        bbox_coords = bbox[:-1].tolist()
                        y1,x1, y2,x2 =bbox_coords
                        relative_bbox_coords = y1/h, x1/w, y2/h, x2/w
                        score = float(bbox[-1])
                        if score >= self.threshold:
                            image_result_clean.append({
                                'class_name': class_name,
                                'bbox': bbox_coords,
                                'relative_bbox': relative_bbox_coords,
                                'score': score
                            })
                post_processed_batch.append({'predictions': image_result_clean,
                                             'img_shape': shape})
            output_batches.append(post_processed_batch)
        return output_batches

    def handle(self, data, context):
        """Entry point for default handler. It takes the data from the input request and returns
        the predicted outcome for the input.
        Args:
            data (list): The input data that needs to be made a prediction request on.
            context (Context): It is a JSON Object containing information pertaining to
                the model artefacts parameters.
        Returns:
            list : Returns a list of dictionary with the predicted response.
        """
        # It can be used for pre or post processing if needed as additional request
        # information is available in context
        start_time = time.time()
        self.context = context
        metrics = self.context.metrics
        is_profiler_enabled = os.environ.get("ENABLE_TORCH_PROFILER", None)
        if is_profiler_enabled:
            if PROFILER_AVAILABLE:
                output, _ = self._infer_with_profiler(data=data)
            else:
                raise RuntimeError(
                    "Profiler is enabled but current version of torch does not support."
                    "Install torch>=1.8.1 to use profiler."
                )
        else:
            try:
                data_preprocess = self.preprocess(data)
            except Exception as e:
                raise PredictionException(
                    f"{type(e)} Error during data preprocessing: {str(e)}",
                    400)
            try:
                output = self.inference(data_preprocess)
            except Exception as e:
                raise PredictionException(
                    f"{type(e)} Error during inference: {str(e)}",
                    503)
            try:
                output = self.postprocess(output)
            except Exception as e:
                raise PredictionException(
                    f"{type(e)} Error during data postprocessing: {str(e)}",
                    503)
        stop_time = time.time()
        metrics.add_time(
            "HandlerTime", round((stop_time - start_time) * 1000, 2), None, "ms"
        )
        return output
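The handler's preprocess step reads an instances list of URL-safe base64 strings from each request. A client for it might look like the sketch below. The helper names build_payload and predict are hypothetical, the 120-second timeout is arbitrary, and the sketch assumes that TorchServe hands the handler a parsed dict when the request is sent with Content-Type application/json on the /predictions/yoloxx-coco route.

```python
import base64
import json
import urllib.request


def build_payload(images):
    """Build the request body expected by preprocess().

    images is a list of raw image bytes (for example the content of JPEG files).
    """
    if not images:
        # preprocess() rejects an empty instances list, so fail early on the client.
        raise ValueError("at least one image is required")
    return {
        "instances": [
            base64.urlsafe_b64encode(raw).decode("ascii") for raw in images
        ]
    }


def predict(host, images, model="yoloxx-coco", port=8080, timeout=120):
    """POST the payload to the inference API and return the decoded JSON reply."""
    body = json.dumps(build_payload(images)).encode("utf-8")
    request = urllib.request.Request(
        f"http://{host}:{port}/predictions/{model}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

The encoding has to be the URL-safe variant because the handler decodes with base64.urlsafe_b64decode.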
To package the model as .mar, please refer to the following mmdet2torchserve.py file and the official doc of mmdet
config.properties
The config.properties is defined for the specific model and is located at /home/model-server/model-store/config.properties in the container.
We don't override the default config.properties file located at /home/model-server/config.properties.
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store
async_logging=true
default_response_timeout=120
enable_metrics_api=true
max_request_size = 6553500
max_response_size = 6553500
models={\
"yoloxx-coco": {\
"1.0": {\
"defaultVersion": true,\
"marName": "yoloxx-coco.mar",\
"minWorkers": 1,\
"maxWorkers": 10,\
"batchSize": 10,\
"maxBatchDelay": 100,\
"responseTimeout": 100\
}\
}\
}
Versions
Running python serve/ts_scripts/print_env_info.py doesn't work in the Docker container.
The torch* libs are
torch==1.13.1+cpu
torch-model-archiver==0.7.1
torchaudio==0.13.1+cpu
torchserve==0.7.1
torchvision==0.14.1+cpu
The installed pip libs are the following ones.
model-server@efa77cf4af16:~$ pip freeze
addict==2.4.0
certifi==2022.12.7
charset-normalizer==3.1.0
contourpy==1.0.7
cycler==0.11.0
enum-compat==0.0.3
fonttools==4.39.1
idna==3.4
importlib-resources==5.12.0
kiwisolver==1.4.4
matplotlib==3.7.1
mmcv-full==1.7.0
mmdet==2.28.1
numpy==1.24.2
opencv-python==4.7.0.72
packaging==23.0
Pillow==9.4.0
psutil==5.9.4
pycocotools==2.0.6
pyparsing==3.0.9
python-dateutil==2.8.2
PyYAML==6.0
requests==2.28.2
scipy==1.10.1
six==1.16.0
terminaltables==3.1.10
torch==1.13.1+cpu
torch-model-archiver==0.7.1
torchaudio==0.13.1+cpu
torchserve==0.7.1
torchvision==0.14.1+cpu
typing_extensions==4.5.0
urllib3==1.26.15
yapf==0.32.0
zipp==3.15.0
Repro instructions
- create an infrastructure with a load balancer which defines a health check on the route GET $HOSTNAME:8080/ping, where $HOSTNAME is the host where TorchServe is running under docker-compose
- create the TorchServe container with docker-compose up -d --build to run the docker-compose file
- run the command docker logs -f torchserve to see the error
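If a load balancer is not at hand, the sketch below may reproduce the same log line with a plain TCP client. It rests on an assumption we have not confirmed: that the error is logged when the client resets the TCP connection (RST) right after reading the /ping response instead of closing it gracefully. The function name ping_then_reset is hypothetical, and the SO_LINGER layout used here is the one for Linux and macOS.

```python
import socket
import struct


def ping_then_reset(host, port=8080, timeout=2.0):
    """Send GET /ping over HTTP/1.0, read the reply, then abort the connection.

    Setting SO_LINGER to (on, 0 seconds) makes close() send a TCP RST
    instead of the usual FIN handshake.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(b"GET /ping HTTP/1.0\r\n\r\n")
        reply = sock.recv(4096)
        # l_onoff=1, l_linger=0: discard pending data and reset on close.
        sock.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
        )
    finally:
        sock.close()
    return reply
```

Running ping_then_reset against the TorchServe host while docker logs -f torchserve is open should show whether a reset alone is enough to produce the error.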
Possible Solution
No response