Worker restarts and logs "epollEventLoopGroup-3-10 org.pytorch.serve.http.HttpRequestHandler" errors
#756 opened on Oct 28, 2020
Description
Context
- torchserve version: 0.2.0
- torch version: 1.6.0
- torch-model-archiver: 0.2.0
- torchvision version [if any]:
- torchtext version [if any]:
- torchaudio version [if any]:
- java version: openjdk version "11.0.8" 2020-07-14
- Operating System and version: Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) )
Your Environment
- Installed using source? [yes/no]: no
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: GPU
- Using a default/custom handler? [If possible upload/share custom handler/model]: custom model: https://drive.google.com/file/d/1kHwvoyZYPgro4FLvlWDYjUqSqjE0MiQp/view?usp=sharing
- What kind of model is it e.g. vision, text, audio?: vision
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]:
- Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:
  【config.properties】:
  inference_address=http://0.0.0.0:8080
  management_address=http://0.0.0.0:8081
  NUM_WORKERS=1
  number_of_gpu=1
  number_of_netty_threads=32
  job_queue_size=1000
  model_snapshot={"name":"startup.cfg","modelCount": 1,"models":{"SpamCls":{"1.0":{"defaultVersion": true, "marName": "SpamCls.mar", "minWorkers": 2, "maxWorkers": 2,"batchSize": 30, "maxBatchDelay": 50, "responseTimeout": 120}}}}
  【logs】: https://drive.google.com/file/d/1tWLgLzunMos8LwtT_BkNh85npD6c7DNp/view?usp=sharing
- Link to your project [if any]:
Expected Behavior
Current Behavior
One of the workers restarts at intervals.

Possible Solution
Steps to Reproduce
...
Failure Logs [if any]
2020-10-28 04:47:31,640 [INFO ] W-9000-SpamCls_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 41
2020-10-28 04:47:31,640 [DEBUG] W-9000-SpamCls_1.0 org.pytorch.serve.wlm.Job - Waiting time ns: 49196663, Backend time ns: 41236071
2020-10-28 04:47:31,640 [DEBUG] W-9000-SpamCls_1.0 org.pytorch.serve.wlm.Job - Waiting time ns: 40122864, Backend time ns: 41284987
2020-10-28 04:47:31,805 [ERROR] epollEventLoopGroup-3-9 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,226 [ERROR] epollEventLoopGroup-3-10 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,730 [ERROR] epollEventLoopGroup-3-11 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,874 [ERROR] epollEventLoopGroup-3-12 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,014 [ERROR] epollEventLoopGroup-3-13 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,094 [ERROR] epollEventLoopGroup-3-14 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,104 [ERROR] epollEventLoopGroup-3-15 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
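
For anyone triaging the full ts.log, here is a minimal sketch that splits entries of the shape shown above and counts the "Connection reset by peer" errors per Netty thread. It is not part of TorchServe: the function names `parse_entries` and `count_connection_resets` and the `SAMPLE` string are made up for illustration, and the regular expression only assumes the `timestamp [LEVEL] thread logger - message` layout visible in this excerpt. It only summarizes the errors and does not by itself explain the worker restart.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt above:
#   "<date> <time>,<ms> [LEVEL] <thread> <logger> - <message>"
# The lookahead ends a message at the next timestamp or at end of input,
# so entries are found whether they sit on one line or on separate lines.
_TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
ENTRY_RE = re.compile(
    r"(?P<ts>" + _TS + r") "
    r"\[(?P<level>[A-Z]+)\s*\] "
    r"(?P<thread>\S+) (?P<logger>\S+) - "
    r"(?P<msg>.*?)(?=\s*" + _TS + r" \[|\s*$)",
    re.DOTALL,
)


def parse_entries(text):
    """Return one dict (ts, level, thread, logger, msg) per log entry."""
    return [m.groupdict() for m in ENTRY_RE.finditer(text)]


def count_connection_resets(text):
    """Count ERROR entries mentioning 'Connection reset by peer', per thread."""
    counts = Counter()
    for entry in parse_entries(text):
        if entry["level"] == "ERROR" and "Connection reset by peer" in entry["msg"]:
            counts[entry["thread"]] += 1
    return counts


# Hypothetical sample: three entries copied from the excerpt, joined on one line.
SAMPLE = (
    "2020-10-28 04:47:31,640 [INFO ] W-9000-SpamCls_1.0 "
    "org.pytorch.serve.wlm.WorkerThread - Backend response time: 41 "
    "2020-10-28 04:47:31,805 [ERROR] epollEventLoopGroup-3-9 "
    "org.pytorch.serve.http.HttpRequestHandler - "
    "io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: "
    "Connection reset by peer "
    "2020-10-28 04:47:32,226 [ERROR] epollEventLoopGroup-3-10 "
    "org.pytorch.serve.http.HttpRequestHandler - "
    "io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: "
    "Connection reset by peer"
)

if __name__ == "__main__":
    for thread, n in sorted(count_connection_resets(SAMPLE).items()):
        print(thread, n)
```

To run it against the real file, read ts.log into a string and pass it to `count_connection_resets`; comparing the timestamps of these errors with the worker restart times may help show whether the two are related.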