pytorch/serve

Worker restarts, and logs show “epollEventLoopGroup-3-10 org.pytorch.serve.http.HttpRequestHandler” errors

Open

#756 opened on Oct 28, 2020

Labels: bug, help wanted

Description

Context

  • torchserve version: 0.2.0
  • torch version: 1.6.0
  • torch-model-archiver: 0.2.0
  • torchvision version [if any]:
  • torchtext version [if any]:
  • torchaudio version [if any]:
  • java version: openjdk version "11.0.8" 2020-07-14
  • Operating System and version: Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC))

Your Environment

  • Installed using source? [yes/no]: no

  • Are you planning to deploy it using docker container? [yes/no]: yes

  • Is it a CPU or GPU environment?: GPU

  • Using a default/custom handler? [If possible upload/share custom handler/model]: custom model: https://drive.google.com/file/d/1kHwvoyZYPgro4FLvlWDYjUqSqjE0MiQp/view?usp=sharing

  • What kind of model is it e.g. vision, text, audio?: vision

  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]:

  • Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:

    config.properties:

    inference_address=http://0.0.0.0:8080
    management_address=http://0.0.0.0:8081
    NUM_WORKERS=1
    number_of_gpu=1
    number_of_netty_threads=32
    job_queue_size=1000
    model_snapshot={"name":"startup.cfg","modelCount": 1,"models":{"SpamCls":{"1.0":{"defaultVersion": true, "marName": "SpamCls.mar", "minWorkers": 2, "maxWorkers": 2,"batchSize": 30, "maxBatchDelay": 50, "responseTimeout": 120}}}}

    logs: https://drive.google.com/file/d/1tWLgLzunMos8LwtT_BkNh85npD6c7DNp/view?usp=sharing

  • Link to your project [if any]:
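
For anyone triaging this, the model_snapshot value in the config.properties above is a JSON document, so the per-model worker and batching settings can be pulled out and read side by side with the top-level keys. The following is a minimal sketch in Python, assuming a simple key=value file with no escaping or line continuations (real Java properties files allow both). The helper names parse_properties and summarize_snapshot are hypothetical and not part of TorchServe.

```python
import json

# Values copied verbatim from the config.properties in this report.
CONFIG_TEXT = """\
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
NUM_WORKERS=1
number_of_gpu=1
number_of_netty_threads=32
job_queue_size=1000
model_snapshot={"name":"startup.cfg","modelCount": 1,"models":{"SpamCls":{"1.0":{"defaultVersion": true, "marName": "SpamCls.mar", "minWorkers": 2, "maxWorkers": 2,"batchSize": 30, "maxBatchDelay": 50, "responseTimeout": 120}}}}
"""


def parse_properties(text):
    """Hypothetical helper: split simple key=value lines into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")  # split on the first '=' only
        props[key.strip()] = value.strip()
    return props


def summarize_snapshot(props):
    """Hypothetical helper: one tuple per model version in model_snapshot."""
    snapshot = json.loads(props["model_snapshot"])
    rows = []
    for model_name, versions in snapshot["models"].items():
        for version, settings in versions.items():
            rows.append((
                model_name,
                version,
                settings["minWorkers"],
                settings["maxWorkers"],
                settings["batchSize"],
                settings["maxBatchDelay"],
                settings["responseTimeout"],
            ))
    return rows


props = parse_properties(CONFIG_TEXT)
rows = summarize_snapshot(props)
for row in rows:
    print(row)
```

Read this way, the snapshot requests minWorkers and maxWorkers of 2 for SpamCls, while the top-level entries are NUM_WORKERS=1 and number_of_gpu=1. Whether that combination is related to the restarts is not established by this report.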

Expected Behavior

Current Behavior

One of the workers restarts at intervals (screenshot attached to the original issue).

Possible Solution

Steps to Reproduce

...

Failure Logs [if any]

2020-10-28 04:47:31,640 [INFO ] W-9000-SpamCls_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 41
2020-10-28 04:47:31,640 [DEBUG] W-9000-SpamCls_1.0 org.pytorch.serve.wlm.Job - Waiting time ns: 49196663, Backend time ns: 41236071
2020-10-28 04:47:31,640 [DEBUG] W-9000-SpamCls_1.0 org.pytorch.serve.wlm.Job - Waiting time ns: 40122864, Backend time ns: 41284987
2020-10-28 04:47:31,805 [ERROR] epollEventLoopGroup-3-9 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,226 [ERROR] epollEventLoopGroup-3-10 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,730 [ERROR] epollEventLoopGroup-3-11 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,874 [ERROR] epollEventLoopGroup-3-12 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,014 [ERROR] epollEventLoopGroup-3-13 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,094 [ERROR] epollEventLoopGroup-3-14 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,104 [ERROR] epollEventLoopGroup-3-15 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
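
To check whether these "Connection reset by peer" errors cluster around the worker restarts, the entries in ts.log can be bucketed by second. The following is a minimal sketch in Python, assuming every entry follows the "timestamp [LEVEL] thread logger - message" layout seen in the excerpt above. The regular expression and the count_resets helper are hypothetical and were written only against those lines, so other log layouts may need adjustments.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt in this report; an assumption, not an
# official description of the TorchServe log format.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>\w+)\s*\] "
    r"(?P<thread>\S+) (?P<logger>\S+) - (?P<msg>.*)$"
)

# A subset of the entries quoted in this report.
SAMPLE = """
2020-10-28 04:47:31,640 [INFO ] W-9000-SpamCls_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 41
2020-10-28 04:47:31,805 [ERROR] epollEventLoopGroup-3-9 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,226 [ERROR] epollEventLoopGroup-3-10 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,730 [ERROR] epollEventLoopGroup-3-11 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:32,874 [ERROR] epollEventLoopGroup-3-12 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,014 [ERROR] epollEventLoopGroup-3-13 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,094 [ERROR] epollEventLoopGroup-3-14 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
2020-10-28 04:47:33,104 [ERROR] epollEventLoopGroup-3-15 org.pytorch.serve.http.HttpRequestHandler - io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
"""


def count_resets(lines):
    """Hypothetical helper: count connection-reset errors per second."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # blank line or a layout the pattern does not cover
        if match["level"] == "ERROR" and "Connection reset by peer" in match["msg"]:
            # Drop the milliseconds so entries are grouped by second.
            counts[match["ts"][:19]] += 1
    return counts


counts = count_resets(SAMPLE.splitlines())
for second, n in sorted(counts.items()):
    print(second, n)
```

For the full log, replace SAMPLE.splitlines() with an open file handle on ts.log, then compare the busiest seconds against the times at which the worker restarts.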
