triton-inference-server/server

Unable to use the PyTorch library with the libtorch backend when using the Triton Inference Server In-Process Python API

Open

#7222, opened on May 15, 2024

Labels: help wanted, question

Description

I am trying to use the newly introduced Triton Inference Server In-Process Python API to serve PyTorch models with the libtorch backend. I use the torch and torchvision libraries to pre- and post-process the input data before sending it to the Triton server for prediction. However, when I import torch or torchvision in the same process, the model fails to load with the following error:

failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
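For readers decoding the error above: the missing symbol demangles to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)`. The `std::__cxx11` namespace indicates that the caller expects a `libc10.so` built with the newer libstdc++ string ABI. One possible reading, which is an assumption and not confirmed in this report, is that a `libc10.so` built with a different ABI setting was already loaded into the process. The sketch below only inspects the symbol name; the helper names are hypothetical, and the optional demangling step assumes the `c++filt` tool from binutils is installed.

```python
import shutil
import subprocess

# The undefined symbol, copied verbatim from the error message above.
SYMBOL = (
    "_ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)


def references_cxx11_abi(mangled: str) -> bool:
    """Return True if the mangled name mentions the std::__cxx11 namespace.

    In the Itanium C++ ABI, std::__cxx11 is encoded as "St7__cxx11".
    """
    return "St7__cxx11" in mangled


def demangle(mangled: str) -> str:
    """Demangle with c++filt when it is on PATH; otherwise return the input."""
    tool = shutil.which("c++filt")
    if tool is None:
        return mangled
    result = subprocess.run(
        [tool, mangled], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("references std::__cxx11:", references_cxx11_abi(SYMBOL))
    print(demangle(SYMBOL))
```

If PyTorch is importable, `torch.compiled_with_cxx11_abi()` reports which ABI the installed wheel was built with, which can be compared against this result.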

Triton Server logs:

I0515 09:22:40.092038 265 cache_manager.cc:480] Create CacheManager with cache_dir: '/opt/tritonserver/caches'
W0515 09:22:40.092110 265 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0515 09:22:40.092129 265 cuda_memory_manager.cc:117] CUDA memory pool disabled
E0515 09:22:40.092267 265 server.cc:243] CudaDriverHelper has not been initialized.
I0515 09:22:40.093620 265 model_config_utils.cc:680] Server side auto-completed config: name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input {
  name: "INPUT__0"
  data_type: TYPE_FP32
  dims: 3
  dims: 32
  dims: 32
}
output {
  name: "OUTPUT__0"
  data_type: TYPE_FP32
  dims: 10
}
default_model_filename: "model.pt"
backend: "pytorch"

I0515 09:22:40.093699 265 model_lifecycle.cc:469] loading: cifar10:1
I0515 09:22:40.093820 265 backend_model.cc:502] Adding default backend config setting: default-max-batch-size,4
I0515 09:22:40.093847 265 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0515 09:22:40.098713 265 backend_manager.cc:138] unloading backend 'pytorch'
E0515 09:22:40.098758 265 model_lifecycle.cc:638] failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I0515 09:22:40.098775 265 model_lifecycle.cc:773] failed to load 'cifar10'
I0515 09:22:40.098860 265 server.cc:607] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0515 09:22:40.098880 265 server.cc:634] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0515 09:22:40.098907 265 server.cc:677] 
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model   | Version | Status                                                                                                                                                                 |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cifar10 | 1       | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKc |
|         |         | S2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE                                                                                                             |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099027 265 metrics.cc:770] Collecting CPU metrics
I0515 09:22:40.099151 265 tritonserver.cc:2538] 
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                  |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                 |
| server_version                   | 2.45.0                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memo |
|                                  | ry binary_tensor_data parameters statistics trace logging                                                                                              |
| model_repository_path[0]         | models_dir                                                                                                                                             |
| model_control_mode               | MODE_NONE                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                      |
| rate_limit                       | OFF                                                                                                                                                    |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                     |
| cache_enabled                    | 0                                                                                                                                                      |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099172 265 server.cc:307] Waiting for in-flight requests to complete.
I0515 09:22:40.099176 265 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0515 09:22:40.099204 265 server.cc:338] All models are stopped, unloading models
I0515 09:22:40.099210 265 server.cc:347] Timeout 30: Found 0 live models and 0 in-flight non-inference requests

Triton Information

What version of Triton are you using?

$ pip show tritonserver

Name: tritonserver
Version: 2.45.0
Summary: Triton Inference Server In-Process Python API
Home-page: https://developer.nvidia.com/nvidia-triton-inference-server
Author: NVIDIA Inc.
Author-email: sw-dl-triton@nvidia.com
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy
Required-by: 
$ pip show torch
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: torchvision
$ pip show torchvision
Name: torchvision
Version: 0.18.0
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, pillow, torch
Required-by: 

Are you using the Triton container or did you build it yourself?

I am using the nvcr.io/nvidia/tritonserver:24.04-py3 container to serve the model through the In-Process Python API.

To Reproduce

A simple script that reproduces the error:

import time
import tritonserver
from torchvision import transforms  # importing this leads to errors
import torch  # importing this leads to errors


def start():
    server = tritonserver.Server(model_repository="python/models",
                                 log_error=True,
                                 log_info=True,
                                 log_verbose=True,
                                 )
    print("tritonserver version : ", tritonserver.__version__)
    server.start()
    print("server started")
    model = server.model("cifar10")


if __name__ == "__main__":
    start()
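As a diagnostic aid for the script above, the following minimal sketch lists which `libtorch*` and `libc10*` shared objects are mapped into the current process, so one can see whether the copy from the pip-installed `torch` package or the copy under `/opt/tritonserver/backends/pytorch` was loaded. It assumes a Linux system with `/proc/self/maps`; the function name `mapped_libraries` and the default pattern are hypothetical choices for illustration, not part of the tritonserver or torch APIs.

```python
import re
from pathlib import Path


def mapped_libraries(maps_text, pattern=r"lib(torch|c10)[^/]*\.so"):
    """Return sorted, de-duplicated file paths from /proc/<pid>/maps text
    whose file name matches the given regular expression."""
    regex = re.compile(pattern)
    paths = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        # Only file-backed mappings carry a sixth field (the path).
        if len(fields) == 6 and regex.search(Path(fields[5]).name):
            paths.add(fields[5])
    return sorted(paths)


if __name__ == "__main__":
    maps_file = Path("/proc/self/maps")
    if maps_file.exists():
        for path in mapped_libraries(maps_file.read_text()):
            print(path)
```

Calling `mapped_libraries(Path("/proc/self/maps").read_text())` right after `import torch` and again after `server.start()` would show which library directories are in play at each point.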

Describe the models (framework, inputs, outputs), ideally including the model configuration file (if using an ensemble, include the model configuration file for that as well).

name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3,32,32]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [10]
  }
]

Expected behavior

PyTorch and torchvision should work alongside the tritonserver In-Process Python API.
