triton-inference-server/server

Unable to use PyTorch library with libtorch backend when using Triton Inference Server in-process Python API

Open

#7222 opened on May 15, 2024

Labels: help wanted, question

Description

I am trying to use the newly introduced Triton Inference Server in-process Python API to serve PyTorch models with the libtorch backend. I use the PyTorch and torchvision libraries to pre- and post-process the input data before sending it to the Triton server for prediction. However, when I use PyTorch or torchvision (importing either one is enough), the model fails to load with the following error:

failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
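
The undefined symbol mentions `std::__cxx11::basic_string`, the string type that libstdc++ uses under its newer C++11 ABI. One possible explanation, which is an assumption and not confirmed in this report, is that the `torch` wheel installed with pip and the libtorch shipped with the Triton PyTorch backend were built against different libstdc++ ABIs, so the copy loaded first into the process cannot satisfy the other. A minimal sketch for checking which ABI a mangled symbol refers to follows; `references_cxx11_abi` is a hypothetical helper, not part of Triton or PyTorch.

```python
# Symbol copied verbatim from the error message above.
SYMBOL = (
    "_ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)


def references_cxx11_abi(mangled: str) -> bool:
    """Return True if an Itanium-mangled name mentions std::__cxx11.

    Hypothetical helper for illustration only.
    """
    # "St7__cxx11" is how the nested namespace std::__cxx11 is mangled.
    return "St7__cxx11" in mangled


if __name__ == "__main__":
    print(references_cxx11_abi(SYMBOL))
```

If `torch` can be imported in the same environment, `torch.compiled_with_cxx11_abi()` reports how the pip wheel itself was built, which can be compared with the result above.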

Triton Server logs:

I0515 09:22:40.092038 265 cache_manager.cc:480] Create CacheManager with cache_dir: '/opt/tritonserver/caches'
W0515 09:22:40.092110 265 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0515 09:22:40.092129 265 cuda_memory_manager.cc:117] CUDA memory pool disabled
E0515 09:22:40.092267 265 server.cc:243] CudaDriverHelper has not been initialized.
I0515 09:22:40.093620 265 model_config_utils.cc:680] Server side auto-completed config: name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input {
  name: "INPUT__0"
  data_type: TYPE_FP32
  dims: 3
  dims: 32
  dims: 32
}
output {
  name: "OUTPUT__0"
  data_type: TYPE_FP32
  dims: 10
}
default_model_filename: "model.pt"
backend: "pytorch"

I0515 09:22:40.093699 265 model_lifecycle.cc:469] loading: cifar10:1
I0515 09:22:40.093820 265 backend_model.cc:502] Adding default backend config setting: default-max-batch-size,4
I0515 09:22:40.093847 265 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0515 09:22:40.098713 265 backend_manager.cc:138] unloading backend 'pytorch'
E0515 09:22:40.098758 265 model_lifecycle.cc:638] failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I0515 09:22:40.098775 265 model_lifecycle.cc:773] failed to load 'cifar10'
I0515 09:22:40.098860 265 server.cc:607] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0515 09:22:40.098880 265 server.cc:634] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0515 09:22:40.098907 265 server.cc:677] 
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model   | Version | Status                                                                                                                                                                 |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cifar10 | 1       | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKc |
|         |         | S2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE                                                                                                             |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099027 265 metrics.cc:770] Collecting CPU metrics
I0515 09:22:40.099151 265 tritonserver.cc:2538] 
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                  |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                 |
| server_version                   | 2.45.0                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memo |
|                                  | ry binary_tensor_data parameters statistics trace logging                                                                                              |
| model_repository_path[0]         | models_dir                                                                                                                                             |
| model_control_mode               | MODE_NONE                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                      |
| rate_limit                       | OFF                                                                                                                                                    |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                     |
| cache_enabled                    | 0                                                                                                                                                      |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099172 265 server.cc:307] Waiting for in-flight requests to complete.
I0515 09:22:40.099176 265 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0515 09:22:40.099204 265 server.cc:338] All models are stopped, unloading models
I0515 09:22:40.099210 265 server.cc:347] Timeout 30: Found 0 live models and 0 in-flight non-inference requests

Triton Information

What version of Triton are you using?

$ pip show tritonserver

Name: tritonserver
Version: 2.45.0
Summary: Triton Inference Server In-Process Python API
Home-page: https://developer.nvidia.com/nvidia-triton-inference-server
Author: NVIDIA Inc.
Author-email: sw-dl-triton@nvidia.com
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy
Required-by: 
$ pip show torch
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: torchvision
$ pip show torchvision
Name: torchvision
Version: 0.18.0
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, pillow, torch
Required-by: 

Are you using the Triton container or did you build it yourself?

I am using the nvcr.io/nvidia/tritonserver:24.04-py3 container and serving the model through the in-process Python API.

To Reproduce

A simple script that reproduces the error:

import time
import tritonserver
from torchvision import transforms  # importing this leads to errors
import torch  # importing this leads to errors


def start():
    server = tritonserver.Server(model_repository="python/models",
                                 log_error=True,
                                 log_info=True,
                                 log_verbose=True,
                                 )
    print("tritonserver version : ", tritonserver.__version__)
    server.start()
    print("server started")
    model = server.model("cifar10")


if __name__ == "__main__":
    start()
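
To narrow down which copy of libtorch ends up in the process, one option is to list the shared libraries mapped after the imports. This is a diagnostic sketch under stated assumptions: it assumes Linux (it reads `/proc/self/maps`), and `loaded_torch_libs` is a hypothetical helper, not part of `tritonserver` or `torch`.

```python
from pathlib import Path


def loaded_torch_libs(maps_text, needles=("libc10", "libtorch")):
    """Return sorted paths of mapped files whose name contains a needle.

    `maps_text` is expected in the /proc/<pid>/maps format:
    address perms offset dev inode pathname
    """
    paths = set()
    for line in maps_text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping, no backing file
        path = parts[5]
        if any(needle in Path(path).name for needle in needles):
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    maps_file = Path("/proc/self/maps")
    if maps_file.exists():  # Linux only
        for lib in loaded_torch_libs(maps_file.read_text()):
            print(lib)
```

Calling the helper right after `import torch`, and again after `server.start()` fails, would show whether the libraries under `dist-packages/torch` were already mapped before the backend under `/opt/tritonserver/backends/pytorch` was opened.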

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3,32,32]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [10]
  }
]

Expected behavior

PyTorch and torchvision should work alongside the tritonserver in-process Python API.
