Unable to use PyTorch library with libtorch backend when using Triton Inference Server in-process Python API
#7222 opened on May 15, 2024
Description
A clear and concise description of what the bug is.
I am trying to use the newly introduced Triton Inference Server in-process Python API to serve PyTorch models with the libtorch backend. I use the torch and torchvision libraries to pre- and post-process the input data before sending it to the Triton server for prediction. However, as soon as torch or torchvision is imported in the same process, the model fails to load with the following error:
failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
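The mangled name in this error carries a hint about a possible cause. The following is a minimal diagnostic sketch, assuming the Itanium C++ name mangling used by GCC: the `St7__cxx11` component stands for `std::__cxx11`, so the backend library is looking for a `c10::detail::torchCheckFail` overload that takes a C++11-ABI `std::string`. If a torch build compiled with the older libstdc++ ABI is already loaded in the process, its libraries would not export that exact symbol. The helper name `uses_cxx11_abi` is hypothetical and this is only a heuristic, not a confirmed root cause.

```python
# Diagnostic sketch: inspect the mangled symbol from the Triton error message.
# `uses_cxx11_abi` is a hypothetical helper name, not part of any library.

SYMBOL = (
    "_ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_string"
    "IcSt11char_traitsIcESaIcEEE"
)


def uses_cxx11_abi(mangled: str) -> bool:
    """Return True if an Itanium-mangled name references std::__cxx11 types."""
    # "St" is the mangling abbreviation for std::, followed by the
    # length-prefixed nested namespace name "7__cxx11".
    return "St7__cxx11" in mangled


if __name__ == "__main__":
    print("symbol expects C++11 string ABI:", uses_cxx11_abi(SYMBOL))
    # For comparison, in an interpreter where torch imports cleanly:
    #   import torch; print(torch.compiled_with_cxx11_abi())
```

Comparing the result with `torch.compiled_with_cxx11_abi()` for the pip-installed wheel would show whether the two builds disagree on the string ABI.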
Triton Server logs:
I0515 09:22:40.092038 265 cache_manager.cc:480] Create CacheManager with cache_dir: '/opt/tritonserver/caches'
W0515 09:22:40.092110 265 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0515 09:22:40.092129 265 cuda_memory_manager.cc:117] CUDA memory pool disabled
E0515 09:22:40.092267 265 server.cc:243] CudaDriverHelper has not been initialized.
I0515 09:22:40.093620 265 model_config_utils.cc:680] Server side auto-completed config: name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input {
name: "INPUT__0"
data_type: TYPE_FP32
dims: 3
dims: 32
dims: 32
}
output {
name: "OUTPUT__0"
data_type: TYPE_FP32
dims: 10
}
default_model_filename: "model.pt"
backend: "pytorch"
I0515 09:22:40.093699 265 model_lifecycle.cc:469] loading: cifar10:1
I0515 09:22:40.093820 265 backend_model.cc:502] Adding default backend config setting: default-max-batch-size,4
I0515 09:22:40.093847 265 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0515 09:22:40.098713 265 backend_manager.cc:138] unloading backend 'pytorch'
E0515 09:22:40.098758 265 model_lifecycle.cc:638] failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I0515 09:22:40.098775 265 model_lifecycle.cc:773] failed to load 'cifar10'
I0515 09:22:40.098860 265 server.cc:607]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0515 09:22:40.098880 265 server.cc:634]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+
I0515 09:22:40.098907 265 server.cc:677]
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cifar10 | 1 | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKc |
| | | S2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0515 09:22:40.099027 265 metrics.cc:770] Collecting CPU metrics
I0515 09:22:40.099151 265 tritonserver.cc:2538]
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.45.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memo |
| | ry binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | models_dir |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
I0515 09:22:40.099172 265 server.cc:307] Waiting for in-flight requests to complete.
I0515 09:22:40.099176 265 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0515 09:22:40.099204 265 server.cc:338] All models are stopped, unloading models
I0515 09:22:40.099210 265 server.cc:347] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
Triton Information
What version of Triton are you using?
$ pip show tritonserver
Name: tritonserver
Version: 2.45.0
Summary: Triton Inference Server In-Process Python API
Home-page: https://developer.nvidia.com/nvidia-triton-inference-server
Author: NVIDIA Inc.
Author-email: sw-dl-triton@nvidia.com
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy
Required-by:
$ pip show torch
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: torchvision
$ pip show torchvision
Name: torchvision
Version: 0.18.0
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, pillow, torch
Required-by:
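Two torch installations are involved here: the pip wheel under /usr/local/lib/python3.10/dist-packages and the libraries shipped with the backend under /opt/tritonserver/backends/pytorch. The sketch below shows one way to check which copies actually end up mapped into the process. It is Linux only because it reads `/proc/<pid>/maps`; the function names and the choice of library name prefixes are assumptions made for illustration.

```python
# Diagnostic sketch: list which copies of the torch C++ libraries are mapped
# into a process. Linux only, since it reads /proc/<pid>/maps.
from pathlib import Path

# Library name prefixes to look for (an assumption about what is relevant).
PREFIXES = ("libc10", "libtorch")


def torch_libs_in_maps(maps_text: str) -> list[str]:
    """Return the sorted, de-duplicated paths of mapped torch libraries."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        # A file-backed mapping has six fields; the last one is the path.
        if len(fields) < 6:
            continue
        path = fields[5]
        if Path(path).name.startswith(PREFIXES):
            found.add(path)
    return sorted(found)


def loaded_torch_libs(pid: str = "self") -> list[str]:
    """Read /proc/<pid>/maps and report the torch libraries it contains."""
    return torch_libs_in_maps(Path(f"/proc/{pid}/maps").read_text())


if __name__ == "__main__":
    # Call this after `import torch` and again after the model load attempt.
    for lib in loaded_torch_libs():
        print(lib)
```

If the output lists `libc10.so` from the dist-packages directory before the backend is opened, the backend's libraries may be resolving symbols against the pip copy instead of their own.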
Are you using the Triton container or did you build it yourself?
I am using the nvcr.io/nvidia/tritonserver:24.04-py3 container to serve the model through the in-process Python API.
To Reproduce
Steps to reproduce the behavior.
A simple script to reproduce the error:
import time

import tritonserver
from torchvision import transforms  # importing this leads to errors
import torch  # importing this leads to errors


def start():
    server = tritonserver.Server(
        model_repository="python/models",
        log_error=True,
        log_info=True,
        log_verbose=True,
    )
    print("tritonserver version : ", tritonserver.__version__)
    server.start()
    print("server started")
    model = server.model("cifar10")


if __name__ == "__main__":
    start()
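One possible workaround, untested against this setup and offered only as a sketch, is to keep the torch-based pre- and post-processing in a separate Python process, so that the process hosting tritonserver never loads the pip-installed torch libraries. The helper `run_isolated` and the JSON-over-stdin exchange are hypothetical choices for illustration, and the child snippet uses plain Python as a stand-in for real torchvision transforms.

```python
import json
import subprocess
import sys

# Child program: reads JSON from stdin, writes JSON to stdout.
# In a real setup this is where `import torch` / torchvision would live;
# plain Python stands in here so the sketch runs anywhere.
CHILD_CODE = """
import json, sys
pixels = json.load(sys.stdin)
# Stand-in for a normalisation transform: scale 0..255 to 0..1.
json.dump([p / 255.0 for p in pixels], sys.stdout)
"""


def run_isolated(code: str, payload):
    """Run `code` in a fresh interpreter, passing `payload` as JSON."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    print(run_isolated(CHILD_CODE, [0, 51, 255]))
```

Starting an interpreter per request is expensive, so a long-lived worker process would be more practical in a real service; the point here is only the process boundary between torch and tritonserver.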
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3, 32, 32]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [10]
  }
]
Expected behavior
A clear and concise description of what you expected to happen.
PyTorch and torchvision should be importable and usable in the same process as the tritonserver in-process Python API, and the model should load with the libtorch backend.