PaddlePaddle/PaddleX

Windows C++ TensorRT deployment and inference with PaddleX

Open

#4799 opened on Dec 3, 2025

Labels: Windows, help wanted

Description

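Problem description

I built tensorrt_infer.dll with PaddleX and use it for inference with dynamic input shapes. Inference runs successfully, but several problems follow:

  1. No TRT cache is generated, so every run repeats the step logged as "Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time." This greatly reduces usability. How can I save the built TRT engine, or export it?

  2. I also noticed TensorRTEngineConfig. How can I use this struct directly to build a new tensorrt_infer.cpp? That might avoid the problem above. The struct contains this member: // onnx model path std::string model_file_ = ""; so I enabled WITH_ONNX_TENSORRT when running CMake, but I did not observe any obvious change. The CMake options I selected are shown in the screenshot below:

  3. For dynamic-shape deployment, following hints from other issues, I generated a .pbtxt file and read the dynamic shape ranges from it. Can this .pbtxt file be used directly? Otherwise, whenever the model architecture changes, tensorrt_infer.dll would have to be rebuilt.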
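For questions 1 and 3, the following is a minimal sketch of the Paddle Inference settings that usually control engine caching. It assumes tensorrt_infer.dll ultimately configures a paddle_infer::Config. The helper EnableCachedTensorRt, the struct TrtCacheOptions, and the default paths are hypothetical names introduced only for illustration, and the helper is a template so that the sketch compiles without the Paddle headers. With use_static set to true, Paddle Inference is expected to serialize each built engine into the directory passed to SetOptimCacheDir and reload it on later runs; whether this works together with dynamic shapes on Paddle 2.6.x should be verified on the actual setup.

```cpp
#include <cstdint>
#include <string>

// Hypothetical options struct (not part of PaddleX or Paddle Inference).
struct TrtCacheOptions {
  std::string cache_dir = "./trt_cache";  // assumed location for serialized engines
  std::string shape_range_file;           // optional shape-range .pbtxt; empty = skip
  std::int64_t workspace_bytes = 1LL << 30;
  int max_batch_size = 1;
  int min_subgraph_size = 3;
};

// Hypothetical helper. In a real build, Config would be paddle_infer::Config
// and `precision` would be paddle_infer::PrecisionType::kHalf (FP16, as in the log).
template <class Config, class Precision>
void EnableCachedTensorRt(Config& config, Precision precision,
                          const TrtCacheOptions& opt) {
  // use_static=true asks Paddle Inference to serialize the built engines
  // so that later runs can load them instead of rebuilding.
  config.EnableTensorRtEngine(opt.workspace_bytes, opt.max_batch_size,
                              opt.min_subgraph_size, precision,
                              /*use_static=*/true, /*use_calib_mode=*/false);

  // Directory where the serialized engines are written and looked up.
  config.SetOptimCacheDir(opt.cache_dir);

  // Load dynamic-shape ranges from a file at runtime instead of
  // hard-coding min/max/opt shapes into the DLL.
  if (!opt.shape_range_file.empty()) {
    config.EnableTunedTensorRtDynamicShape(opt.shape_range_file,
                                           /*allow_build_at_runtime=*/true);
  }
}
```

In a real build this would be called as EnableCachedTensorRt(config, paddle_infer::PrecisionType::kHalf, opt) before the predictor is created. The shape-range file is assumed to be the .pbtxt produced by Config::CollectShapeRangeInfo. A serialized engine is generally tied to the model, GPU, and TensorRT version, so the cache directory likely needs to be cleared when any of those change.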

Reproduction

  1. High-performance inference

  2. Which model and dataset are you using? I am currently using pplite with backbone=stdc1, trained on a custom-generated dataset.

  3. Please provide the error messages and relevant logs. The runtime output is as follows:

     REGISTER_CLASS:seg
     init SegModel,model_type=seg
     WARNING: Logging before InitGoogleLogging() is written to STDERR
     I1203 16:26:10.320520 18876 analysis_predictor.cc:1532] TensorRT subgraph engine is enabled
     --- Running analysis [ir_graph_build_pass]
     I1203 16:26:10.323510 18876 executor.cc:187] Old Executor is Running.
     --- Running analysis [ir_analysis_pass]
     --- Running IR pass [trt_remove_amp_strategy_op_pass]
     --- Running IR pass [trt_support_nhwc_pass]
     --- Running IR pass [adaptive_pool2d_convert_global_pass]
     I1203 16:26:10.350419 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
     --- Running IR pass [trt_map_ops_to_matrix_multiply_pass]
     --- Running IR pass [shuffle_channel_detect_pass]
     --- Running IR pass [quant_conv2d_dequant_fuse_pass]
     --- Running IR pass [delete_quant_dequant_op_pass]
     --- Running IR pass [delete_quant_dequant_filter_op_pass]
     --- Running IR pass [trt_delete_weight_dequant_linear_op_pass]
     --- Running IR pass [delete_quant_dequant_linear_op_pass]
     --- Running IR pass [identity_op_clean_pass]
     I1203 16:26:10.371353 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
     --- Running IR pass [add_support_int8_pass]
     I1203 16:26:10.399904 18876 fuse_pass_base.cc:59] --- detected 209 subgraphs
     --- Running IR pass [simplify_with_basic_ops_pass]
     --- Running IR pass [trt_prompt_tuning_embedding_eltwise_layernorm_fuse_pass]
     --- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
     --- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
     --- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
     --- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
     --- Running IR pass [multihead_matmul_roformer_fuse_pass]
     --- Running IR pass [constant_folding_pass]
     I1203 16:26:10.455720 18876 fuse_pass_base.cc:59] --- detected 4 subgraphs
     --- Running IR pass [trt_flash_multihead_matmul_fuse_pass]
     --- Running IR pass [trt_cross_multihead_matmul_fuse_pass]
     --- Running IR pass [vit_attention_fuse_pass]
     --- Running IR pass [trt_qk_multihead_matmul_fuse_pass]
     --- Running IR pass [layernorm_shift_partition_fuse_pass]
     --- Running IR pass [merge_layernorm_fuse_pass]
     --- Running IR pass [preln_residual_bias_fuse_pass]
     --- Running IR pass [preln_layernorm_x_fuse_pass]
     --- Running IR pass [reverse_roll_fuse_pass]
     --- Running IR pass [conv_bn_fuse_pass]
     I1203 16:26:10.524490 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
     --- Running IR pass [conv_elementwise_add_fuse_pass]
     W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
     W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
     W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
     W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
     W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
     W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
     W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
     W1203 16:26:10.535450 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
     I1203 16:26:10.539438 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
     --- Running IR pass [remove_padding_recover_padding_pass]
     --- Running IR pass [delete_remove_padding_recover_padding_pass]
     --- Running IR pass [dense_fc_to_sparse_pass]
     --- Running IR pass [dense_multihead_matmul_to_sparse_pass]
     --- Running IR pass [tensorrt_subgraph_pass]
     I1203 16:26:10.551398 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
     I1203 16:26:10.592264 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
     W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:12.049224 18876 place.cc:161] The paddle::PlaceType::kCPU/kGPU is deprecated since version 2.3, and will be removed in version 2.4! Please use Tensor::is_cpu()/is_gpu() method to determine the type of place.
     I1203 16:26:12.050225 18876 engine.cc:215] Run Paddle-TRT FP16 mode
     I1203 16:26:12.050225 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
     W1203 16:26:40.895160 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
     W1203 16:26:40.895160 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
     W1203 16:26:40.895160 18876 helper.h:127] Check verbose logs for the list of affected weights.
     W1203 16:26:40.895160 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
     I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
     I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
     W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     I1203 16:26:40.899148 18876 engine.cc:215] Run Paddle-TRT FP16 mode
     I1203 16:26:40.899148 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
     W1203 16:26:46.100383 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
     W1203 16:26:46.100383 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
     W1203 16:26:46.101380 18876 helper.h:127] Check verbose logs for the list of affected weights.
     W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
     W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
     I1203 16:26:46.103374 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 16 nodes
     I1203 16:26:46.104367 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
     W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     I1203 16:26:46.105363 18876 engine.cc:215] Run Paddle-TRT FP16 mode
     I1203 16:26:46.105363 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
     W1203 16:26:53.203140 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
     W1203 16:26:53.203140 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
     W1203 16:26:53.204133 18876 helper.h:127] Check verbose logs for the list of affected weights.
     W1203 16:26:53.204133 18876 helper.h:127] - 2 weights are affected by this issue: Detected subnormal FP16 values.
     I1203 16:26:53.206130 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 106 nodes
     I1203 16:26:53.210116 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
     W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:53.215099 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:53.216091 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
     I1203 16:26:53.231046 18876 engine.cc:215] Run Paddle-TRT FP16 mode
     I1203 16:26:53.231046 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
     W1203 16:27:51.221599 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
     W1203 16:27:51.221599 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
     W1203 16:27:51.222596 18876 helper.h:127] Check verbose logs for the list of affected weights.
     W1203 16:27:51.222596 18876 helper.h:127] - 35 weights are affected by this issue: Detected subnormal FP16 values.
     W1203 16:27:51.222596 18876 helper.h:127] - 6 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
     --- Running IR pass [conv_bn_fuse_pass]
     --- Running IR pass [conv_elementwise_add_act_fuse_pass]
     --- Running IR pass [conv_elementwise_add2_act_fuse_pass]
     --- Running IR pass [transpose_flatten_concat_fuse_pass]
     --- Running IR pass [auto_mixed_precision_pass]
     --- Running analysis [save_optimized_model_pass]
     --- Running analysis [ir_params_sync_among_devices_pass]
     I1203 16:27:51.243525 18876 ir_params_sync_among_devices_pass.cc:53] Sync params from CPU to GPU
     --- Running analysis [adjust_cudnn_workspace_size_pass]
     --- Running analysis [inference_op_replace_pass]
     --- Running analysis [memory_optimize_pass]
     I1203 16:27:51.244522 18876 memory_optimize_pass.cc:118] The persistable params in main graph are : 30.6206MB
     I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_0.tmp_0 size: 512
     I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : batch_norm_32.tmp_1 size: 4
     I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_2 size: 4
     I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_1.tmp_0 size: 512
     I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : tmp_0 size: 512
     I1203 16:27:51.246515 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_1 size: 4
     --- Running analysis [ir_graph_to_program_pass]
     I1203 16:27:51.259475 18876 analysis_predictor.cc:1838] ======= optimize end =======
     I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [feed], feed -> x
     I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
     W1203 16:27:51.267448 18876 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 13.0, Runtime API Version: 11.8
     W1203 16:27:51.268443 18876 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.
     [TRT] Batch inference finished, valid results: 976/976

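=====================================================
TRT batch inference timing statistics

Total valid images: 976
Total inference time: 1782.967 ms
Average inference time per image: 1.827 ms
Randomly selected image index: 935 (valid result)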
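To put the startup cost next to the inference time above, the glog timestamps in the log can be subtracted. The sketch below does this with two small helpers; the function names are illustrative and not part of PaddleX, and it assumes both timestamps fall on the same day.

```cpp
#include <cstdio>
#include <string>

// Convert a glog-style time of day "HH:MM:SS.ffffff" into seconds since
// midnight. Returns -1.0 if the string does not match that pattern.
double SecondsOfDay(const std::string& hms) {
  int hours = 0;
  int minutes = 0;
  double seconds = 0.0;
  if (std::sscanf(hms.c_str(), "%d:%d:%lf", &hours, &minutes, &seconds) != 3) {
    return -1.0;
  }
  return hours * 3600.0 + minutes * 60.0 + seconds;
}

// Elapsed seconds between two timestamps taken on the same day.
// Both inputs are assumed to be well-formed.
double ElapsedSeconds(const std::string& start, const std::string& end) {
  return SecondsOfDay(end) - SecondsOfDay(start);
}
```

Applied to the first "Prepare TRT engine" line (16:26:10.592264) and the "optimize end" line (16:27:51.259475), ElapsedSeconds gives roughly 100.67 seconds of engine preparation, compared with 1782.967 ms for inference over all 976 images.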

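Environment

  1. Please provide your PaddlePaddle, PaddleX, and Python version numbers. paddle2onnx 1.3.1, paddleocr 2.10.0, paddlepaddle 2.6.2, paddlepaddle-gpu 2.6.1, paddleseg 0.0.0.dev0 (actually 2.10, installed locally). The model was trained with Python 3.10 in Anaconda3 and then deployed with C++ and TensorRT on Windows.

  2. Please provide your operating system (Linux/Windows/macOS). Windows 10

  3. Which CUDA/cuDNN versions are you using? CUDA 11.8, cuDNN 8.6.0.163, TensorRT 8.5.1.7

That is all the detail I can provide. Is any other information needed?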
