Repository metrics

Stars: 4,520
PR merge metrics: average merge time 1d 3h; 9 merged PRs in 30d

Description

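Problem description

I built tensorrt_infer.dll with PaddleX to run inference with dynamic input shapes, and inference succeeds. However, several problems came up:

No TRT cache is generated, so every run repeats "Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time." This greatly reduces usability. How can I save the TRT model, or extract the generated engine?
I also noticed TensorRTEngineConfig. How can I use this struct directly to build a new tensorrt_infer.cpp? That might avoid the problem above. The struct contains the field // onnx model path std::string model_file_ = ""; so I enabled WITH_ONNX_TENSORRT when running CMake, but I did not observe any obvious change. The CMake options I selected are shown below:
For dynamic-shape deployment, following hints from other issues, I generated a .pbtxt file and obtained the dynamic shape ranges from it. Can this .pbtxt file be used directly? Otherwise, after switching to a different model architecture, tensorrt_infer.dll may have to be rebuilt again.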
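One way to avoid baking shape ranges into the DLL is to read them from the .pbtxt file at startup. The sketch below is a minimal, self-contained parser, assuming the file is a protobuf text file made of shape_range_info blocks with name, min_shape, max_shape and opt_shape fields; that layout, and the names ShapeRange and ParseShapeRangeText, are assumptions for illustration and are not taken from the PaddleX source. Paddle Inference also appears to offer Config::EnableTunedTensorRtDynamicShape for loading such a file directly, but whether the PaddleX wrapper exposes it has not been verified here.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one tensor's dynamic shape range.
struct ShapeRange {
  std::vector<int> min_shape;
  std::vector<int> max_shape;
  std::vector<int> opt_shape;
};

// Removes leading and trailing spaces, tabs and carriage returns.
std::string Trim(const std::string& s) {
  const auto begin = s.find_first_not_of(" \t\r");
  if (begin == std::string::npos) return "";
  const auto end = s.find_last_not_of(" \t\r");
  return s.substr(begin, end - begin + 1);
}

// Parses text in the ASSUMED layout:
//   shape_range_info {
//     name: "x"
//     min_shape: 1
//     ...
//   }
// Lines without a colon (block headers, braces) are skipped.
std::map<std::string, ShapeRange> ParseShapeRangeText(const std::string& text) {
  std::map<std::string, ShapeRange> result;
  std::istringstream in(text);
  std::string line;
  std::string current;  // tensor name of the block being read
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    const std::string key = Trim(line.substr(0, colon));
    std::string value = Trim(line.substr(colon + 1));
    if (key == "name") {
      if (value.size() >= 2 && value.front() == '"' && value.back() == '"') {
        value = value.substr(1, value.size() - 2);  // strip the quotes
      }
      current = value;
      result[current];  // create an empty entry for this tensor
    } else if (!current.empty()) {
      if (key == "min_shape") result[current].min_shape.push_back(std::stoi(value));
      if (key == "max_shape") result[current].max_shape.push_back(std::stoi(value));
      if (key == "opt_shape") result[current].opt_shape.push_back(std::stoi(value));
    }
  }
  return result;
}
```

With something like this, the DLL could take the .pbtxt path as a runtime argument and pass the parsed ranges to whatever dynamic-shape setter the deployment code already calls, so changing the model would only require a new file rather than a rebuild.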

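Reproduction

High-performance inference
- Did you follow the high-performance inference documentation and get the full workflow running?
- Which model and dataset are you using? I am currently using pplite with backbone=stdc1; the dataset is a custom-generated one.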
- Please provide the error message and related logs. The runtime output is as follows:
REGISTER_CLASS:seg
init SegModel,model_type=seg
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1203 16:26:10.320520 18876 analysis_predictor.cc:1532] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
I1203 16:26:10.323510 18876 executor.cc:187] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [trt_remove_amp_strategy_op_pass]
--- Running IR pass [trt_support_nhwc_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I1203 16:26:10.350419 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running IR pass [trt_map_ops_to_matrix_multiply_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [trt_delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [identity_op_clean_pass]
I1203 16:26:10.371353 18876 fuse_pass_base.cc:59] --- detected 1 subgraphs
--- Running IR pass [add_support_int8_pass]
I1203 16:26:10.399904 18876 fuse_pass_base.cc:59] --- detected 209 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [trt_prompt_tuning_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [trt_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v2]
--- Running IR pass [trt_multihead_matmul_fuse_pass_v3]
--- Running IR pass [multihead_matmul_roformer_fuse_pass]
--- Running IR pass [constant_folding_pass]
I1203 16:26:10.455720 18876 fuse_pass_base.cc:59] --- detected 4 subgraphs
--- Running IR pass [trt_flash_multihead_matmul_fuse_pass]
--- Running IR pass [trt_cross_multihead_matmul_fuse_pass]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [trt_qk_multihead_matmul_fuse_pass]
--- Running IR pass [layernorm_shift_partition_fuse_pass]
--- Running IR pass [merge_layernorm_fuse_pass]
--- Running IR pass [preln_residual_bias_fuse_pass]
--- Running IR pass [preln_layernorm_x_fuse_pass]
--- Running IR pass [reverse_roll_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I1203 16:26:10.524490 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.534457 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W1203 16:26:10.534457 18876 op_compat_sensible_pass.cc:232] Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W1203 16:26:10.535450 18876 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
I1203 16:26:10.539438 18876 fuse_pass_base.cc:59] --- detected 39 subgraphs
--- Running IR pass [remove_padding_recover_padding_pass]
--- Running IR pass [delete_remove_padding_recover_padding_pass]
--- Running IR pass [dense_fc_to_sparse_pass]
--- Running IR pass [dense_multihead_matmul_to_sparse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I1203 16:26:10.551398 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
I1203 16:26:10.592264 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:12.048230 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:12.049224 18876 place.cc:161] The paddle::PlaceType::kCPU/kGPU is deprecated since version 2.3, and will be removed in version 2.4! Please use Tensor::is_cpu()/is_gpu() method to determine the type of place.
I1203 16:26:12.050225 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:12.050225 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:26:40.895160 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:26:40.895160 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:26:40.895160 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:26:40.895160 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 13 nodes
I1203 16:26:40.898150 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:40.899148 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
I1203 16:26:40.899148 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:40.899148 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:26:46.100383 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:26:46.100383 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:26:46.101380 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected subnormal FP16 values.
W1203 16:26:46.101380 18876 helper.h:127] - 1 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
I1203 16:26:46.103374 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 16 nodes
I1203 16:26:46.104367 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:46.104367 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
I1203 16:26:46.105363 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:46.105363 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:26:53.203140 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:26:53.203140 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:26:53.204133 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:26:53.204133 18876 helper.h:127] - 2 weights are affected by this issue: Detected subnormal FP16 values.
I1203 16:26:53.206130 18876 tensorrt_subgraph_pass.cc:302] --- detect a sub-graph with 106 nodes
I1203 16:26:53.210116 18876 tensorrt_subgraph_pass.cc:846] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.212109 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.215099 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.216091 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
W1203 16:26:53.226063 18876 helper.h:127] Tensor DataType is determined at build time for tensors not marked as input or output.
I1203 16:26:53.231046 18876 engine.cc:215] Run Paddle-TRT FP16 mode
I1203 16:26:53.231046 18876 engine.cc:301] Run Paddle-TRT Dynamic Shape mode.
W1203 16:27:51.221599 18876 helper.h:127] TensorRT encountered issues when converting weights between types and that could affect accuracy.
W1203 16:27:51.221599 18876 helper.h:127] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
W1203 16:27:51.222596 18876 helper.h:127] Check verbose logs for the list of affected weights.
W1203 16:27:51.222596 18876 helper.h:127] - 35 weights are affected by this issue: Detected subnormal FP16 values.
W1203 16:27:51.222596 18876 helper.h:127] - 6 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1203 16:27:51.243525 18876 ir_params_sync_among_devices_pass.cc:53] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1203 16:27:51.244522 18876 memory_optimize_pass.cc:118] The persistable params in main graph are : 30.6206MB
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_0.tmp_0 size: 512
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : batch_norm_32.tmp_1 size: 4
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_2 size: 4
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : bilinear_interp_v2_1.tmp_0 size: 512
I1203 16:27:51.245518 18876 memory_optimize_pass.cc:246] Cluster name : tmp_0 size: 512
I1203 16:27:51.246515 18876 memory_optimize_pass.cc:246] Cluster name : shape_0.tmp_0_slice_1 size: 4
--- Running analysis [ir_graph_to_program_pass]
I1203 16:27:51.259475 18876 analysis_predictor.cc:1838] ======= optimize end =======
I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [feed], feed -> x
I1203 16:27:51.260469 18876 naive_executor.cc:200] --- skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
W1203 16:27:51.267448 18876 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 13.0, Runtime API Version: 11.8
W1203 16:27:51.268443 18876 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.
[TRT] Batch inference complete, valid results: 976/976

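===================================================== TRT batch inference timing statistics

Total valid images: 976
Total inference time: 1782.967 ms
Average inference time per image: 1.827 ms
Randomly selected image index: 935 (valid result)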

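Environment

Please provide your PaddlePaddle, PaddleX, and Python versions: paddle2onnx 1.3.1, paddleocr 2.10.0, paddlepaddle 2.6.2, paddlepaddle-gpu 2.6.1, paddleseg 0.0.0.dev0 (actually 2.10, using a local deployment). The model was trained with Python 3.10 in Anaconda3 and then deployed with C++ and TensorRT on Windows.
Please provide your operating system (Linux/Windows/macOS): Windows 10
Which CUDA/cuDNN versions are you using? CUDA 11.8, cuDNN 8.6.0.163, TensorRT 8.5.1.7

That is all the detailed information I have. Is there anything else I should provide?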

Contributor guide

Research direction: Investigate how to save and load TensorRT engine cache in PaddleX C++ deployment. Check the TensorRTEngineConfig structure in the PaddleX source code to see how to set model file and enable serialization. Also explore using the .pbtxt file for dynamic shapes without rebuilding the DLL.
Tech stack: python
Domain: backend
Issue type: Bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: Active
Clarity: Needs investigation
Prerequisites: C++, TensorRT, PaddleX
Newbie friendliness: 30
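
For the engine-cache part of the research direction, one practical detail is that a serialized TensorRT engine is generally only reusable with the same TensorRT version and GPU architecture it was built on, so a cache location should change when either of those changes. The sketch below builds such a directory name from the values reported in this issue (FP16, TensorRT 8.5.1.7, compute capability 8.9). EngineCacheDirName and the naming scheme are hypothetical. On the Paddle Inference side, the relevant knobs appear to be the use_static argument of Config::EnableTensorRtEngine and Config::SetOptimCacheDir; whether the PaddleX C++ wrapper forwards them is an assumption that still needs checking against the source.

```cpp
#include <string>

// Hypothetical helper: returns a cache directory name that differs whenever
// something that invalidates a serialized TensorRT engine differs
// (model, precision, TensorRT version, GPU compute capability).
std::string EngineCacheDirName(const std::string& model_tag,
                               const std::string& precision,
                               const std::string& trt_version,
                               int sm_major,
                               int sm_minor) {
  return model_tag + "_" + precision + "_trt" + trt_version + "_sm" +
         std::to_string(sm_major) + std::to_string(sm_minor);
}
```

For the setup in this issue, EngineCacheDirName("pplite_stdc1", "fp16", "8.5.1.7", 8, 9) yields "pplite_stdc1_fp16_trt8.5.1.7_sm89", which could then be passed as the cache directory so that engines built on a different GPU or TensorRT version are never loaded by mistake.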
