Speed up RNNT model inference using TRT · NVIDIA-NeMo/NeMo#14531

(1 comment) (0 reactions) (1 assignee) Python (3,421 forks)

ASR, community-request, help wanted, waiting-on-customer

Repository metrics

Stars: 17,298
PR merge metrics: (average merge time 12 days) (49 PRs merged in the last 30 days)

Description

Hi,

I previously trained an RNNT model and now want to accelerate it by converting it to TensorRT. I’ve exported the model to ONNX and have encoder.onnx and decoder.onnx.

I’m using the TensorRT 25.03 Docker image and trtexec to convert the models. The decoder works fine with --fp16, but when I use --fp16 for the encoder, some outputs return NaN and the results are incorrect.

Has anyone encountered this issue, or does anyone know how to fix it?

Are there any methods to accelerate RNNT model inference?

Contributor guide

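Research direction: Investigate why FP16 inference fails on the encoder model. Check the TensorRT logs for warnings, verify operator support for FP16, and consider INT8 quantization or combining FP16 with the --best flag. Also check the ONNX export for problematic operators.
Tech stack: Python
Domain: ai, performance
Issue type: Bug
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, TensorRT, ONNX
Beginner friendliness: 40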
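
One possible cause of the NaN outputs, which the issue itself does not confirm, is that some encoder activations exceed the FP16 range (the largest finite half-precision value is 65504), overflow to infinity, and then turn into NaN in later operations. The sketch below is a minimal, standard-library-only way to check for that. It assumes you have already dumped per-layer outputs from an FP32 reference run into plain Python lists. The function name `fp16_risk_report` and the layer names are hypothetical and are not part of NeMo or TensorRT.

```python
import math

# Largest finite value representable in IEEE 754 half precision (FP16).
FP16_MAX = 65504.0


def fp16_risk_report(layer_outputs):
    """Return {layer_name: stats} for layers whose FP32 values may not fit in FP16.

    layer_outputs maps a (hypothetical) layer name to a flat iterable of floats
    captured from an FP32 reference run. Layers with no issue are omitted.
    """
    report = {}
    for name, values in layer_outputs.items():
        values = list(values)
        # NaN or inf already present in the FP32 run points at the export itself.
        non_finite = sum(1 for v in values if math.isnan(v) or math.isinf(v))
        finite = [abs(v) for v in values if math.isfinite(v)]
        # Finite values whose magnitude is beyond the FP16 limit.
        overflow = sum(1 for v in finite if v > FP16_MAX)
        if non_finite or overflow:
            report[name] = {
                "max_abs": max(finite, default=0.0),
                "overflow": overflow,
                "non_finite": non_finite,
            }
    return report


if __name__ == "__main__":
    # Made-up activations; the layer names are placeholders, not real NeMo layers.
    sample = {
        "encoder/layer0/ffn": [0.5, -120.0, 3000.0],
        "encoder/layer3/attn_scores": [1.0e5, -2.0, 7.0],
    }
    for name, stats in fp16_risk_report(sample).items():
        print(name, stats)
```

Layers that show up in the report are candidates for staying in FP32 while the rest of the encoder runs in FP16, provided the installed trtexec version offers per-layer precision options. An empty report suggests looking elsewhere, for example at the ONNX export or at operator support.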
