NVIDIA-NeMo/NeMo

Speed up RNNT model inference using TRT

Open

#14,531 created on August 20, 2025

1 comment · 0 reactions · 1 assignee · Python · 3,421 forks
Labels: ASR, community-request, help wanted, waiting-on-customer


Description

Hi,

I previously trained an RNNT model and now want to accelerate it by converting it to TensorRT. I’ve exported the model to ONNX and have encoder.onnx and decoder.onnx.

I’m using the TensorRT 25.03 Docker image and trtexec to convert the models. The decoder works fine with --fp16, but when I build the encoder with --fp16, some of its outputs are NaN and the results are incorrect.
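
To narrow this down, the FP32 and FP16 encoder outputs can be compared on the same input. The sketch below is only an illustration, not a confirmed diagnosis. It assumes the outputs have already been flattened into plain Python lists of floats; the helper name `fp16_report` and the sample values are hypothetical. It reports where the FP16 run produced NaN and where the FP32 reference exceeds 65504, the largest finite FP16 value, since overflow is one common (but here unverified) source of NaN in half precision.

```python
import math

# Largest finite value representable in IEEE 754 half precision (FP16).
FP16_MAX = 65504.0


def fp16_report(fp32_values, fp16_values):
    """Compare flattened FP32 reference outputs with FP16 outputs.

    Both arguments are plain lists of floats of equal length
    (an assumption of this sketch, not something trtexec produces directly).
    """
    if len(fp32_values) != len(fp16_values):
        raise ValueError("output lists must have the same length")

    # Positions where the FP16 run returned NaN.
    nan_indices = [i for i, v in enumerate(fp16_values) if math.isnan(v)]

    # Positions where the FP32 reference is finite but too large for FP16.
    fp16_overflow_indices = [
        i for i, v in enumerate(fp32_values)
        if math.isfinite(v) and abs(v) > FP16_MAX
    ]

    # Largest absolute difference over positions that are finite in both runs.
    max_abs_diff = max(
        (abs(a - b) for a, b in zip(fp32_values, fp16_values)
         if math.isfinite(a) and math.isfinite(b)),
        default=0.0,
    )

    return {
        "nan_indices": nan_indices,
        "fp16_overflow_indices": fp16_overflow_indices,
        "max_abs_diff": max_abs_diff,
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    reference = [0.5, -1.25, 70000.0, 3.0]
    half_run = [0.5, -1.2502, float("nan"), 3.0]
    print(fp16_report(reference, half_run))
```

If the NaN positions line up with very large FP32 values, half-precision overflow is a plausible suspect; if they do not, the cause likely lies elsewhere.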

Has anyone encountered this issue, or does anyone know how to fix it?

Are there any methods to accelerate RNNT model inference?
