Speed up RNNT model inference using TRT · NVIDIA-NeMo/NeMo#14531

1 comment · 0 reactions · 1 assignee · Python · 3,421 forks

Labels: ASR, community-request, help wanted, waiting-on-customer

Repository metrics

Stars: 17,298
PR merge metrics: average time to merge 12 days; 49 PRs merged in the last 30 days

Description

Hi,

I previously trained an RNNT model and now want to accelerate it by converting it to TensorRT. I’ve exported the model to ONNX and have encoder.onnx and decoder.onnx.

I’m using the TensorRT 25.03 Docker image and trtexec to convert the models. The decoder works fine with --fp16, but when I use --fp16 for the encoder, some outputs return NaN and the results are incorrect.

Has anyone encountered this issue, or does anyone know how to fix it?

Are there any methods to accelerate RNNT model inference?
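
One common way FP16 produces NaN is overflow: a value above the half-precision maximum (65504) becomes inf, and a later subtraction or normalization turns inf into NaN. Whether that is what happens in this encoder is an assumption, not something the issue confirms. The sketch below emulates half-precision rounding with the Python standard library only; the tensor name and the sample values are hypothetical.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(value):
    """Round a Python float to half precision; overflow maps to +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses to pack finite values that are too large for FP16.
        return math.copysign(math.inf, value)


def fp16_report(name, values):
    """Summarize how a list of float values behaves after an FP16 cast."""
    halves = [to_fp16(v) for v in values]
    return {
        "name": name,
        "max_abs": max(abs(v) for v in values),
        "num_inf": sum(math.isinf(h) for h in halves),
        "halves": halves,
    }


# Hypothetical activations: the last one is fine in FP32 but too large for FP16.
acts = [1.0, 300.0, 7.0e4]
report = fp16_report("hypothetical_activation", acts)

# Mean subtraction (as in a normalization layer) computes inf - inf, which is NaN.
mean = sum(report["halves"]) / len(report["halves"])
centered = [h - mean for h in report["halves"]]
num_nan = sum(math.isnan(c) for c in centered)

print(report["num_inf"], "inf value(s),", num_nan, "NaN value(s) after centering")
```

If intermediate activations dumped from an FP32 run of encoder.onnx show magnitudes near or above 65504, the layers producing them are reasonable candidates for staying in FP32.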

Contributor guide

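Investigation approach: Investigate why FP16 inference fails for the encoder model. Check the TensorRT logs for warnings, verify FP16 support for the operators involved, and consider INT8 quantization or using FP16 with the --best flag. Also check whether the ONNX export contains problematic operators.
Tech stack: Python
Area: AI, performance
Issue type: Bug
Difficulty: 3
Estimated time: 1-2 days
Activity: Active
Clarity: Clear
Prerequisites: Python, TensorRT, ONNX
Beginner friendliness: 40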
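
As a sketch of the investigation approach above, one option is to build the encoder in FP16 while pinning a few suspect layers to FP32. The helper below only assembles a trtexec command line; it does not run it. The flags --precisionConstraints and --layerPrecisions are used as I understand them from trtexec's help output, so check `trtexec --help` in your container before relying on them. The function name, the engine file name, and the layer names are hypothetical placeholders.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, fp16=True, fp32_layers=()):
    """Assemble a trtexec command that keeps selected layers in FP32."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")
    if fp32_layers:
        # Ask TensorRT to obey the per-layer precisions listed below.
        spec = ",".join(f"{name}:fp32" for name in fp32_layers)
        cmd.append("--precisionConstraints=obey")
        cmd.append(f"--layerPrecisions={spec}")
    return cmd


# Hypothetical layer names; real ones come from the verbose TensorRT build log.
cmd = build_trtexec_cmd(
    "encoder.onnx",
    "encoder.plan",
    fp32_layers=["layer_a", "layer_b"],
)
print(shlex.join(cmd))
```

Starting with a broad set of FP32 layers and shrinking it until NaN reappears is one way to narrow down which operators cannot run in half precision.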
