Speed up RNNT model inference using TRT · NVIDIA-NeMo/NeMo#14531

1 comment · 0 reactions · 1 assignee · Python · 3,421 forks

Labels: ASR, community-request, help wanted, waiting-on-customer

Repository metrics

Stars: 17,298
PR merge metrics: average merge time 12 days; 49 PRs merged in 30 days

Description

Hi,

I previously trained an RNNT model and now want to accelerate it by converting it to TensorRT. I’ve exported the model to ONNX and have encoder.onnx and decoder.onnx.

I’m using the TensorRT 25.03 Docker image and trtexec to convert the models. The decoder works fine with --fp16, but when I use --fp16 for the encoder, some outputs return NaN and the results are incorrect.

Has anyone encountered this issue, or does anyone know how to fix it?

Are there any methods to accelerate RNNT model inference?
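
One common way FP16 produces NaN is range overflow: half precision cannot represent finite values above 65504, so a large intermediate activation saturates to infinity, and a later operation such as inf - inf yields NaN. Whether this is what happens in this particular encoder is not confirmed by the issue; the sketch below only illustrates the mechanism, using the Python standard library and made-up activation values.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to half precision, saturating to +/-inf when it does not fit."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack out-of-range values; FP16 hardware would
        # produce infinity instead, so that is what we emulate here.
        return math.copysign(math.inf, x)


def shifted_scores_fp16(scores):
    """Cast scores to FP16, then subtract the maximum (as a softmax would)."""
    half = [to_fp16(s) for s in scores]
    peak = max(half)
    return [h - peak for h in half]


# Hypothetical activations: fine in FP32, out of range in FP16.
activations = [70000.0, 80000.0, 12.5]
print(shifted_scores_fp16(activations))
```

Values that overflow become inf, and inf - inf evaluates to NaN, which then propagates through every later layer. Comparing the FP32 encoder's intermediate magnitudes against FP16_MAX is one way to locate where this could start.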

Contributor guide

Research direction: Investigate why FP16 inference fails for the encoder model. Check the TensorRT logs for warnings, verify operator support for FP16, and consider INT8 quantization or FP16 with the --best flag. Also examine the ONNX export for problematic operators.
Tech stack: Python
Domain: AI, performance
Issue type: Bug
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, TensorRT, ONNX
Beginner-friendly: 40
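
Following the research direction above, one option worth trying is a mixed-precision build: keep --fp16 enabled but pin the layers suspected of overflowing to FP32 through trtexec's --precisionConstraints and --layerPrecisions options. The sketch below only assembles such a command line; it does not run trtexec. The layer names are hypothetical placeholders (the real ones would come from the verbose TensorRT build log or from inspecting encoder.onnx), and which layers actually need FP32 is an assumption to be verified experimentally.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, fp32_layers=()):
    """Assemble a trtexec command for an FP16 build with optional FP32 layers."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",
    ]
    if fp32_layers:
        # Per-layer precisions take effect only when the constraints are
        # set to "obey" or "prefer".
        spec = ",".join(f"{name}:fp32" for name in fp32_layers)
        cmd += ["--precisionConstraints=obey", f"--layerPrecisions={spec}"]
    return cmd


# Hypothetical layer names, for illustration only.
suspect_layers = ["encoder_layer_norm_0", "encoder_softmax_0"]
cmd = build_trtexec_cmd("encoder.onnx", "encoder.plan", suspect_layers)
print(shlex.join(cmd))
```

Starting with a small set of pinned layers and widening it until the NaNs disappear keeps most of the FP16 speedup while isolating the operators that are numerically fragile.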
