triton-inference-server/server

How to free multiple gpu memory

Open

#7825 opened on Nov 22, 2024

Labels: help wanted, onnx, question

Description

The question is how to free GPU memory when a model is deployed to more than one GPU. Related issue:

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the parameter shown here. When the model is deployed to multiple GPUs, I don't know what format the value should take. This is my current single-GPU configuration:

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" }  }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
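
One possible multi-GPU form, offered as an untested sketch rather than a confirmed answer: ONNX Runtime documents the `memory.enable_memory_arena_shrinkage` run option as a semicolon-separated list of `device:id` pairs with no whitespace (for example `cpu:0;gpu:0`). The fragment below assumes the Triton ONNX Runtime backend passes the string through to ONNX Runtime unchanged, and uses GPUs 2 and 3 purely as hypothetical device IDs. Whether each model instance accepts a list that names a GPU it is not running on has not been verified here.

```pbtxt
# Assumption: the value is forwarded as-is to ONNX Runtime, which expects
# "device:id" pairs separated by ';' with no spaces.
# GPU IDs 2 and 3 are illustrative only; use the IDs from your instance_group.
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:2;gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 2, 3 ]
    }
]
```

If a combined list is rejected, a fallback worth trying is one model directory per GPU, each keeping the single-device value that already works.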
