triton-inference-server/server

How to free GPU memory on multiple GPUs

Open

#7825 opened on Nov 22, 2024

Labels: help wanted, onnx, question

Description

The question is how to free GPU memory when a model is deployed across multiple GPUs. See the related issue:

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory. When the model is deployed to multiple GPUs, I don't know what format the parameter value should take. My current single-GPU configuration:

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" }  }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
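
One possible multi-GPU form, offered as an unverified sketch rather than a confirmed answer: the ONNX Runtime run option `memory.enable_memory_arena_shrinkage` is documented as taking a semicolon-separated list of `device:id` pairs with no whitespace (for example `cpu:0;gpu:0`). Assuming the Triton ONNX Runtime backend passes this string through unchanged, a model placed on GPUs 2 and 3 might be configured as follows. The GPU IDs here are illustrative, and whether each model instance shrinks only the arena on its own device is not confirmed.

```text
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:2;gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 2, 3 ]
    }
]
```

With `count: 1` and two entries in `gpus`, Triton creates one instance per listed GPU, so the value lists both devices. If that does not behave as expected, it may be worth testing each device ID on its own to see which arenas are actually shrunk.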
