triton-inference-server/server

How to free GPU memory on multiple GPUs

Open

#7825 opened on Nov 22, 2024

Labels: help wanted, onnx, question


Description

The question is how to free GPU memory.

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the setting below. When the model is deployed to multiple GPUs, I don't know what format the value should take.

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" }  }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
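
For comparison, here is a sketch of what a multi-GPU version of the same config might look like. It assumes the value follows the semicolon-separated `device:id` list format that ONNX Runtime documents for this run option (for example `cpu:0;gpu:0`, with no whitespace). The GPU ids 2 and 3 are hypothetical, and whether the backend shrinks the arena on every listed device for each instance has not been verified here.

```
# Assumption: one "gpu:<id>" entry per device, separated by semicolons, no spaces.
# GPU ids 2 and 3 are illustrative only.
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:2;gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        # One instance is created on each listed GPU.
        gpus: [ 2, 3 ]
    }
]
```

If this does not release memory on both devices, listing the exact ids from `gpus` in the same order is the first thing worth double-checking.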
