triton-inference-server/server

How to free GPU memory on multiple GPUs

Open

#7,825 opened on Nov 22, 2024

Description

The question is how to free GPU memory; see the related issue:

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the configuration below. When the model is deployed to multiple GPUs, I don't know what format the value should take.

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
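
One possible format, offered as an untested sketch: ONNX Runtime documents the `memory.enable_memory_arena_shrinkage` run option as a semicolon-separated list of `device:device_id` pairs (for example `cpu:0;gpu:0`). Assuming the Triton ONNX Runtime backend passes the string through unchanged, a multi-GPU value could be built as below. The helper name `arena_shrinkage_value` and the GPU IDs are hypothetical, and whether each model instance shrinks only its own device's arena has not been verified here.

```python
def arena_shrinkage_value(gpu_ids, include_cpu=False):
    """Build a 'device:id;device:id' string for memory.enable_memory_arena_shrinkage.

    Assumption: the value follows ONNX Runtime's documented format of
    semicolon-separated device:device_id pairs.
    """
    pairs = ["cpu:0"] if include_cpu else []
    pairs += [f"gpu:{gpu_id}" for gpu_id in gpu_ids]
    return ";".join(pairs)


def config_fragment(gpu_ids):
    """Render a config.pbtxt fragment for the given GPUs (illustrative only)."""
    value = arena_shrinkage_value(gpu_ids)
    gpus = ", ".join(str(gpu_id) for gpu_id in gpu_ids)
    return (
        'parameters { key: "memory.enable_memory_arena_shrinkage" '
        f'value: {{ string_value: "{value}" }} }}\n'
        "instance_group [\n"
        "    {\n"
        "        count: 1\n"
        "        kind: KIND_GPU\n"
        f"        gpus: [ {gpus} ]\n"
        "    }\n"
        "]\n"
    )


if __name__ == "__main__":
    # Example with hypothetical GPU IDs 2 and 3.
    print(config_fragment([2, 3]))
```

For GPUs 2 and 3 the generated `string_value` is `gpu:2;gpu:3`, with `gpus: [ 2, 3 ]` in the instance group.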
