triton-inference-server/server
How to free multiple gpu memory
Open
#7825 opened on Nov 22, 2024
Labels: help wanted, onnx, question
Description
How do I free GPU memory when a model is deployed on multiple GPUs? Related issue:
https://github.com/triton-inference-server/onnxruntime_backend/issues/103
When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the configuration below. When the model is deployed to multiple GPUs, I don't know what format the value should take.
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" } }
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 3 ]
  }
]
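A possible multi-GPU form, offered only as an untested sketch: ONNX Runtime documents the `memory.enable_memory_arena_shrinkage` run option as a semicolon-separated list of `device:id` pairs with no whitespace (for example `cpu:0;gpu:0`). Assuming the Triton ONNX Runtime backend passes this string through unchanged, listing each GPU might look like the following. The GPU ids 2 and 3 are hypothetical, and it is not confirmed that an instance accepts a list naming a GPU it does not run on.

```
# Untested sketch: GPU ids 2 and 3 are hypothetical placeholders.
# Assumes the backend forwards the value as-is to ONNX Runtime, which
# expects "device:id" pairs separated by ";" with no spaces.
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:2;gpu:3" } }
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 2, 3 ]
  }
]
```

If the backend rejects a list that names a GPU a given instance does not use, this form would not work as written, which is the case I am unsure about.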