triton-inference-server/server

How to free GPU memory on multiple GPUs

Open

#7825 opened on Nov 22, 2024

Labels: help wanted, onnx, question

Description

My question is how to free GPU memory, following up on this issue:

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the configuration below. When the model is deployed to multiple GPUs, I don't know what format the value should take.

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
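
For reference, ONNX Runtime documents the value of the `memory.enable_memory_arena_shrinkage` run option as a semicolon-separated list of `device:id` pairs with no whitespace (for example `cpu:0;gpu:0`). The sketch below builds such a value for several GPUs and renders the two stanzas from this question. The helper names `shrinkage_value` and `render_config` are hypothetical, and it is an assumption, not confirmed in this thread, that the Triton ONNX Runtime backend applies a multi-device value to every instance.

```python
def shrinkage_value(gpu_ids):
    """Build a value such as "gpu:2;gpu:3" (no whitespace allowed)."""
    if not gpu_ids:
        raise ValueError("need at least one GPU id")
    return ";".join(f"gpu:{int(i)}" for i in gpu_ids)


def render_config(gpu_ids):
    """Render the parameters and instance_group stanzas for several GPUs.

    Assumption: the backend accepts the multi-device string unchanged.
    """
    gpus = ", ".join(str(int(i)) for i in gpu_ids)
    return (
        'parameters { key: "memory.enable_memory_arena_shrinkage" '
        f'value: {{ string_value: "{shrinkage_value(gpu_ids)}" }} }}\n'
        "\n"
        "instance_group [\n"
        "    {\n"
        "        count: 1\n"
        "        kind: KIND_GPU\n"
        f"        gpus: [ {gpus} ]\n"
        "    }\n"
        "]\n"
    )


if __name__ == "__main__":
    # Example: GPUs 2 and 3 (illustrative ids, not from the original post).
    print(render_config([2, 3]))
```

The printed text can be pasted into `config.pbtxt`; whether memory is actually released on each listed GPU would still need to be checked, for example with `nvidia-smi`.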
