How to free multiple gpu memory · triton-inference-server/server#7825

(1 comment) (0 reactions) (0 assignees) Python (1,304 forks)

help wanted, onnx, question

Repository metrics

Stars: (6,593 stars)
PR merge metrics: (Avg merge 2d 16h) (34 merged PRs in 30d)

Description

The question is how to free GPU memory when a model is deployed to more than one GPU.

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the configuration below. When the model is deployed to multiple GPUs, I don't know what format the parameter value should take.

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
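For illustration, here is a minimal sketch of how a multi-GPU value could be built. It assumes the parameter string is passed through to ONNX Runtime's `memory.enable_memory_arena_shrinkage` run option, which is documented as a semicolon-separated list of `device:id` pairs with no whitespace (for example `cpu:0;gpu:0`). Whether Triton applies every listed GPU to every model instance is not confirmed in this issue, so treat the output as something to test rather than a verified answer. The helper names `shrinkage_value` and `render_config` are hypothetical.

```python
from typing import Iterable

# Parameter key as it appears in the issue's config.
PARAM_KEY = "memory.enable_memory_arena_shrinkage"


def shrinkage_value(gpu_ids: Iterable[int], include_cpu: bool = False) -> str:
    """Join device entries as 'gpu:<id>' pairs separated by ';' (no spaces).

    Assumption: the value follows ONNX Runtime's documented list format.
    """
    ids = sorted(set(gpu_ids))
    if any(i < 0 for i in ids):
        raise ValueError("GPU ids must be non-negative")
    entries = ["cpu:0"] if include_cpu else []
    entries += [f"gpu:{i}" for i in ids]
    if not entries:
        raise ValueError("at least one device is required")
    return ";".join(entries)


def render_config(gpu_ids: Iterable[int]) -> str:
    """Render a config fragment shaped like the one in the issue (hypothetical helper)."""
    ids = sorted(set(gpu_ids))
    value = shrinkage_value(ids)
    gpus = ", ".join(str(i) for i in ids)
    return (
        f'parameters {{ key: "{PARAM_KEY}" value: {{ string_value: "{value}" }} }}\n'
        "\n"
        "instance_group [\n"
        "    {\n"
        "        count: 1\n"
        "        kind: KIND_GPU\n"
        f"        gpus: [ {gpus} ]\n"
        "    }\n"
        "]\n"
    )


if __name__ == "__main__":
    # Example: the same model placed on GPUs 2 and 3.
    print(render_config([2, 3]))
```

Generating the string from the same list that feeds `gpus` keeps the two settings in sync, which avoids listing a device in one place and forgetting it in the other.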

Contributor guide

Research direction: Investigate how to specify multiple GPU devices in the `memory.enable_memory_arena_shrinkage` parameter. Check the Triton Inference Server documentation for multi-GPU memory configuration, and look for examples or issues that pass more than one GPU ID in that parameter.
Tech stack: Python
Domain: backend, infrastructure
Issue type: Research
Difficulty: 3
Estimated time: Half day
Activity status: Fresh
Clarity: Clear
Prerequisites: GPU, CUDA, Triton Inference Server
Newbie friendliness: 40
