How to free multiple gpu memory · triton-inference-server/server#7825

(1 comment) (0 reactions) (0 assignees) Python (1,304 forks)

help wanted, onnx, question

Repository metrics

Stars: 6,593
PR merge metrics: (average time to merge: 2 days 16 hours) (34 PRs merged in the last 30 days)

Description

The question is how to free GPU memory when a model is deployed across multiple GPUs. Related issue:

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the configuration below. When the model is deployed to multiple GPUs, I don't know what format the parameter value should take.

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" }  }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
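One possible answer, not confirmed in this thread: ONNX Runtime documents this run option as a semicolon-separated list of `device:id` pairs with no whitespace (for example `cpu:0;gpu:0`). Assuming the Triton ONNX Runtime backend passes the string through unchanged, a multi-GPU value would look like `gpu:2;gpu:3`. The Python sketch below only builds that string and the matching `config.pbtxt` fragment. The helper names `shrinkage_value` and `config_fragment` are hypothetical, and whether the backend shrinks the arena on every listed GPU still has to be verified on a real deployment.

```python
def shrinkage_value(gpu_ids):
    """Build a value such as "gpu:2;gpu:3".

    Assumption: the backend accepts ONNX Runtime's documented format,
    i.e. semicolon-separated "device:id" pairs without whitespace.
    """
    ids = sorted(set(gpu_ids))  # drop duplicates, keep a stable order
    return ";".join(f"gpu:{i}" for i in ids)


def config_fragment(gpu_ids, count=1):
    """Return a config.pbtxt fragment mirroring the single-GPU example."""
    ids = sorted(set(gpu_ids))
    gpus = ", ".join(str(i) for i in ids)
    return (
        'parameters { key: "memory.enable_memory_arena_shrinkage" '
        f'value: {{ string_value: "{shrinkage_value(ids)}" }} }}\n'
        "\n"
        "instance_group [\n"
        "    {\n"
        f"        count: {count}\n"
        "        kind: KIND_GPU\n"
        f"        gpus: [ {gpus} ]\n"
        "    }\n"
        "]\n"
    )


if __name__ == "__main__":
    # Hypothetical deployment on GPUs 2 and 3.
    print(config_fragment([2, 3]))
```

To check the effect, compare per-GPU memory usage (for example with `nvidia-smi`) after requests complete, with and without the parameter.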

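Contributor guide

Research direction: This issue asks how to configure memory release for multiple GPUs. Investigate the `instance_group` setting and the `memory.enable_memory_arena_shrinkage` parameter in the Triton documentation and source code. The linked issue #103 may provide additional context. Consider updating the documentation to clarify the multi-GPU syntax.
Tech stack: Python
Domain: backend, infrastructure
Issue type: investigation
Difficulty: 3
Estimated time: half a day
Activity status: recently opened, available to work on
Clarity: clear
Prerequisites: GPU, CUDA, Triton Inference Server
Beginner friendliness: 40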

