How to free multiple gpu memory · triton-inference-server/server#7825


Labels: help wanted, onnx, question

Repository metrics

Stars: (6,593 stars)
PR merge metrics: (average merge time 2d 16h) (34 merged PRs in the last 30d)

Description

The question is how to free GPU memory when a model runs on more than one GPU.

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed on a single GPU, I can enable real-time release of GPU memory. When the model is deployed on multiple GPUs, however, I don't know what format the parameter value should take. My single-GPU configuration is:

```
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
```
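ONNX Runtime documents the `memory.enable_memory_arena_shrinkage` run option as a list of semicolon-separated `device:id` pairs with no whitespace, where the device is `cpu` or `gpu` (for example `cpu:0;gpu:0`). Assuming the Triton ONNX Runtime backend passes this string through unchanged, a value covering two GPUs would look like `gpu:2;gpu:3`. The Python sketch below builds such a value and renders a matching config fragment. `shrinkage_value` and `config_fragment` are hypothetical helpers, not part of Triton. It is also not verified here whether an instance placed on one GPU accepts list entries for GPUs its own session does not use.

```python
# Sketch: build a config.pbtxt fragment that asks ONNX Runtime to shrink
# the GPU memory arena on several devices.
# ASSUMPTION: the value format ("gpu:2;gpu:3") follows ONNX Runtime's
# documentation for memory.enable_memory_arena_shrinkage: semicolon-separated
# "device:id" pairs, no whitespace, device is "cpu" or "gpu".
# Both helpers below are hypothetical, not a Triton or ONNX Runtime API.
from typing import Iterable

ALLOWED_DEVICES = {"cpu", "gpu"}  # case-sensitive, per the ORT docs


def shrinkage_value(pairs: Iterable[tuple[str, int]]) -> str:
    """Join (device, id) pairs into the 'device:id;device:id' string."""
    parts = []
    for device, device_id in pairs:
        if device not in ALLOWED_DEVICES:
            raise ValueError(f"unsupported device {device!r}")
        if device_id < 0:
            raise ValueError(f"negative device id {device_id}")
        parts.append(f"{device}:{device_id}")
    return ";".join(parts)


def config_fragment(gpu_ids: list[int]) -> str:
    """Render the parameters entry plus an instance_group for gpu_ids."""
    value = shrinkage_value(("gpu", i) for i in gpu_ids)
    gpus = ", ".join(str(i) for i in gpu_ids)
    return (
        'parameters { key: "memory.enable_memory_arena_shrinkage" '
        f'value: {{ string_value: "{value}" }} }}\n'
        "instance_group [\n"
        "  {\n"
        "    count: 1\n"
        "    kind: KIND_GPU\n"
        f"    gpus: [ {gpus} ]\n"
        "  }\n"
        "]\n"
    )


if __name__ == "__main__":
    print(config_fragment([2, 3]))
```

With `count: 1` and `gpus: [ 2, 3 ]`, Triton places one instance on each listed GPU, so both instances share the same parameter string. That sharing is the part worth checking against a real deployment before relying on it.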

Contributor guide

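Investigation approach: This issue asks how to configure GPU memory release when a model is deployed on multiple GPUs. Investigate the `instance_group` and `memory.enable_memory_arena_shrinkage` parameters in the Triton documentation and source code. The linked issue #103 may provide additional context. Consider updating the documentation to clarify the multi-GPU syntax.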
Tech stack: Python
Area: backend, infrastructure
Issue type: Investigation
Difficulty: 3
Estimated time: half a day
Activity: New
Clarity: Clear
Prerequisites: GPU, CUDA, Triton Inference Server
Beginner-friendliness: 40
