How to free multiple gpu memory · triton-inference-server/server#7825

(1 comment) (0 reactions) (0 assignees) Python (1,304 forks)

Labels: help wanted, onnx, question

Repository metrics

Stars: 6,593
PR merge metrics: average time to merge 2 days 16 hours; 34 PRs merged in the last 30 days

Description

How do I free GPU memory when a model is deployed across multiple GPUs? Related issue:

https://github.com/triton-inference-server/onnxruntime_backend/issues/103

When the model is deployed to a single GPU, I can enable real-time release of GPU memory with the configuration shown here. When the model is deployed to multiple GPUs, I don't know what format the parameter value should take.

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" }  }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
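
A possible multi-GPU form, offered as a sketch rather than a confirmed answer: as far as I understand the ONNX Runtime documentation for the `memory.enable_memory_arena_shrinkage` run option, the value is a semicolon-delimited list of `device:id` entries (for example `cpu:0;gpu:0`). Assuming the Triton ONNX Runtime backend passes this string through unchanged, the small Python helper below builds the value and a matching `instance_group` for a list of GPU ids. The helper names `arena_shrinkage_value` and `config_fragment` are hypothetical, and whether each model instance tolerates entries for GPUs it does not run on is not confirmed by this issue.

```python
def arena_shrinkage_value(gpu_ids, include_cpu=False):
    """Build a semicolon-delimited device list such as "gpu:2;gpu:3".

    Assumption: the value format follows ONNX Runtime's
    "device:id;device:id" convention for arena shrinkage.
    """
    devices = ["cpu:0"] if include_cpu else []
    devices += ["gpu:%d" % int(i) for i in gpu_ids]
    return ";".join(devices)


def config_fragment(gpu_ids):
    """Render a config.pbtxt fragment in the same layout as the issue."""
    value = arena_shrinkage_value(gpu_ids)
    ids = ", ".join(str(int(i)) for i in gpu_ids)
    lines = [
        'parameters { key: "memory.enable_memory_arena_shrinkage" '
        'value: { string_value: "' + value + '" } }',
        "",
        "instance_group [",
        "    {",
        "        count: 1",
        "        kind: KIND_GPU",
        "        gpus: [ " + ids + " ]",
        "    }",
        "]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: the same model placed on GPUs 2 and 3.
    print(config_fragment([2, 3]))
```

If this format is accepted, it would be worth checking with `nvidia-smi` that memory on every listed GPU actually drops after a request completes.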

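Contributor guide

Research direction: This issue asks how to configure memory release for multiple GPUs. Study the Triton documentation and source code for instance groups and the memory.enable_memory_arena_shrinkage parameter. The linked issue #103 may provide additional context. Consider updating the documentation to clarify the multi-GPU syntax.
Tech stack: python
Domain: backend, infrastructure
Issue type: investigation
Difficulty: 3
Estimated time: half a day
Activity status: recently opened for contribution
Clarity: clear
Prerequisites: GPU, CUDA, Triton Inference Server
Beginner friendliness: 40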
