Support for 4-bit quantization from the transformers library · lm-sys/FastChat#1798 | Good First Issue

(7 comments) (2 reactions) (0 assignees) Python (4,736 forks)

enhancement, good first issue

Repository metrics

Stars: (38,959 stars)
PR merge metrics: (no PRs merged in the last 30 days)

Description

Loading Vicuna-13B with 4-bit quantization is possible through the transformers library's load_in_4bit option. How difficult would it be for FastChat to support it?
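For context, a minimal sketch of what 4-bit loading through transformers can look like, outside of FastChat. It assumes a CUDA GPU and that the bitsandbytes and accelerate packages are installed alongside transformers. The function name load_model_4bit and the float16 compute dtype are illustrative choices, not something taken from the issue.

```python
def load_model_4bit(model_path: str):
    """Load a causal LM with 4-bit quantized weights (illustrative sketch)."""
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # BitsAndBytesConfig carries the same load_in_4bit switch the issue
    # refers to; the compute dtype here is an assumption.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers on available devices
    )
    return model, tokenizer
```

Calling load_model_4bit with a local path or hub ID of a Vicuna-13B checkpoint would return the quantized model and its tokenizer. Whether this configuration suits FastChat's serving path is exactly what the issue asks.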

Contributor guide

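Research direction: Study how FastChat currently loads models (for example, in model_worker.py) and reproduce the load_in_4bit parameter from the transformers library. Consult the transformers quantization documentation linked in the issue to understand the API. Check the existing comments for any insights or blockers. Implement the parameter in the model loading pipeline, ensuring compatibility with the existing model serving infrastructure.
Tech stack: Python, PyTorch
Domain: backend
Issue type: feature
Difficulty: 2
Estimated time: 1-3 hours
Activity status: active
Clarity: clear
Prerequisites: Python, PyTorch, Transformers
Beginner friendliness: 60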
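
As a rough illustration of the research direction above, the sketch below shows one way a command-line switch could be threaded into the keyword arguments passed to from_pretrained. The flag name --load-4bit and the helpers build_parser and build_from_pretrained_kwargs are hypothetical and do not reflect FastChat's actual code. Passing load_in_4bit directly follows the issue's wording; newer transformers releases may prefer a BitsAndBytesConfig object instead.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical 4-bit loading switch."""
    parser = argparse.ArgumentParser(description="Illustrative worker options")
    parser.add_argument("--model-path", type=str, required=True)
    # Hypothetical flag name, chosen for illustration only.
    parser.add_argument(
        "--load-4bit",
        action="store_true",
        help="Load model weights with 4-bit quantization.",
    )
    return parser


def build_from_pretrained_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed options into keyword arguments for from_pretrained."""
    kwargs = {"low_cpu_mem_usage": True}
    if args.load_4bit:
        # Quantized weights are placed on devices at load time, so a
        # device map is supplied together with the quantization switch.
        kwargs["load_in_4bit"] = True
        kwargs["device_map"] = "auto"
    return kwargs
```

With these helpers, build_from_pretrained_kwargs(build_parser().parse_args(["--model-path", "path/to/model", "--load-4bit"])) yields a dictionary that can be unpacked into a from_pretrained call. Without the flag, the loading arguments stay unchanged, which keeps the default path compatible with existing deployments.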