Support for 4-bit quantization from the transformers library · lm-sys/FastChat#1798

(7 comments) (2 reactions) (0 assignees) Python (4,736 forks)

Labels: enhancement, good first issue


Loading Vicuna-13B with 4-bit quantization is already possible through the transformers library's `load_in_4bit` option. How difficult would it be for FastChat to support it?
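For context, here is a minimal sketch of what 4-bit loading looks like on the transformers side. It is not FastChat's loading code. The helper names `four_bit_options` and `load_model` are hypothetical, and the sketch assumes a transformers version that provides `BitsAndBytesConfig`, with the `bitsandbytes` and `accelerate` packages installed and a CUDA GPU available.

```python
from typing import Any, Dict


def four_bit_options(enabled: bool) -> Dict[str, Any]:
    """Hypothetical helper: options forwarded to BitsAndBytesConfig."""
    return {"load_in_4bit": True} if enabled else {}


def load_model(model_path: str, load_4bit: bool = False):
    """Sketch of a loader with an optional 4-bit path (not FastChat's API)."""
    # Imported lazily so four_bit_options() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    kwargs: Dict[str, Any] = {}
    options = four_bit_options(load_4bit)
    if options:
        # Assumption: bitsandbytes is installed; quantization happens at load time.
        kwargs["quantization_config"] = BitsAndBytesConfig(**options)
        # Let accelerate place the quantized weights on the available device(s).
        kwargs["device_map"] = "auto"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
    return model, tokenizer
```

Calling `load_model(path, load_4bit=True)` with a local path or Hub ID for Vicuna-13B weights would exercise the 4-bit path. In FastChat the same switch would presumably be exposed through its existing model-loading entry point, but how that is wired is left open here.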

Research direction: Explore how to integrate the `load_in_4bit` option from transformers into FastChat's model loading pipeline. Study the transformers quantization documentation and FastChat's current model loading code.
Tech stack: Python, PyTorch
Domain: backend
Issue type: Feature
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Python, PyTorch, Transformers
Newbie friendliness: 60