vllm-project/vllm
[Feature]: Composite model loading using `AutoWeightsLoader` for all models
Open
#15697 opened on Mar 28, 2025
feature request, good first issue, keep-open
Description
🚀 The feature, motivation and pitch
#9160 first introduced AutoWeightsLoader to recursively call load_weights on sub-modules. This lets composite models (most notably multi-modal models) use language backbones (*Model classes such as LlamaModel) without having to repeat their weight loading logic.
Currently, load_weights is only implemented in a few language backbones. It would be great to standardize this approach and apply it to all language backbones in vLLM. The steps to do this are pretty straightforward:
- Move the existing `load_weights` function from `*ForCausalLM` to `*Model`.
- Create a new `load_weights` function in `*ForCausalLM` that loads the weights using `AutoWeightsLoader`.
- Move any logic in `*Model.load_weights` that only applies to `*ForCausalLM` back to `*ForCausalLM.load_weights`. Usually, this involves `lm_head`.
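
The steps above can be sketched with a toy, dependency-free example. Everything here (`ToyModule`, `ToyAutoWeightsLoader`, `ToyModel`, `ToyForCausalLM`, the `skip_prefixes` argument, and the q/k/v fusing logic) is a hypothetical stand-in chosen for illustration, not vLLM's actual implementation. It only shows the intended division of labor: the backbone owns its model-specific loading logic, and the causal-LM wrapper delegates to a recursive loader while keeping the `lm_head` handling for itself.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (name, values) pairs; a plain list of floats stands in for a tensor.
Weight = Tuple[str, List[float]]


class ToyModule:
    """Minimal stand-in for a module: named params and named children."""

    def __init__(self) -> None:
        self.params: Dict[str, List[float]] = {}
        self.children: Dict[str, "ToyModule"] = {}


class ToyAutoWeightsLoader:
    """Hypothetical loader: routes each weight to the child named by its
    prefix, preferring the child's own load_weights when it defines one."""

    def __init__(self, module: ToyModule,
                 skip_prefixes: Optional[List[str]] = None) -> None:
        self.module = module
        self.skip_prefixes = skip_prefixes or []

    def load_weights(self, weights: Iterable[Weight]) -> Set[str]:
        loaded: Set[str] = set()
        grouped: Dict[str, List[Weight]] = {}
        for name, value in weights:
            if any(name.startswith(p) for p in self.skip_prefixes):
                continue  # e.g. lm_head when embeddings are tied
            head, _, rest = name.partition(".")
            if head in self.module.children:
                grouped.setdefault(head, []).append((rest, value))
            elif name in self.module.params:
                self.module.params[name] = value
                loaded.add(name)
            else:
                raise KeyError(f"unexpected weight: {name}")
        for head, sub_weights in grouped.items():
            child = self.module.children[head]
            if hasattr(child, "load_weights"):
                names = child.load_weights(sub_weights)  # custom logic
            else:
                names = ToyAutoWeightsLoader(child).load_weights(sub_weights)
            loaded.update(f"{head}.{n}" for n in names)
        return loaded


class ToyModel(ToyModule):
    """Backbone (*Model): owns the model-specific loading logic (step 1)."""

    def __init__(self) -> None:
        super().__init__()
        self.params = {"embed_tokens": [], "qkv_proj": []}

    def load_weights(self, weights: Iterable[Weight]) -> Set[str]:
        loaded: Set[str] = set()
        shards: Dict[str, List[float]] = {}
        for name, value in weights:
            if name in ("q_proj", "k_proj", "v_proj"):
                shards[name] = value  # fused into qkv_proj below
            else:
                self.params[name] = value
                loaded.add(name)
        if shards:
            self.params["qkv_proj"] = [
                x for key in ("q_proj", "k_proj", "v_proj")
                for x in shards.get(key, [])
            ]
            loaded.add("qkv_proj")
        return loaded


class ToyForCausalLM(ToyModule):
    """Wrapper (*ForCausalLM): delegates to the loader (step 2) and keeps
    only the lm_head-specific decision for itself (step 3)."""

    def __init__(self, tie_word_embeddings: bool = False) -> None:
        super().__init__()
        self.tie_word_embeddings = tie_word_embeddings
        lm_head = ToyModule()
        lm_head.params = {"weight": []}
        self.children = {"model": ToyModel(), "lm_head": lm_head}

    def load_weights(self, weights: Iterable[Weight]) -> Set[str]:
        skip = ["lm_head."] if self.tie_word_embeddings else None
        loader = ToyAutoWeightsLoader(self, skip_prefixes=skip)
        return loader.load_weights(weights)


checkpoint: List[Weight] = [
    ("model.embed_tokens", [0.1, 0.2]),
    ("model.q_proj", [1.0]),
    ("model.k_proj", [2.0]),
    ("model.v_proj", [3.0]),
    ("lm_head.weight", [0.5]),
]
lm = ToyForCausalLM()
loaded = lm.load_weights(checkpoint)
```

Because the fusing logic lives in `ToyModel.load_weights`, a composite model that embeds `ToyModel` as a child under a different prefix would pick it up through the same loader without repeating it, which is the reuse this issue is after.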
For reference, you can look at the implementation for models such as Llama, Gemma2/3, Qwen2 and ChatGLM.
To avoid scope creep, I suggest opening a PR that updates only a few models at a time.
Alternatives
No response
Additional context
No response