vllm-project/vllm

[Feature]: Composite model loading using `AutoWeightsLoader` for all models

Open

#15697 opened on Mar 28, 2025

Labels: feature request, good first issue, keep-open

Description

🚀 The feature, motivation and pitch

#9160 first introduced `AutoWeightsLoader` to recursively call `load_weights` on sub-modules. This lets composite models (most notably multi-modal models) use language backbones (`*Model` classes such as `LlamaModel`) without having to repeat their weight loading logic.

Currently, `load_weights` is only implemented in a few language backbones. It would be great to standardize this approach and apply it to all language backbones in vLLM. The steps are straightforward:

  1. Move the existing `load_weights` function from `*ForCausalLM` to `*Model`.
  2. Create a new `load_weights` function in `*ForCausalLM` that loads the weights using `AutoWeightsLoader`.
  3. Move any logic in `*Model.load_weights` that only applies to `*ForCausalLM` back to `*ForCausalLM.load_weights`. Usually, this involves `lm_head`.

For reference, you can look at the implementation for models such as Llama, Gemma2/3, Qwen2 and ChatGLM.
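
As a rough illustration of the three steps above, here is a minimal, dependency-free sketch. It is not vLLM code: `ToyAutoWeightsLoader`, `ToyModule`, `ToyModel` and `ToyForCausalLM` are hypothetical stand-ins, plain lists replace tensors, and the `skip_prefixes` argument and the `rotary_emb.inv_freq` filter are assumptions chosen only to show where backbone-specific and head-specific logic could live. Please check the referenced models for the real signatures.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Stand-in for the (name, torch.Tensor) pairs read from a checkpoint.
Weight = Tuple[str, List[float]]


class ToyModule:
    """Hypothetical minimal module: stores parameters and named children."""

    def __init__(self) -> None:
        self.params: Dict[str, List[float]] = {}
        self.children: Dict[str, "ToyModule"] = {}

    def load_weights(self, weights: Iterable[Weight]) -> Set[str]:
        loaded: Set[str] = set()
        for name, value in weights:
            self.params[name] = value
            loaded.add(name)
        return loaded


class ToyAutoWeightsLoader:
    """Hypothetical, heavily simplified stand-in for AutoWeightsLoader."""

    def __init__(self, module: ToyModule,
                 skip_prefixes: Optional[List[str]] = None) -> None:
        self.module = module
        self.skip_prefixes = skip_prefixes or []

    def load_weights(self, weights: Iterable[Weight]) -> Set[str]:
        # Group weights by their first name component ("model", "lm_head", ...).
        groups: Dict[str, List[Weight]] = {}
        for name, value in weights:
            if any(name.startswith(p) for p in self.skip_prefixes):
                continue
            child_name, _, rest = name.partition(".")
            groups.setdefault(child_name, []).append((rest, value))
        # Delegate each group to the matching sub-module's own load_weights.
        loaded: Set[str] = set()
        for child_name, child_weights in groups.items():
            child = self.module.children[child_name]
            loaded |= {f"{child_name}.{n}"
                       for n in child.load_weights(child_weights)}
        return loaded


class ToyModel(ToyModule):
    """Plays the role of *Model. Step 1: the backbone owns its loading logic."""

    def load_weights(self, weights: Iterable[Weight]) -> Set[str]:
        loaded: Set[str] = set()
        for name, value in weights:
            # Assumed example of backbone-specific filtering.
            if "rotary_emb.inv_freq" in name:
                continue
            self.params[name] = value
            loaded.add(name)
        return loaded


class ToyForCausalLM(ToyModule):
    """Plays the role of *ForCausalLM."""

    def __init__(self, tie_word_embeddings: bool = False) -> None:
        super().__init__()
        self.tie_word_embeddings = tie_word_embeddings
        self.children = {"model": ToyModel(), "lm_head": ToyModule()}

    def load_weights(self, weights: Iterable[Weight]) -> Set[str]:
        # Step 2: delegate to the loader.
        # Step 3: lm_head-specific handling stays at this level.
        skip = ["lm_head."] if self.tie_word_embeddings else None
        loader = ToyAutoWeightsLoader(self, skip_prefixes=skip)
        return loader.load_weights(weights)


checkpoint = [
    ("model.embed_tokens.weight", [0.1, 0.2]),
    ("model.layers.0.rotary_emb.inv_freq", [1.0]),
    ("lm_head.weight", [0.3, 0.4]),
]
lm = ToyForCausalLM(tie_word_embeddings=True)
loaded = lm.load_weights(checkpoint)
print(sorted(loaded))
```

With this split, a composite model in the same toy setup could register `ToyModel` as one of its children (for example under a name like `language_model`) and get the backbone's loading logic for free, without copying anything from `ToyForCausalLM`.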

To avoid scope creep, I suggest that each PR update only a few models at a time.

Alternatives

No response

Additional context

No response
