[Transformers v5] InternVL2 · vllm-project/vllm#38425

4 comments · 0 reactions · 0 assignees · Python · 80,034 stars · 16,816 forks

good first issue, help wanted

Description

This is a sub-issue of https://github.com/vllm-project/vllm/issues/38379. Please read the description of that parent issue before beginning work on this one.

Which test is failing?

Transformers v5 creates the model on the meta device first, then loads the weights, similarly to what vLLM does. The problem is that the custom model code in the checkpoint reads values from real tensors while constructing the model structure. Meta tensors carry no data, so calls such as Tensor.item() fail.
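As a minimal sketch of this failure mode, the snippet below builds a tensor on the meta device and tries to read Python numbers out of it. The helper names and the rate schedule are illustrative assumptions and are not taken from the InternVL2 checkpoint code; only `torch.linspace`, `device="meta"`, and `Tensor.item()` are real PyTorch APIs.

```python
import torch


def schedule_from_tensor(max_rate: float, depth: int, device: str) -> list[float]:
    """Hypothetical construction-time helper that reads values out of a tensor."""
    rates = torch.linspace(0.0, max_rate, depth, device=device)
    # .item() needs real data, so this line raises when device="meta".
    return [r.item() for r in rates]


def schedule_meta_safe(max_rate: float, depth: int) -> list[float]:
    """Same schedule computed with plain Python numbers; no tensor data needed."""
    if depth == 1:
        return [0.0]
    return [max_rate * i / (depth - 1) for i in range(depth)]


def fails_on_meta(max_rate: float, depth: int) -> bool:
    """Return True if the tensor-based helper raises on the meta device."""
    try:
        schedule_from_tensor(max_rate, depth, device="meta")
    except RuntimeError:
        return True
    return False
```

Calling `fails_on_meta(0.1, 4)` exercises the failing path, while `schedule_meta_safe(0.1, 4)` shows one way structural hyperparameters could be derived without touching tensor data. Whether the real model code can be changed this way depends on what it actually computes.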

Since the failure occurs in the HF reference generation, it cannot be fixed in vLLM (other than by skipping the tests until the model works with Transformers v5). The proper solution is to upstream this architecture to Transformers. That shouldn't be too hard with Modular Transformers: the text backbone is Qwen2, so it can be reused.

$ pytest tests/models/multimodal/generation/test_common.py::test_single_image_models[intern_vl-test_case25]
...
RuntimeError: Tensor.item() cannot be called on meta tensors
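Until the model works with Transformers v5, the failing test above could be skipped based on the installed Transformers version. The following is a rough sketch using only the standard library; the helper names are hypothetical, and vLLM's test suite may already have its own mechanism for version-gated skips, so treat this as an illustration rather than the project's approach.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '5.0.0.dev0'."""
    return int(version.split(".")[0])


def transformers_is_v5_or_newer() -> bool:
    """Check the installed Transformers version; False if it is not installed."""
    try:
        return major_version(metadata.version("transformers")) >= 5
    except metadata.PackageNotFoundError:
        return False


# Hypothetical use in a test module (requires pytest):
#
#   import pytest
#
#   pytestmark = pytest.mark.skipif(
#       transformers_is_v5_or_newer(),
#       reason="InternVL2 custom code calls Tensor.item() on meta tensors",
#   )
```

A skip like this only hides the failure; the upstreaming work described above is still needed for the reference generation to run.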

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
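After installing, it can help to confirm which versions of both packages the environment actually resolves. The sketch below only reads package metadata through the standard library; the function names are hypothetical, and what a source install reports as its version string depends on each project's build setup.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: Sequence[str] = ("vllm", "transformers")) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```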

Contributor guide

Tech stack: Python, PyTorch
Domain: machine learning, backend
Issue type: bug
Difficulty: 4
Estimated time: over 1 week
Activity status: active
Clarity: clear
Prerequisites: basic knowledge of Transformers, familiarity with vLLM, PyTorch meta tensors, model upstreaming
Beginner friendliness: 15
Research direction: Investigate the failing test 'tests/models/multimodal/generation/test_common.py::test_single_image_models[intern_vl-test_case25]', which raises a RuntimeError about meta tensors. The proper fix is to upstream the InternVL2 architecture to Hugging Face Transformers using Modular Transformers, reusing the existing Qwen2 text backbone. Study the parent issue #38379 for overall context and similar upstreaming patterns. The goal is to make the model compatible with Transformers v5's meta device initialization.