vllm-project/vllm

Upgrade to Transformers v5

Open

#38379 opened on Mar 27, 2026

help wanted

Description

What is this issue?

This issue serves as a living tracker for the current issues preventing us from upgrading vLLM to Transformers v5.

We will use sub-issues to track individual failures, and PRs should be made against those sub-issues.

The solutions to these issues may need to be applied to either:

  • Transformers in the form of:
    • Adding missing backward compatibility (usually for custom code models)
    • General bug fixes/improvements to new features of v5
  • vLLM in the form of:
    • Forward compatibility with how something is now done in v5
    • Edge case handling for issues that v4 ignored (such as config validation)

Sometimes, the issue is simply with the model checkpoint itself, for example if it:

  • Contains a malformed config.json that cannot be used to instantiate the PreTrainedConfig class, which now validates its inputs
  • Ships custom code* that uses deprecated/removed APIs

In these situations, the best solution will likely be to skip the affected tests in vLLM and open a PR to Transformers to contribute the model. This will be faster and more sustainable than waiting for the model vendor to fix their custom model code; sometimes they never do.

Contributing the new model should be done using the new Modular Transformers so that the implementation is easy to maintain and will remain maintained by the Transformers team.

*particularly in the parts of the model implementation that vLLM tries to directly reuse, such as config/tokenizer/multimodal processor

Comprehensive list of skips

Now that the parent PR is merged, we have a comprehensive list of all tests that are currently skipped on main:

Module-level skips (skip everything in the file)

  • PR: TBD — tests/lora/test_minicpmv_tp.py (pytestmark = pytest.mark.skipif(transformers >= 5.0)) — MiniCPMV custom processor uses tokenizer.im_start_id not available on TokenizersBackend in transformers v5+
  • PR: TBD — tests/models/multimodal/generation/test_phi4siglip.py (pytestmark = pytest.mark.skipif(transformers >= 5.0)) — HF model custom code uses siglip2 internals (filter_out_non_signature_kwargs) removed by HF#43514
  • PR: TBD — tests/models/multimodal/pooling/test_colqwen3.py (pytestmark = pytest.mark.skip(...)) — ColQwen3 weight tying incompatible with transformers v5 (missing all_tied_weights_keys)
  • PR: TBD — tests/models/multimodal/pooling/test_intern_vit.py (pytestmark = pytest.mark.skip(...)) — InternVisionModel custom code incompatible with transformers v5 (missing all_tied_weights_keys)
  • PR: TBD — tests/models/multimodal/pooling/test_jinavl_reranker.py (pytestmark = pytest.mark.skip(...)) — jinaai/jina-reranker-m0 custom code incompatible with transformers v5 (missing all_tied_weights_keys)
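
The `skipif(transformers >= 5.0)` conditions above are shorthand. As a rough sketch of how such a module-level gate could be written, the hypothetical helpers below (`at_least` and `skip_if_transformers_at_least` are illustrative names, not part of vLLM or Transformers) compare only the numeric release segment of the version string. This is a deliberate simplification and an assumption: a source build reporting `5.0.0.dev0` is treated as v5, whereas strict PEP 440 ordering would place it before `5.0`.

```python
import re


def _release(version_string: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '5.0.0.dev0' -> (5, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip())
    if match is None:
        raise ValueError(f"no numeric release found in {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_least(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at or above `minimum` (release segment only)."""
    a, b = _release(installed), _release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '5.5' and '5.5.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def skip_if_transformers_at_least(minimum: str, reason: str):
    """Build a pytest skipif marker (hypothetical helper).

    pytest and transformers are imported lazily so that the comparison
    helpers above remain usable without either package installed.
    """
    import pytest
    import transformers

    return pytest.mark.skipif(
        at_least(transformers.__version__, minimum), reason=reason
    )


# In a test module, the gate would then read:
# pytestmark = skip_if_transformers_at_least(
#     "5.0", reason="custom processor is incompatible with transformers v5"
# )
```

Keeping the reason string specific (as the entries above do) makes it easier to tell when a skip can be removed.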

Function-level / parametrized skips

  • PR: TBD — tests/models/language/pooling_mteb_test/test_jina.py::test_embed_models_correctness (entire @parametrize block at line 759, covers all EMBEDDING_MODELS x dtype=half x dimensions=[16, 32]) — jinaai/jina-embeddings-v3 custom XLMRobertaLoRA model incompatible with transformers v5 (missing all_tied_weights_keys)
  • PR: https://github.com/vllm-project/vllm/pull/42498 — tests/models/multimodal/generation/test_nemotron_parse.py — nvidia/NVIDIA-Nemotron-Parse-v1.1 parametrized test (entire run_test block at line 875) — Custom MBart decoder head-count mismatch with transformers v5 GQA-aware cross-attention (8 vs 16 heads)
  • PR: TBD — tests/models/multimodal/generation/test_voxtral.py::test_hf_reference — VoxtralProcessor.apply_chat_template() in transformers v5 doesn't resolve chat_template=None to default
  • PR: TBD — tests/models/multimodal/processing/test_musicflamingo.py::test_musicflamingo_audio_feature_pipeline_matches_hf_small_config (skipif transformers >= 5.5) — transformers v5.5 added native MusicFlamingoForConditionalGeneration with different get_audio_features signature
  • PR: TBD — tests/v1/e2e/spec_decode/test_spec_decode.py — ("eagle3", "Qwen/Qwen3-8B", "AngelSlim/Qwen3-8B_eagle3", 1) param of test_eagle_correctness_* — "Feature is experimental and uses too much memory in CI" (TODO from hmellor)

tests/models/multimodal/generation/test_common.py — VLMTestInfo entries newly marked pytest.mark.skip

  • PR: TBD — ultravox (fixie-ai/ultravox-v0_5-llama-3_2-1b) — Custom model code is not compatible with Transformers v5
  • PR: TBD — intern_vl image (OpenGVLab/InternVL2-1B, OpenGVLab/InternVL2-2B, OpenGVLab/Mono-InternVL-2B) — Custom model code tries to access data from meta-tensor
  • PR: TBD — intern_vl-video (InternVL video models) — Custom model code tries to access data from meta-tensor
  • PR: TBD — isaac (PerceptronAI/Isaac-0.1-2B) — Custom model imports deleted object
  • PR: TBD — intern_vl custom-input case at line 854 (InternVL custom-input variant) — Custom model code tries to access data from meta-tensor
  • PR: TBD — Additional skipif at line 846 for transformers >= 5.0.0 — Model's custom code uses ROPE_INIT_FUNCTIONS['default'] which was removed in transformers v5

tests/models/language/pooling_mteb_test/ — enable_test=False

  • PR: TBD — test_baai.py BAAI entry at line 729 — Custom tokenizer on HF hub incompatible with transformers v5 (sets attrs before super().__init__, causing AttributeError on verbose)
  • PR: TBD — test_gte.py GTE entry at line 745 — Numerical regression with transformers v5

tests/models/registry.py — entries gated by max_transformers_version

  • PR: TBD — InternLM2VEForCausalLM (OpenGVLab/Mono-InternVL-2B), cap 4.57 — Custom config can't be loaded with v5, vision_config not always set
  • PR: TBD — Plamo2ForCausalLM (pfnet/plamo-2-1b), cap 4.57 — Custom code uses _tied_weight_keys: list[str]; v5 expects dict[str, str]
  • PR: TBD — Step3VLForConditionalGeneration (line ~530), cap 5.3 — validate_rope() no longer accepts ignore_keys param above v5.4
  • PR: TBD — XverseForCausalLM (xverse/XVERSE-7B-Chat), cap 4.57 — XVERSE tokenizer incompatible with v5 (add_prefix_space/prepend_scheme mismatch)
  • PR: TBD — FireRedASR2ForConditionalGeneration (allendou/FireRedASR2-LLM-vllm), cap 5.1 — Incompatible with v5.2+ (dict object has no attribute '__name__')
  • PR: TBD — FireRedLIDForConditionalGeneration (PatchyTisa/FireRedLID-vllm), cap 5.1 — Same as FireRedASR2 (dict object has no attribute '__name__')
  • PR: TBD — FunASRForConditionalGeneration (allendou/Fun-ASR-Nano-2512-vllm), cap 5.1 — Same as FireRedASR2 (dict object has no attribute '__name__')
  • PR: https://github.com/vllm-project/vllm/pull/38447 — HCXVisionForCausalLM (naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B), cap 4.57 — Custom config can't be loaded with v5, text_config not always set
  • PR: TBD — InternS1ForConditionalGeneration (internlm/Intern-S1), cap 4.57 — Custom tokenizer code not compatible with v5
  • PR: TBD — MiniCPMO (openbmb/MiniCPM-o-2_6), cap 4.57 — Custom processor code not compatible with v5
  • PR: TBD — MiniCPMV (openbmb/MiniCPM-Llama3-V-2_5 and 2.6/4.0/4.5 variants), cap 4.57 — MiniCPMVBatchFeature incompatible with its v5 base class
  • PR: TBD — OpenCUAForConditionalGeneration (xlangai/OpenCUA-7B), cap 4.57 — Tokenizer can't be initialised in v5
  • PR: TBD — OpenPanguVLForConditionalGeneration (FreedomIntelligence/openPangu-VL-7B), cap 4.57 — OpenPanguVLVideoProcessorInitKwargs doesn't specify total=False
  • PR: TBD — Ovis2_5 (AIDC-AI/Ovis2.5-2B), cap 4.57 — Custom processor code not compatible with v5
  • PR: TBD — Ovis2_6_MoeForCausalLM (AIDC-AI/Ovis2.6-30B-A3B), cap 4.57 — Custom processor code not compatible with v5
  • PR: TBD — Phi4ForCausalLMV (microsoft/Phi-4-reasoning-vision-15B), cap 5.3 — siglip2 internals removed by HF#43514 above v5.4
  • PR: TBD — Tarsier2ForConditionalGeneration (line ~1267), cap 5.3 — Qwen2VLConfig split into Qwen2VLConfig + Qwen2VLTextConfig in v5
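
To make the "cap" values above concrete, here is a minimal sketch of how a registry entry gated by max_transformers_version might decide whether to skip. `ExampleEntry`, `exceeds_cap` and `skip_reason` are hypothetical names for illustration and are not the actual code in tests/models/registry.py. The sketch assumes the cap is compared at the precision it is written with, so that 4.57.3 still counts as within a cap of 4.57; the real check may behave differently.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExampleEntry:
    """Hypothetical stand-in for a registry entry with a version cap."""

    architecture: str
    max_transformers_version: Optional[str] = None
    reason: str = ""


def _release(version_string: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '5.4.0.dev0' -> (5, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip())
    if match is None:
        raise ValueError(f"no numeric release found in {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def exceeds_cap(installed: str, cap: str) -> bool:
    """Return True if `installed` is newer than `cap`.

    Assumption: only as many components as the cap specifies are compared,
    so '4.57.3' does not exceed a cap of '4.57'.
    """
    cap_parts = _release(cap)
    installed_parts = _release(installed)[: len(cap_parts)]
    installed_parts += (0,) * (len(cap_parts) - len(installed_parts))
    return installed_parts > cap_parts


def skip_reason(entry: ExampleEntry, installed: str) -> Optional[str]:
    """Return a skip message if the entry is capped below `installed`, else None."""
    cap = entry.max_transformers_version
    if cap is None or not exceeds_cap(installed, cap):
        return None
    return (
        f"{entry.architecture} requires transformers<={cap} "
        f"(installed {installed}): {entry.reason}"
    )


plamo = ExampleEntry(
    "Plamo2ForCausalLM",
    max_transformers_version="4.57",
    reason="custom code uses _tied_weight_keys: list[str]",
)
print(skip_reason(plamo, "4.57.3"))  # within the cap, so no skip message
print(skip_reason(plamo, "5.0.0"))   # above the cap, so a skip message
```

Recording the reason next to the cap, as the list above does, keeps the eventual removal of each cap traceable to a specific fix.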

Sub-issue template

This is a sub-issue forming part of the work in https://github.com/vllm-project/vllm/issues/38379. Please read the description of that issue before beginning to work on this one.

## Which test is failing?

```console
$ pytest tests/
...

```

## How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

```console
# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
```
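
After installing, it may be worth confirming which versions the environment actually resolves before running any tests. The snippet below is a small sketch using only the standard library; `installed_version` and `describe_environment` are hypothetical helper names. A source checkout will often report a development version string, but the exact format is an assumption and depends on the state of each repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def describe_environment(
    distributions: tuple[str, ...] = ("vllm", "transformers"),
) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    for name, found in describe_environment().items():
        print(f"{name}: {found or 'NOT INSTALLED'}")
```

If either package is reported as missing, or Transformers resolves to a v4 release, the test results will not reflect the state this issue is tracking.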
