[Bug] Cannot load qwen3-vl series with lora adapter on vllm. · unslothai/unsloth#3560

(5 评论) (0 反应) (0 负责人)Python (64,271 star) (5,658 fork)batch import

good first issue

描述

I fine-tuned the Qwen3-VL-8B-Instruct model using Unsloth. My code is 99% identical to the official guide; the only change I made was replacing the 8B model in the guide with the 2B model for fine-tuning. After fine-tuning, I confirmed that the QLoRA adapter was saved correctly.

Excited and happy, I moved the saved QLoRA adapter and the Qwen3-VL-2B-Instruct model to my vLLM server. Then I ran a command to start model serving with vLLM as shown below. (For reference, the vLLM server has no issues—it was already serving official Qwen3-VL models.)

command = [
        sys.executable, 
        "-m", "vllm.entrypoints.openai.api_server",
        "--model", "./Qwen3-VL-2B-Instruct",
        "--max_model_len", "3500",
        "--gpu_memory_utilization", "0.85",
        "--trust-remote-code",
        "--host", "0.0.0.0",
        "--port", "8888",

        # for lora adapter
        "--enable-lora",
        "--max-lora-rank", "16",  # LoRA rank
        "--max-loras", "1", 
        "--max-cpu-loras", "1",
        "--lora-modules", "adapter0=./my_lora_adapter"
]

I waited for vLLM to properly load the QLoRA adapter, but the following problem occurred. This same issue happened even when I retrained LoRA using Unsloth with 2B, 4B, and 8B models.

When I was feeling hopeless, I tried merging the model instead of saving the LoRA adapter separately by using the save_pretrained_merged() function as shown below, and then vLLM was able to load and perform inference normally:

save_pretrained_merged( f"my_16bit_model", tokenizer, save_method="merged_16bit")

However, I don't want to merge the models—I want to load only the LoRA adapter. I’ve seen many posts from others experiencing the same error. As of now, what can I do to resolve this issue?

贡献者指南

技术栈: pythonpytorch
领域: machine learningai
议题类型: bug
难度: 3
预计时间: half day
活动状态: fresh
清晰度: clear
前置要求: PythonPyTorchLoRA conceptsvLLM basics
新手友好度: 30
研究方向: Investigate the vLLM compatibility with Qwen3 VL LoRA adapters. The issue mentions that merging works but separate LoRA fails. Check vLLM's documentation and issues for loading QLoRA adapters for vision language models. Look for any configuration parameters that might be missing, such as 'lora request' settings. Also consider the possibility that the adapter format from Unsloth may need conversion for vLLM. Review the save pretrained merged function to understand what is different.