[Bug] Cannot load qwen3-vl series with lora adapter on vllm. · unslothai/unsloth#3560

Repository metrics

Stars: 64,271
PR merge metrics: average time to merge 3 days 15 hours; 525 PRs merged in the last 30 days

Description

I fine-tuned a Qwen3-VL Instruct model using Unsloth. My code is almost identical to the official guide; the only change I made was replacing the Qwen3-VL-8B-Instruct model used in the guide with the 2B model. After fine-tuning, I confirmed that the QLoRA adapter was saved correctly.

Excited and happy, I moved the saved QLoRA adapter and the Qwen3-VL-2B-Instruct model to my vLLM server. Then I ran a command to start model serving with vLLM as shown below. (For reference, the vLLM server has no issues—it was already serving official Qwen3-VL models.)

import sys

command = [
    sys.executable,
    "-m", "vllm.entrypoints.openai.api_server",
    "--model", "./Qwen3-VL-2B-Instruct",
    "--max_model_len", "3500",
    "--gpu_memory_utilization", "0.85",
    "--trust-remote-code",
    "--host", "0.0.0.0",
    "--port", "8888",

    # for lora adapter
    "--enable-lora",
    "--max-lora-rank", "16",  # LoRA rank
    "--max-loras", "1",
    "--max-cpu-loras", "1",
    "--lora-modules", "adapter0=./my_lora_adapter",
]

I waited for vLLM to properly load the QLoRA adapter, but the following problem occurred. This same issue happened even when I retrained LoRA using Unsloth with 2B, 4B, and 8B models.

When I was feeling hopeless, I tried merging the model instead of saving the LoRA adapter separately by using the save_pretrained_merged() function as shown below, and then vLLM was able to load and perform inference normally:

model.save_pretrained_merged("my_16bit_model", tokenizer, save_method="merged_16bit")

However, I don't want to merge the models—I want to load only the LoRA adapter. I’ve seen many posts from others experiencing the same error. As of now, what can I do to resolve this issue?

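Contributor guide

Research direction: Investigate vLLM's LoRA loading logic for multimodal (vision-language) models. Check whether the adapter weights match the model architecture (e.g. Qwen3-VL). Compare the successful merged-model path with the failing LoRA-only path.
Tech stack: Python, PyTorch
Domain: AI
Issue type: Bug
Difficulty: 3
Estimated time: half a day
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, Git
Beginner friendliness: 60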
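
As a starting point for the research direction above (checking whether the adapter weights match the model architecture), a minimal sketch could group the adapter's tensor names by tower and by target module. Everything here is an assumption for illustration: the helper name `summarize_adapter_keys`, the `VISION_MARKERS` name fragments, and the sample key names are hypothetical and are not taken from the issue or from a real checkpoint.

```python
from collections import Counter

# Assumption: vision-tower tensors contain one of these path components.
# Adjust to whatever the real adapter checkpoint uses.
VISION_MARKERS = ("visual", "vision")


def summarize_adapter_keys(keys):
    """Count LoRA tensors per (tower, target module) pair."""
    summary = Counter()
    for key in keys:
        parts = key.split(".")
        # Locate the 'lora_A' / 'lora_B' component; the target module
        # is the path component immediately before it.
        lora_idx = next(
            (i for i, part in enumerate(parts) if part.startswith("lora_")),
            None,
        )
        if lora_idx is None or lora_idx == 0:
            continue  # not a LoRA tensor
        module = parts[lora_idx - 1]
        is_vision = any(marker in parts for marker in VISION_MARKERS)
        summary[("vision" if is_vision else "language", module)] += 1
    return summary


# Hypothetical key names, for illustration only.
SAMPLE_KEYS = [
    "base_model.model.model.visual.blocks.0.attn.qkv.lora_A.weight",
    "base_model.model.model.visual.blocks.0.attn.qkv.lora_B.weight",
    "base_model.model.model.language_model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.model.language_model.layers.0.self_attn.q_proj.lora_B.weight",
    "base_model.model.model.language_model.layers.0.mlp.down_proj.lora_A.weight",
]

if __name__ == "__main__":
    for (tower, module), count in sorted(summarize_adapter_keys(SAMPLE_KEYS).items()):
        print(f"{tower:8s} {module:12s} {count}")
```

On a real adapter, the key list could be read from the saved safetensors file (for example with `safetensors.safe_open`) instead of `SAMPLE_KEYS`. If the summary shows LoRA tensors under the vision tower, that would be one concrete difference between the LoRA-only path and the merged path worth checking against what vLLM's LoRA loader expects for this model.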
