vllm-project/vllm
[Bug]: Certain Ranks Take a Long Time to Load Weights
Open
#39030 opened on Apr 5, 2026
bug, help wanted
Description
Your current environment
B200 VM
🐛 Describe the bug
I noticed that certain ranks sometimes take a very long time to load weights. The log below is from a run with a monkeypatched log statement that reports the load time per rank:
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:38:59 [default_loader.py:369] Starting load weights
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:38:59 [default_loader.py:369] Starting load weights
Loading safetensors checkpoint shards: 0% Completed | 0/163 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 1% Completed | 1/163 [00:00<00:48, 3.31it/s]
Loading safetensors checkpoint shards: 1% Completed | 2/163 [00:00<00:41, 3.88it/s]
Loading safetensors checkpoint shards: 2% Completed | 4/163 [00:00<00:35, 4.51it/s]
Loading safetensors checkpoint shards: 3% Completed | 5/163 [00:01<00:29, 5.38it/s]
Loading safetensors checkpoint shards: 4% Completed | 7/163 [00:01<00:29, 5.25it/s]
Loading safetensors checkpoint shards: 5% Completed | 8/163 [00:01<00:26, 5.93it/s]
Loading safetensors checkpoint shards: 6% Completed | 9/163 [00:01<00:27, 5.69it/s]
Loading safetensors checkpoint shards: 6% Completed | 10/163 [00:01<00:28, 5.41it/s]
Loading safetensors checkpoint shards: 7% Completed | 12/163 [00:02<00:19, 7.73it/s]
Loading safetensors checkpoint shards: 8% Completed | 13/163 [00:02<00:25, 5.81it/s]
Loading safetensors checkpoint shards: 9% Completed | 14/163 [00:02<00:23, 6.43it/s]
Loading safetensors checkpoint shards: 9% Completed | 15/163 [00:02<00:24, 6.16it/s]
Loading safetensors checkpoint shards: 10% Completed | 16/163 [00:02<00:26, 5.59it/s]
Loading safetensors checkpoint shards: 11% Completed | 18/163 [00:03<00:26, 5.41it/s]
(Worker_TP1_EP1 pid=3425911) INFO 04-05 11:39:03 [default_loader.py:369] Starting load weights
Loading safetensors checkpoint shards: 12% Completed | 19/163 [00:03<00:24, 5.88it/s]
Loading safetensors checkpoint shards: 13% Completed | 21/163 [00:03<00:25, 5.55it/s]
Loading safetensors checkpoint shards: 13% Completed | 22/163 [00:03<00:23, 6.13it/s]
Loading safetensors checkpoint shards: 14% Completed | 23/163 [00:04<00:23, 6.04it/s]
Loading safetensors checkpoint shards: 15% Completed | 24/163 [00:04<00:25, 5.49it/s]
Loading safetensors checkpoint shards: 16% Completed | 26/163 [00:04<00:25, 5.42it/s]
Loading safetensors checkpoint shards: 17% Completed | 27/163 [00:04<00:23, 5.81it/s]
Loading safetensors checkpoint shards: 18% Completed | 29/163 [00:05<00:25, 5.34it/s]
Loading safetensors checkpoint shards: 18% Completed | 30/163 [00:05<00:22, 5.88it/s]
Loading safetensors checkpoint shards: 19% Completed | 31/163 [00:05<00:22, 5.86it/s]
Loading safetensors checkpoint shards: 20% Completed | 32/163 [00:05<00:24, 5.25it/s]
Loading safetensors checkpoint shards: 21% Completed | 34/163 [00:05<00:17, 7.30it/s]
Loading safetensors checkpoint shards: 21% Completed | 35/163 [00:06<00:22, 5.64it/s]
Loading safetensors checkpoint shards: 22% Completed | 36/163 [00:06<00:20, 6.27it/s]
Loading safetensors checkpoint shards: 23% Completed | 37/163 [00:06<00:20, 6.07it/s]
Loading safetensors checkpoint shards: 23% Completed | 38/163 [00:06<00:22, 5.61it/s]
Loading safetensors checkpoint shards: 25% Completed | 40/163 [00:07<00:22, 5.49it/s]
Loading safetensors checkpoint shards: 25% Completed | 41/163 [00:07<00:20, 6.03it/s]
Loading safetensors checkpoint shards: 26% Completed | 43/163 [00:07<00:21, 5.63it/s]
Loading safetensors checkpoint shards: 28% Completed | 45/163 [00:07<00:19, 6.20it/s]
Loading safetensors checkpoint shards: 28% Completed | 46/163 [00:08<00:20, 5.75it/s]
Loading safetensors checkpoint shards: 29% Completed | 48/163 [00:08<00:20, 5.64it/s]
Loading safetensors checkpoint shards: 30% Completed | 49/163 [00:08<00:18, 6.03it/s]
Loading safetensors checkpoint shards: 31% Completed | 51/163 [00:08<00:20, 5.59it/s]
Loading safetensors checkpoint shards: 33% Completed | 53/163 [00:09<00:17, 6.14it/s]
Loading safetensors checkpoint shards: 33% Completed | 54/163 [00:09<00:19, 5.62it/s]
Loading safetensors checkpoint shards: 34% Completed | 56/163 [00:09<00:14, 7.35it/s]
Loading safetensors checkpoint shards: 35% Completed | 57/163 [00:09<00:18, 5.86it/s]
Loading safetensors checkpoint shards: 36% Completed | 59/163 [00:10<00:16, 6.31it/s]
Loading safetensors checkpoint shards: 37% Completed | 60/163 [00:10<00:17, 5.78it/s]
Loading safetensors checkpoint shards: 38% Completed | 62/163 [00:10<00:18, 5.48it/s]
Loading safetensors checkpoint shards: 39% Completed | 63/163 [00:10<00:16, 5.93it/s]
Loading safetensors checkpoint shards: 40% Completed | 65/163 [00:11<00:17, 5.48it/s]
Loading safetensors checkpoint shards: 41% Completed | 67/163 [00:11<00:16, 5.99it/s]
Loading safetensors checkpoint shards: 42% Completed | 68/163 [00:11<00:17, 5.52it/s]
Loading safetensors checkpoint shards: 43% Completed | 70/163 [00:12<00:16, 5.51it/s]
Loading safetensors checkpoint shards: 44% Completed | 71/163 [00:12<00:15, 5.91it/s]
Loading safetensors checkpoint shards: 45% Completed | 73/163 [00:12<00:15, 5.63it/s]
Loading safetensors checkpoint shards: 46% Completed | 75/163 [00:12<00:14, 6.23it/s]
Loading safetensors checkpoint shards: 47% Completed | 76/163 [00:13<00:15, 5.73it/s]
Loading safetensors checkpoint shards: 48% Completed | 78/163 [00:13<00:11, 7.47it/s]
Loading safetensors checkpoint shards: 48% Completed | 79/163 [00:13<00:14, 5.71it/s]
Loading safetensors checkpoint shards: 50% Completed | 81/163 [00:13<00:13, 6.20it/s]
Loading safetensors checkpoint shards: 50% Completed | 82/163 [00:14<00:13, 5.81it/s]
Loading safetensors checkpoint shards: 52% Completed | 84/163 [00:14<00:14, 5.62it/s]
Loading safetensors checkpoint shards: 52% Completed | 85/163 [00:14<00:12, 6.09it/s]
Loading safetensors checkpoint shards: 53% Completed | 87/163 [00:14<00:13, 5.72it/s]
Loading safetensors checkpoint shards: 55% Completed | 89/163 [00:15<00:11, 6.25it/s]
Loading safetensors checkpoint shards: 55% Completed | 90/163 [00:15<00:12, 5.79it/s]
Loading safetensors checkpoint shards: 56% Completed | 92/163 [00:15<00:12, 5.67it/s]
Loading safetensors checkpoint shards: 57% Completed | 93/163 [00:15<00:11, 6.09it/s]
Loading safetensors checkpoint shards: 58% Completed | 95/163 [00:16<00:11, 5.74it/s]
Loading safetensors checkpoint shards: 60% Completed | 97/163 [00:16<00:10, 6.35it/s]
Loading safetensors checkpoint shards: 60% Completed | 98/163 [00:16<00:11, 5.83it/s]
Loading safetensors checkpoint shards: 61% Completed | 100/163 [00:16<00:08, 7.64it/s]
Loading safetensors checkpoint shards: 62% Completed | 101/163 [00:17<00:10, 6.01it/s]
Loading safetensors checkpoint shards: 63% Completed | 103/163 [00:17<00:09, 6.47it/s]
Loading safetensors checkpoint shards: 64% Completed | 104/163 [00:17<00:09, 6.01it/s]
Loading safetensors checkpoint shards: 65% Completed | 106/163 [00:18<00:09, 5.77it/s]
Loading safetensors checkpoint shards: 66% Completed | 107/163 [00:18<00:08, 6.26it/s]
Loading safetensors checkpoint shards: 67% Completed | 109/163 [00:18<00:09, 5.84it/s]
Loading safetensors checkpoint shards: 68% Completed | 111/163 [00:18<00:08, 6.37it/s]
Loading safetensors checkpoint shards: 69% Completed | 112/163 [00:19<00:08, 5.91it/s]
Loading safetensors checkpoint shards: 70% Completed | 114/163 [00:19<00:08, 5.77it/s]
Loading safetensors checkpoint shards: 71% Completed | 115/163 [00:19<00:07, 6.17it/s]
Loading safetensors checkpoint shards: 72% Completed | 117/163 [00:19<00:07, 5.76it/s]
Loading safetensors checkpoint shards: 73% Completed | 119/163 [00:20<00:06, 6.37it/s]
Loading safetensors checkpoint shards: 74% Completed | 120/163 [00:20<00:07, 5.85it/s]
Loading safetensors checkpoint shards: 75% Completed | 122/163 [00:20<00:05, 7.66it/s]
Loading safetensors checkpoint shards: 75% Completed | 123/163 [00:20<00:06, 5.99it/s]
Loading safetensors checkpoint shards: 77% Completed | 125/163 [00:21<00:05, 6.43it/s]
Loading safetensors checkpoint shards: 77% Completed | 126/163 [00:21<00:06, 6.00it/s]
Loading safetensors checkpoint shards: 79% Completed | 128/163 [00:21<00:06, 5.76it/s]
Loading safetensors checkpoint shards: 79% Completed | 129/163 [00:21<00:05, 6.26it/s]
Loading safetensors checkpoint shards: 80% Completed | 131/163 [00:22<00:05, 5.79it/s]
Loading safetensors checkpoint shards: 82% Completed | 133/163 [00:22<00:04, 6.26it/s]
Loading safetensors checkpoint shards: 82% Completed | 134/163 [00:22<00:05, 5.72it/s]
Loading safetensors checkpoint shards: 83% Completed | 136/163 [00:23<00:04, 5.59it/s]
Loading safetensors checkpoint shards: 84% Completed | 137/163 [00:23<00:04, 6.01it/s]
Loading safetensors checkpoint shards: 85% Completed | 139/163 [00:23<00:04, 5.70it/s]
Loading safetensors checkpoint shards: 87% Completed | 141/163 [00:23<00:03, 7.12it/s]
Loading safetensors checkpoint shards: 87% Completed | 142/163 [00:24<00:03, 5.79it/s]
Loading safetensors checkpoint shards: 88% Completed | 144/163 [00:24<00:03, 6.29it/s]
Loading safetensors checkpoint shards: 89% Completed | 145/163 [00:24<00:03, 5.92it/s]
Loading safetensors checkpoint shards: 90% Completed | 147/163 [00:24<00:02, 5.72it/s]
Loading safetensors checkpoint shards: 91% Completed | 148/163 [00:24<00:02, 6.22it/s]
Loading safetensors checkpoint shards: 92% Completed | 150/163 [00:25<00:02, 5.81it/s]
Loading safetensors checkpoint shards: 93% Completed | 152/163 [00:25<00:01, 6.37it/s]
Loading safetensors checkpoint shards: 94% Completed | 153/163 [00:25<00:01, 5.91it/s]
Loading safetensors checkpoint shards: 95% Completed | 155/163 [00:26<00:01, 5.67it/s]
Loading safetensors checkpoint shards: 96% Completed | 156/163 [00:26<00:01, 6.09it/s]
Loading safetensors checkpoint shards: 97% Completed | 158/163 [00:26<00:00, 5.74it/s]
(Worker_TP2_EP2 pid=3425912) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 27.12 seconds
Loading safetensors checkpoint shards: 98% Completed | 160/163 [00:26<00:00, 6.70it/s]
Loading safetensors checkpoint shards: 100% Completed | 163/163 [00:26<00:00, 6.06it/s]
(Worker_TP0_EP0 pid=3425910)
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 26.92 seconds
(Worker_TP0_EP0 pid=3425910) WARNING 04-05 11:39:27 [kv_cache.py:109] Checkpoint does not provide a q scaling factor. Setting it to k_scale. This only matters for FP8 Attention backends (flash-attn or flashinfer).
(Worker_TP0_EP0 pid=3425910) WARNING 04-05 11:39:27 [kv_cache.py:123] Using KV cache scaling factor 1.0 for fp8_e4m3. If this is unintended, verify that k/v_scale scaling factors are properly set in the checkpoint.
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:27 [nvfp4.py:404] Using MoEPrepareAndFinalizeNoDPEPMonolithic
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:39:30 [default_loader.py:385] Loading weights took 30.42 seconds
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:30 [gpu_model_runner.py:4820] Model loading took 91.67 GiB memory and 31.993121 seconds
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:30 [interface.py:484] Setting kv cache block size to 32 for FLASHINFER_MLA backend.
(Worker_TP2_EP2 pid=3425912) INFO 04-05 11:39:31 [interface.py:484] Setting kv cache block size to 32 for FLASHINFER_MLA backend.
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:39:34 [interface.py:484] Setting kv cache block size to 32 for FLASHINFER_MLA backend.
(Worker_TP1_EP1 pid=3425911) INFO 04-05 11:42:21 [default_loader.py:385] Loading weights took 197.43 seconds
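To make the outlier in a log like the one above easy to spot, the per-rank "Loading weights took ... seconds" lines can be pulled out with a small script. This is only a sketch: the regular expression assumes the exact line format shown in this log, and the names `load_times`, `stragglers`, and the 2x-median threshold are illustrative choices, not part of vLLM.

```python
import re
import statistics

# Matches lines such as:
# (Worker_TP1_EP1 pid=3425911) INFO ... [default_loader.py:385] Loading weights took 197.43 seconds
# Assumption: the worker prefix and message text look exactly like the log above.
LOAD_RE = re.compile(
    r"\((?P<rank>Worker_\w+) pid=\d+\).*Loading weights took (?P<secs>[\d.]+) seconds"
)


def load_times(log_text: str) -> dict[str, float]:
    """Return {rank label: load time in seconds} parsed from the log text."""
    return {m["rank"]: float(m["secs"]) for m in LOAD_RE.finditer(log_text)}


def stragglers(times: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return ranks whose load time exceeds `factor` times the median.

    The factor of 2.0 is an arbitrary threshold chosen for illustration.
    """
    median = statistics.median(times.values())
    return [rank for rank, secs in times.items() if secs > factor * median]


# The four timing lines copied from the log in this issue.
SAMPLE = """\
(Worker_TP2_EP2 pid=3425912) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 27.12 seconds
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 26.92 seconds
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:39:30 [default_loader.py:385] Loading weights took 30.42 seconds
(Worker_TP1_EP1 pid=3425911) INFO 04-05 11:42:21 [default_loader.py:385] Loading weights took 197.43 seconds
"""

if __name__ == "__main__":
    times = load_times(SAMPLE)
    print(times)
    print("stragglers:", stragglers(times))
```

Run against the four timing lines from this issue, the only rank flagged is Worker_TP1_EP1.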
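I don't know why this is happening. In this run, Worker_TP1_EP1 took 197.43 seconds to load weights, while the other three ranks took roughly 27 to 30 seconds.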
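For anyone who wants to collect the same per-rank timing, a monkeypatch along these lines should work. This is a generic sketch, not the exact patch used for the log above: `timed`, `patch_load_weights`, and `DummyLoader` are hypothetical names, and `DummyLoader` is a stand-in for whichever loader class you choose to patch. How you obtain the rank label depends on your setup and is left as an assumption.

```python
import functools
import logging
import time

logger = logging.getLogger("load_timing")


def timed(fn, label):
    """Wrap `fn` so each call logs its wall-clock duration, tagged with `label`."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Logged even if fn raises, so failed loads are timed as well.
            elapsed = time.perf_counter() - start
            logger.info("[%s] %s took %.2f seconds", label, fn.__name__, elapsed)

    return wrapper


def patch_load_weights(cls, label):
    """Replace cls.load_weights with a timed version (monkeypatch)."""
    cls.load_weights = timed(cls.load_weights, label)


class DummyLoader:
    """Hypothetical stand-in for a real loader class; not a vLLM class."""

    def load_weights(self, model):
        time.sleep(0.01)  # simulate some work
        return model


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Assumption: "rank0" is a label you derive from your own launcher.
    patch_load_weights(DummyLoader, label="rank0")
    DummyLoader().load_weights("model")
```

The patch has to be applied in each worker process before loading starts, otherwise only the process that applied it will log timings.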
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.