vllm-project/vllm

[Bug]: Certain Ranks Take a Long Time to Load Weights

Open

#39030 opened on Apr 5, 2026

bug, help wanted

Description

Your current environment

B200 VM

🐛 Describe the bug

I noticed that certain ranks sometimes take a very long time to load weights. Below is the output from a run where I monkeypatched the logging so that it reports the load time per rank:

(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:38:59 [default_loader.py:369] Starting load weights
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:38:59 [default_loader.py:369] Starting load weights
Loading safetensors checkpoint shards:   0% Completed | 0/163 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   1% Completed | 1/163 [00:00<00:48,  3.31it/s]
Loading safetensors checkpoint shards:   1% Completed | 2/163 [00:00<00:41,  3.88it/s]
Loading safetensors checkpoint shards:   2% Completed | 4/163 [00:00<00:35,  4.51it/s]
Loading safetensors checkpoint shards:   3% Completed | 5/163 [00:01<00:29,  5.38it/s]
Loading safetensors checkpoint shards:   4% Completed | 7/163 [00:01<00:29,  5.25it/s]
Loading safetensors checkpoint shards:   5% Completed | 8/163 [00:01<00:26,  5.93it/s]
Loading safetensors checkpoint shards:   6% Completed | 9/163 [00:01<00:27,  5.69it/s]
Loading safetensors checkpoint shards:   6% Completed | 10/163 [00:01<00:28,  5.41it/s]
Loading safetensors checkpoint shards:   7% Completed | 12/163 [00:02<00:19,  7.73it/s]
Loading safetensors checkpoint shards:   8% Completed | 13/163 [00:02<00:25,  5.81it/s]
Loading safetensors checkpoint shards:   9% Completed | 14/163 [00:02<00:23,  6.43it/s]
Loading safetensors checkpoint shards:   9% Completed | 15/163 [00:02<00:24,  6.16it/s]
Loading safetensors checkpoint shards:  10% Completed | 16/163 [00:02<00:26,  5.59it/s]
Loading safetensors checkpoint shards:  11% Completed | 18/163 [00:03<00:26,  5.41it/s]
(Worker_TP1_EP1 pid=3425911) INFO 04-05 11:39:03 [default_loader.py:369] Starting load weights
Loading safetensors checkpoint shards:  12% Completed | 19/163 [00:03<00:24,  5.88it/s]
Loading safetensors checkpoint shards:  13% Completed | 21/163 [00:03<00:25,  5.55it/s]
Loading safetensors checkpoint shards:  13% Completed | 22/163 [00:03<00:23,  6.13it/s]
Loading safetensors checkpoint shards:  14% Completed | 23/163 [00:04<00:23,  6.04it/s]
Loading safetensors checkpoint shards:  15% Completed | 24/163 [00:04<00:25,  5.49it/s]
Loading safetensors checkpoint shards:  16% Completed | 26/163 [00:04<00:25,  5.42it/s]
Loading safetensors checkpoint shards:  17% Completed | 27/163 [00:04<00:23,  5.81it/s]
Loading safetensors checkpoint shards:  18% Completed | 29/163 [00:05<00:25,  5.34it/s]
Loading safetensors checkpoint shards:  18% Completed | 30/163 [00:05<00:22,  5.88it/s]
Loading safetensors checkpoint shards:  19% Completed | 31/163 [00:05<00:22,  5.86it/s]
Loading safetensors checkpoint shards:  20% Completed | 32/163 [00:05<00:24,  5.25it/s]
Loading safetensors checkpoint shards:  21% Completed | 34/163 [00:05<00:17,  7.30it/s]
Loading safetensors checkpoint shards:  21% Completed | 35/163 [00:06<00:22,  5.64it/s]
Loading safetensors checkpoint shards:  22% Completed | 36/163 [00:06<00:20,  6.27it/s]
Loading safetensors checkpoint shards:  23% Completed | 37/163 [00:06<00:20,  6.07it/s]
Loading safetensors checkpoint shards:  23% Completed | 38/163 [00:06<00:22,  5.61it/s]
Loading safetensors checkpoint shards:  25% Completed | 40/163 [00:07<00:22,  5.49it/s]
Loading safetensors checkpoint shards:  25% Completed | 41/163 [00:07<00:20,  6.03it/s]
Loading safetensors checkpoint shards:  26% Completed | 43/163 [00:07<00:21,  5.63it/s]
Loading safetensors checkpoint shards:  28% Completed | 45/163 [00:07<00:19,  6.20it/s]
Loading safetensors checkpoint shards:  28% Completed | 46/163 [00:08<00:20,  5.75it/s]
Loading safetensors checkpoint shards:  29% Completed | 48/163 [00:08<00:20,  5.64it/s]
Loading safetensors checkpoint shards:  30% Completed | 49/163 [00:08<00:18,  6.03it/s]
Loading safetensors checkpoint shards:  31% Completed | 51/163 [00:08<00:20,  5.59it/s]
Loading safetensors checkpoint shards:  33% Completed | 53/163 [00:09<00:17,  6.14it/s]
Loading safetensors checkpoint shards:  33% Completed | 54/163 [00:09<00:19,  5.62it/s]
Loading safetensors checkpoint shards:  34% Completed | 56/163 [00:09<00:14,  7.35it/s]
Loading safetensors checkpoint shards:  35% Completed | 57/163 [00:09<00:18,  5.86it/s]
Loading safetensors checkpoint shards:  36% Completed | 59/163 [00:10<00:16,  6.31it/s]
Loading safetensors checkpoint shards:  37% Completed | 60/163 [00:10<00:17,  5.78it/s]
Loading safetensors checkpoint shards:  38% Completed | 62/163 [00:10<00:18,  5.48it/s]
Loading safetensors checkpoint shards:  39% Completed | 63/163 [00:10<00:16,  5.93it/s]
Loading safetensors checkpoint shards:  40% Completed | 65/163 [00:11<00:17,  5.48it/s]
Loading safetensors checkpoint shards:  41% Completed | 67/163 [00:11<00:16,  5.99it/s]
Loading safetensors checkpoint shards:  42% Completed | 68/163 [00:11<00:17,  5.52it/s]
Loading safetensors checkpoint shards:  43% Completed | 70/163 [00:12<00:16,  5.51it/s]
Loading safetensors checkpoint shards:  44% Completed | 71/163 [00:12<00:15,  5.91it/s]
Loading safetensors checkpoint shards:  45% Completed | 73/163 [00:12<00:15,  5.63it/s]
Loading safetensors checkpoint shards:  46% Completed | 75/163 [00:12<00:14,  6.23it/s]
Loading safetensors checkpoint shards:  47% Completed | 76/163 [00:13<00:15,  5.73it/s]
Loading safetensors checkpoint shards:  48% Completed | 78/163 [00:13<00:11,  7.47it/s]
Loading safetensors checkpoint shards:  48% Completed | 79/163 [00:13<00:14,  5.71it/s]
Loading safetensors checkpoint shards:  50% Completed | 81/163 [00:13<00:13,  6.20it/s]
Loading safetensors checkpoint shards:  50% Completed | 82/163 [00:14<00:13,  5.81it/s]
Loading safetensors checkpoint shards:  52% Completed | 84/163 [00:14<00:14,  5.62it/s]
Loading safetensors checkpoint shards:  52% Completed | 85/163 [00:14<00:12,  6.09it/s]
Loading safetensors checkpoint shards:  53% Completed | 87/163 [00:14<00:13,  5.72it/s]
Loading safetensors checkpoint shards:  55% Completed | 89/163 [00:15<00:11,  6.25it/s]
Loading safetensors checkpoint shards:  55% Completed | 90/163 [00:15<00:12,  5.79it/s]
Loading safetensors checkpoint shards:  56% Completed | 92/163 [00:15<00:12,  5.67it/s]
Loading safetensors checkpoint shards:  57% Completed | 93/163 [00:15<00:11,  6.09it/s]
Loading safetensors checkpoint shards:  58% Completed | 95/163 [00:16<00:11,  5.74it/s]
Loading safetensors checkpoint shards:  60% Completed | 97/163 [00:16<00:10,  6.35it/s]
Loading safetensors checkpoint shards:  60% Completed | 98/163 [00:16<00:11,  5.83it/s]
Loading safetensors checkpoint shards:  61% Completed | 100/163 [00:16<00:08,  7.64it/s]
Loading safetensors checkpoint shards:  62% Completed | 101/163 [00:17<00:10,  6.01it/s]
Loading safetensors checkpoint shards:  63% Completed | 103/163 [00:17<00:09,  6.47it/s]
Loading safetensors checkpoint shards:  64% Completed | 104/163 [00:17<00:09,  6.01it/s]
Loading safetensors checkpoint shards:  65% Completed | 106/163 [00:18<00:09,  5.77it/s]
Loading safetensors checkpoint shards:  66% Completed | 107/163 [00:18<00:08,  6.26it/s]
Loading safetensors checkpoint shards:  67% Completed | 109/163 [00:18<00:09,  5.84it/s]
Loading safetensors checkpoint shards:  68% Completed | 111/163 [00:18<00:08,  6.37it/s]
Loading safetensors checkpoint shards:  69% Completed | 112/163 [00:19<00:08,  5.91it/s]
Loading safetensors checkpoint shards:  70% Completed | 114/163 [00:19<00:08,  5.77it/s]
Loading safetensors checkpoint shards:  71% Completed | 115/163 [00:19<00:07,  6.17it/s]
Loading safetensors checkpoint shards:  72% Completed | 117/163 [00:19<00:07,  5.76it/s]
Loading safetensors checkpoint shards:  73% Completed | 119/163 [00:20<00:06,  6.37it/s]
Loading safetensors checkpoint shards:  74% Completed | 120/163 [00:20<00:07,  5.85it/s]
Loading safetensors checkpoint shards:  75% Completed | 122/163 [00:20<00:05,  7.66it/s]
Loading safetensors checkpoint shards:  75% Completed | 123/163 [00:20<00:06,  5.99it/s]
Loading safetensors checkpoint shards:  77% Completed | 125/163 [00:21<00:05,  6.43it/s]
Loading safetensors checkpoint shards:  77% Completed | 126/163 [00:21<00:06,  6.00it/s]
Loading safetensors checkpoint shards:  79% Completed | 128/163 [00:21<00:06,  5.76it/s]
Loading safetensors checkpoint shards:  79% Completed | 129/163 [00:21<00:05,  6.26it/s]
Loading safetensors checkpoint shards:  80% Completed | 131/163 [00:22<00:05,  5.79it/s]
Loading safetensors checkpoint shards:  82% Completed | 133/163 [00:22<00:04,  6.26it/s]
Loading safetensors checkpoint shards:  82% Completed | 134/163 [00:22<00:05,  5.72it/s]
Loading safetensors checkpoint shards:  83% Completed | 136/163 [00:23<00:04,  5.59it/s]
Loading safetensors checkpoint shards:  84% Completed | 137/163 [00:23<00:04,  6.01it/s]
Loading safetensors checkpoint shards:  85% Completed | 139/163 [00:23<00:04,  5.70it/s]
Loading safetensors checkpoint shards:  87% Completed | 141/163 [00:23<00:03,  7.12it/s]
Loading safetensors checkpoint shards:  87% Completed | 142/163 [00:24<00:03,  5.79it/s]
Loading safetensors checkpoint shards:  88% Completed | 144/163 [00:24<00:03,  6.29it/s]
Loading safetensors checkpoint shards:  89% Completed | 145/163 [00:24<00:03,  5.92it/s]
Loading safetensors checkpoint shards:  90% Completed | 147/163 [00:24<00:02,  5.72it/s]
Loading safetensors checkpoint shards:  91% Completed | 148/163 [00:24<00:02,  6.22it/s]
Loading safetensors checkpoint shards:  92% Completed | 150/163 [00:25<00:02,  5.81it/s]
Loading safetensors checkpoint shards:  93% Completed | 152/163 [00:25<00:01,  6.37it/s]
Loading safetensors checkpoint shards:  94% Completed | 153/163 [00:25<00:01,  5.91it/s]
Loading safetensors checkpoint shards:  95% Completed | 155/163 [00:26<00:01,  5.67it/s]
Loading safetensors checkpoint shards:  96% Completed | 156/163 [00:26<00:01,  6.09it/s]
Loading safetensors checkpoint shards:  97% Completed | 158/163 [00:26<00:00,  5.74it/s]
(Worker_TP2_EP2 pid=3425912) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 27.12 seconds
Loading safetensors checkpoint shards:  98% Completed | 160/163 [00:26<00:00,  6.70it/s]
Loading safetensors checkpoint shards: 100% Completed | 163/163 [00:26<00:00,  6.06it/s]
(Worker_TP0_EP0 pid=3425910) 
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 26.92 seconds
(Worker_TP0_EP0 pid=3425910) WARNING 04-05 11:39:27 [kv_cache.py:109] Checkpoint does not provide a q scaling factor. Setting it to k_scale. This only matters for FP8 Attention backends (flash-attn or flashinfer).
(Worker_TP0_EP0 pid=3425910) WARNING 04-05 11:39:27 [kv_cache.py:123] Using KV cache scaling factor 1.0 for fp8_e4m3. If this is unintended, verify that k/v_scale scaling factors are properly set in the checkpoint.
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:27 [nvfp4.py:404] Using MoEPrepareAndFinalizeNoDPEPMonolithic
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:39:30 [default_loader.py:385] Loading weights took 30.42 seconds
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:30 [gpu_model_runner.py:4820] Model loading took 91.67 GiB memory and 31.993121 seconds
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:30 [interface.py:484] Setting kv cache block size to 32 for FLASHINFER_MLA backend.
(Worker_TP2_EP2 pid=3425912) INFO 04-05 11:39:31 [interface.py:484] Setting kv cache block size to 32 for FLASHINFER_MLA backend.
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:39:34 [interface.py:484] Setting kv cache block size to 32 for FLASHINFER_MLA backend.
(Worker_TP1_EP1 pid=3425911) INFO 04-05 11:42:21 [default_loader.py:385] Loading weights took 197.43 seconds

I don't know why this is happening.
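
For anyone comparing runs, here is a minimal sketch that pulls the per-rank load times out of a log like the one above and flags the slow ranks. It assumes the `Loading weights took ... seconds` line format shown in this issue. The helper names (`load_times`, `stragglers`) and the 2x-median threshold are illustrative choices, not part of vLLM.

```python
import re
from statistics import median

# Matches lines of the form seen in this issue, e.g.
# (Worker_TP1_EP1 pid=3425911) INFO 04-05 11:42:21 [default_loader.py:385] Loading weights took 197.43 seconds
LOAD_RE = re.compile(
    r"\((?P<rank>Worker_\w+) pid=\d+\).*Loading weights took (?P<secs>[\d.]+) seconds"
)


def load_times(log_text: str) -> dict[str, float]:
    """Return {rank label: reported load time in seconds} for every matching line."""
    times: dict[str, float] = {}
    for line in log_text.splitlines():
        match = LOAD_RE.search(line)
        if match:
            times[match.group("rank")] = float(match.group("secs"))
    return times


def stragglers(times: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return ranks whose load time exceeds `factor` times the median (hypothetical threshold)."""
    if not times:
        return []
    typical = median(times.values())
    return sorted(rank for rank, secs in times.items() if secs > factor * typical)


# The four "Loading weights took" lines copied from the log in this issue.
SAMPLE = """\
(Worker_TP2_EP2 pid=3425912) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 27.12 seconds
(Worker_TP0_EP0 pid=3425910) INFO 04-05 11:39:27 [default_loader.py:385] Loading weights took 26.92 seconds
(Worker_TP3_EP3 pid=3425913) INFO 04-05 11:39:30 [default_loader.py:385] Loading weights took 30.42 seconds
(Worker_TP1_EP1 pid=3425911) INFO 04-05 11:42:21 [default_loader.py:385] Loading weights took 197.43 seconds
"""

if __name__ == "__main__":
    times = load_times(SAMPLE)
    for rank, secs in sorted(times.items(), key=lambda item: item[1]):
        print(f"{rank}: {secs:.2f} s")
    print("stragglers:", stragglers(times))
```

On the log above this singles out `Worker_TP1_EP1` (197.43 s against roughly 27 to 30 s for the other three ranks). That rank also logged `Starting load weights` at 11:39:03, about four seconds after TP0 and TP3.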

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
