vllm-project/vllm

[torch.compile] E2E correctness testing for fusions

Open

#39428 opened on Apr 9, 2026

8 comments, 0 reactions, 0 assignees
Labels: help wanted, torch.compile

Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job preventing fusion regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing correctness for these fusion configurations.

It would be good to investigate an approach where we only run a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs to both a baseline vLLM configuration and the Hugging Face baseline.
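To make the idea concrete, here is a toy sketch of the "run only a few layers and compare" check, written in pure Python. It is not vLLM code: `baseline_layer`, `fused_layer`, `run_layers`, and `outputs_match` are hypothetical names, and the RMSNorm-plus-scale "fusion" only stands in for a real fusion pass. A real test would presumably compare logits or hidden states from two engine configurations with tolerances tuned to the dtype.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference (unfused) RMSNorm over a list of floats."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def baseline_layer(x, weight, factor):
    # Two separate ops, the way an unfused graph would run them.
    normed = rms_norm(x, weight)
    return [v * factor for v in normed]


def fused_layer(x, weight, factor, eps=1e-6):
    # One pass standing in for a fused kernel: the scale factor is
    # folded into the normalization multiplier.
    mean_sq = sum(v * v for v in x) / len(x)
    mult = factor / math.sqrt(mean_sq + eps)
    return [v * mult * w for v, w in zip(x, weight)]


def run_layers(layer_fn, x, layer_params, num_layers):
    """Run only the first `num_layers` layers of the toy model."""
    for weight, factor in layer_params[:num_layers]:
        x = layer_fn(x, weight, factor)
    return x


def outputs_match(a, b, rtol=1e-6, atol=1e-8):
    """Elementwise closeness check with relative and absolute tolerance."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=rtol, abs_tol=atol) for p, q in zip(a, b)
    )


# Toy "checkpoint": three layers, each with a norm weight and a scale factor.
layer_params = [
    ([1.0, 0.5, 2.0, 1.5], 0.9),
    ([0.8, 1.2, 1.0, 0.7], 1.1),
    ([1.1, 0.9, 1.3, 1.0], 1.0),
]
hidden = [0.3, -1.2, 0.8, 2.0]

# Truncate to two layers and compare the fused path against the baseline.
baseline_out = run_layers(baseline_layer, hidden, layer_params, num_layers=2)
fused_out = run_layers(fused_layer, hidden, layer_params, num_layers=2)
assert outputs_match(baseline_out, fused_out)
```

Truncating to a few layers keeps the comparison cheap and limits how far numerical differences can accumulate, which should allow tighter tolerances than a full-model run.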

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
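One plausible piece of that work, sketched below under assumptions, is skipping checkpoint weights that belong to layers beyond the overridden count. `layer_index` and `filter_weights` are hypothetical helpers, not part of vLLM's loader, and the weight names follow the common Hugging Face `model.layers.<i>.` convention purely for illustration. The actual DeepSeek loading problem may involve more than this.

```python
import re

# Matches the layer index in names such as "model.layers.12.mlp.gate_proj.weight".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_index(weight_name):
    """Return the decoder layer index encoded in a weight name, or None."""
    match = _LAYER_RE.search(weight_name)
    return int(match.group(1)) if match else None


def filter_weights(weights, num_hidden_layers):
    """Yield (name, tensor) pairs, dropping layers at or past the cutoff.

    Weights without a layer index (embeddings, final norm) are kept.
    """
    for name, tensor in weights:
        idx = layer_index(name)
        if idx is not None and idx >= num_hidden_layers:
            continue
        yield name, tensor


# Illustrative checkpoint contents; strings stand in for tensors.
checkpoint = [
    ("model.embed_tokens.weight", "E"),
    ("model.layers.0.self_attn.q_proj.weight", "W0"),
    ("model.layers.1.mlp.gate_proj.weight", "W1"),
    ("model.layers.2.mlp.gate_proj.weight", "W2"),
    ("model.norm.weight", "N"),
]

kept = [name for name, _ in filter_weights(checkpoint, num_hidden_layers=2)]
```

With `num_hidden_layers=2`, the layer 2 weight is dropped while the embedding and final norm weights pass through unchanged.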
