vllm-project/vllm

[torch.compile] E2E correctness testing for fusions

Open

#39428 opened on Apr 9, 2026

Labels: help wanted, torch.compile


Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job of preventing regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing the numerical correctness of these fusion configurations.

It would be good to investigate an approach where we run only a few layers of a model and compare the outputs. This would help with correctness testing in general, and we could compare the outputs of a fused configuration against both a baseline vLLM configuration (e.g. with the fusion passes disabled) and the Hugging Face baseline.
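One possible shape for the comparison step is sketched below. It assumes each run (fused vLLM, baseline vLLM, or Hugging Face) produces greedy token IDs plus the top-k logprobs at each position, represented here as a plain `dict` of token ID to logprob. The helper name `outputs_close` and this data format are hypothetical and not part of vLLM. The idea: exact token agreement passes; at the first divergence, the mismatch is tolerated only if each run's token is still among the other run's top-k candidates, and comparison stops there because later tokens are conditioned on different prefixes.

```python
from typing import Dict, Sequence

# Hypothetical per-position format: token_id -> logprob for the top-k tokens.
TopK = Dict[int, float]


def outputs_close(
    ref_tokens: Sequence[int],
    ref_topk: Sequence[TopK],
    test_tokens: Sequence[int],
    test_topk: Sequence[TopK],
) -> bool:
    """Compare two greedy generations (e.g. fused vs. unfused).

    Returns True if the sequences match, or if at the first mismatch each
    run's chosen token appears in the other run's top-k logprobs.
    """
    for pos, (ref_tok, test_tok) in enumerate(zip(ref_tokens, test_tokens)):
        if ref_tok == test_tok:
            continue
        # Tokens diverged: accept only if both choices are plausible
        # candidates under the other configuration (small numeric drift).
        if ref_tok in test_topk[pos] and test_tok in ref_topk[pos]:
            # Past this point the prefixes differ, so stop comparing.
            return True
        return False
    return True
```

A tolerance-based check like this, rather than exact equality, is one way to allow fused kernels to differ slightly in floating-point rounding without masking real correctness bugs.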

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
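One piece of that work might be skipping checkpoint tensors that belong to layers removed by the `num_hidden_layers` override. The sketch below assumes weights arrive as an iterable of `(name, tensor)` pairs and that layer weights follow a `<prefix>.layers.<idx>.<rest>` naming scheme. The helper `skip_truncated_layers` is hypothetical, and it does not address any model-specific loading issues that DeepSeek may have.

```python
import re
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

# Assumed naming scheme for per-layer weights: "...layers.<idx>.<rest>".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def skip_truncated_layers(
    weights: Iterable[Tuple[str, T]], num_hidden_layers: int
) -> Iterator[Tuple[str, T]]:
    """Yield only (name, tensor) pairs for layers that still exist.

    Weights outside the layer stack (embeddings, final norm, lm_head)
    are always kept.
    """
    for name, tensor in weights:
        match = _LAYER_RE.search(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            continue  # layer was dropped by the num_hidden_layers override
        yield name, tensor
```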
