vllm-project/vllm

[torch.compile] E2E correctness testing for fusions

Open

#39428 opened on Apr 9, 2026

help wanted, torch.compile


Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job of preventing fusion regressions, where changes to model/forward code break a custom torch.compile fusion pass. However, we currently have no way of testing the correctness of the outputs produced under these fusion configurations.

It would be worth investigating an approach where we run only a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs against both a baseline vLLM configuration and the Hugging Face baseline.
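A truncated model produces low-quality text, so exact string matching across configurations is likely too strict. One possible way to make "compare the outputs" concrete is a greedy-decoding check with a logprob tolerance: identical tokens pass, and at the first divergence each run's token must appear in the other run's top-k logprobs (a near-tie), after which the comparison stops because the prefixes differ. This is a minimal sketch under that assumption; `StepOutput` and `outputs_close` are hypothetical names, not vLLM APIs, and in practice the per-step data would come from running the truncated model under the fused vLLM configuration, the baseline vLLM configuration, and Hugging Face.

```python
from dataclasses import dataclass


@dataclass
class StepOutput:
    """One generated position: the greedy token and its top-k logprobs (hypothetical)."""
    token_id: int
    top_logprobs: dict[int, float]


def outputs_close(ref: list[StepOutput], test: list[StepOutput]) -> bool:
    """Return True if `test` matches `ref` up to a near-tie divergence.

    Identical tokens are accepted. At the first position where the tokens
    differ, each run's token must appear in the other run's top-k; after
    that point the runs are conditioned on different prefixes, so the
    comparison stops.
    """
    for ref_step, test_step in zip(ref, test):
        if ref_step.token_id == test_step.token_id:
            continue
        # Divergence: accept it only if both tokens are plausible in both runs.
        return (ref_step.token_id in test_step.top_logprobs
                and test_step.token_id in ref_step.top_logprobs)
    # No divergence over the shared prefix; require equal lengths as well.
    return len(ref) == len(test)


def fused_config_ok(fused, vllm_baseline, hf_baseline) -> bool:
    """Check a fused run against both baselines, as the issue proposes."""
    return outputs_close(vllm_baseline, fused) and outputs_close(hf_baseline, fused)
```

Checking against the vLLM baseline isolates the effect of the fusion passes, while the Hugging Face comparison guards against both vLLM configurations being wrong in the same way.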

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
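When num_hidden_layers is reduced, the checkpoint still contains weights for every original layer, so a loader that expects each checkpoint tensor to have a matching parameter can fail on the extra ones. One possible direction, sketched below under stated assumptions, is to drop those weights before they reach the model's parameter mapping; vLLM model classes receive checkpoint weights as an iterable of (name, tensor) pairs in `load_weights`, so a filter over that iterable is a natural place for it. `filter_truncated_weights` is a hypothetical helper, not part of vLLM, and it assumes Hugging Face-style `model.layers.<i>.` weight names; it has not been checked against DeepSeek's actual loader.

```python
import re
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")  # tensor type; kept generic so the sketch does not need torch

# Assumption: decoder layers are named like "model.layers.12.mlp.gate_proj.weight".
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def filter_truncated_weights(
    weights: Iterable[tuple[str, T]], num_hidden_layers: int
) -> Iterator[tuple[str, T]]:
    """Yield only the weights that exist in a model truncated to
    `num_hidden_layers` decoder layers.

    Weights outside the layer stack (embeddings, final norm, lm_head)
    are always kept.
    """
    for name, tensor in weights:
        match = _LAYER_RE.match(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            continue  # this layer was cut off by the num_hidden_layers override
        yield name, tensor


# Usage with placeholder values standing in for tensors.
ckpt = [
    ("model.embed_tokens.weight", "E"),
    ("model.layers.0.mlp.gate_proj.weight", "L0"),
    ("model.layers.1.mlp.gate_proj.weight", "L1"),
    ("model.layers.61.mlp.gate_proj.weight", "L61"),
    ("lm_head.weight", "H"),
]
print([n for n, _ in filter_truncated_weights(ckpt, num_hidden_layers=1)])
# ['model.embed_tokens.weight', 'model.layers.0.mlp.gate_proj.weight', 'lm_head.weight']
```

This sketch drops every layer at or beyond the override, which would also discard any extra layers a checkpoint stores after the main decoder stack (such as DeepSeek-V3's multi-token-prediction layer); whether that is the desired behavior for the real fix is an open question.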
