vllm-project/vllm

[torch.compile] E2E correctness testing for fusions

Open

#39428 opened on Apr 9, 2026

Labels: help wanted, torch.compile


Description

End-to-end (E2E) tests for fusions (tests/compile/fusions_e2e) have been effective at catching regressions where changes to model or forward code break a custom torch.compile fusion pass. However, we currently have no way to test the numerical correctness of these fusion configurations.

It would be worth investigating an approach that runs only a few layers of a model and compares the outputs. This would help with correctness testing in general, and the outputs could be compared against both a baseline vLLM configuration and the Hugging Face reference implementation.
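A minimal sketch of the idea, written in plain Python so it runs without vLLM or a GPU. It assumes a toy decoder stack in which each layer performs a residual add, RMSNorm, and a matrix multiply. The "fused" path stands in for a custom fusion pass (here a hypothetical one-pass add + RMSNorm), and the check truncates the stack to its first few layers, runs both paths, and asserts that the outputs agree within a tolerance. All names (add_rms_norm_fused, run_truncated, max_abs_diff, and so on) are hypothetical and are not vLLM APIs. A real test would instead compare a fused vLLM run against an unfused vLLM run and against Hugging Face.

```python
import math
import random


def add_rms_norm_unfused(x, residual, weight, eps=1e-6):
    # Baseline: residual add, then RMSNorm, as two separate ops.
    # Returns (normalized output, updated residual).
    summed = [a + b for a, b in zip(x, residual)]
    rms = math.sqrt(sum(v * v for v in summed) / len(summed) + eps)
    return [w * v / rms for v, w in zip(summed, weight)], summed


def add_rms_norm_fused(x, residual, weight, eps=1e-6):
    # Hypothetical stand-in for a fused kernel: a single pass computes
    # the sum and the sum of squares, then scales by 1/rms.
    summed, sq = [], 0.0
    for a, b in zip(x, residual):
        s = a + b
        summed.append(s)
        sq += s * s
    inv_rms = 1.0 / math.sqrt(sq / len(summed) + eps)
    return [w * s * inv_rms for s, w in zip(summed, weight)], summed


def run_truncated(hidden, layers, num_layers, fused):
    # Run only the first `num_layers` layers, analogous to overriding
    # num_hidden_layers on a real model.
    norm_fn = add_rms_norm_fused if fused else add_rms_norm_unfused
    residual = [0.0] * len(hidden)
    for weight, mat in layers[:num_layers]:
        normed, residual = norm_fn(hidden, residual, weight)
        hidden = [sum(m * v for m, v in zip(row, normed)) for row in mat]
    return hidden


def max_abs_diff(a, b):
    # Largest element-wise absolute difference between two outputs.
    return max(abs(x - y) for x, y in zip(a, b))


# Usage: a random 32-layer toy model, truncated to 2 layers.
random.seed(0)
DIM, TOTAL_LAYERS = 8, 32
layers = [
    ([random.uniform(0.5, 1.5) for _ in range(DIM)],
     [[random.gauss(0.0, 0.3) for _ in range(DIM)] for _ in range(DIM)])
    for _ in range(TOTAL_LAYERS)
]
x = [random.gauss(0.0, 1.0) for _ in range(DIM)]
reference = run_truncated(x, layers, num_layers=2, fused=False)
candidate = run_truncated(x, layers, num_layers=2, fused=True)
assert max_abs_diff(reference, candidate) < 1e-9
```

In float64 Python the two paths agree up to rounding error. Real fused kernels running in BF16 or FP16 will not match bitwise, so a real test would need a looser tolerance chosen empirically, or a comparison on top-k logprobs instead of raw activations.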

This would likely require some work to fix weight loading for models like DeepSeek when num_hidden_layers is overridden with --hf-overrides.num_hidden_layers.
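One possible shape of such a fix, sketched under assumptions: a truncated model only has parameters for layers 0 through num_hidden_layers - 1, so the loader could skip checkpoint tensors that belong to deeper layers instead of failing on them. The sketch assumes Hugging Face-style tensor names of the form model.layers.<i>.<rest>. The function filter_truncated_weights is hypothetical and is not vLLM's actual loading code. Models that store extra modules after the decoder stack may still need model-specific handling.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical: assumes Hugging Face-style names such as
# "model.layers.17.mlp.down_proj.weight".
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def filter_truncated_weights(
    weights: Iterable[Tuple[str, object]], num_hidden_layers: int
) -> Iterator[Tuple[str, object]]:
    """Yield only the checkpoint tensors a truncated model can hold.

    Tensors for decoder layers at index >= num_hidden_layers are skipped.
    Tensors not tied to a decoder layer (embeddings, final norm, lm_head)
    pass through unchanged.
    """
    for name, tensor in weights:
        match = _LAYER_RE.match(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            continue  # this layer does not exist in the truncated model
        yield name, tensor


# Usage with placeholder tensors (integers stand in for real weights).
checkpoint = [
    ("model.embed_tokens.weight", 0),
    ("model.layers.0.self_attn.q_proj.weight", 1),
    ("model.layers.1.mlp.gate.weight", 2),
    ("model.layers.2.mlp.gate.weight", 3),
    ("model.layers.10.input_layernorm.weight", 4),
    ("model.norm.weight", 5),
    ("lm_head.weight", 6),
]
kept = [name for name, _ in filter_truncated_weights(checkpoint, 2)]
```

Filtering on the parsed layer index, rather than on a substring such as "layers.1", avoids accidentally matching layers 10 through 19.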
