[torch.compile] E2E correctness testing for fusions · vllm-project/vllm#39428

8 comments · 0 reactions · 0 assignees · Python · 16,816 forks

Labels: help wanted, torch.compile

Repository metrics

Stars: 80,034
PR merge metrics: average merge time 9d 2h; 921 PRs merged in 30 days

Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job preventing fusion regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing correctness for these fusion configurations.

It would be good to investigate an approach where we only run a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs to both a baseline vLLM configuration and the Hugging Face baseline.
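A minimal sketch of the "run a few layers and compare" idea, using plain Python lists in place of tensors so that it runs without vLLM or PyTorch. Every name here (`run_first_layers`, `outputs_match`, the toy layer functions) is hypothetical and not part of vLLM. The tolerance check mirrors the elementwise criterion used by `torch.allclose`, including its default `rtol` and `atol`.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_first_layers(layers: Sequence[Layer], x: Vector, num_layers: int) -> Vector:
    """Run only the first `num_layers` layers, mimicking a truncated model."""
    for layer in layers[:num_layers]:
        x = layer(x)
    return x


def outputs_match(candidate: Vector, baseline: Vector,
                  rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Elementwise |c - b| <= atol + rtol * |b| (the torch.allclose criterion)."""
    if len(candidate) != len(baseline):
        return False
    return all(abs(c - b) <= atol + rtol * abs(b)
               for c, b in zip(candidate, baseline))


# Toy stand-ins (assumptions): an unfused path made of two separate ops,
# a "fused" path that does both in one op, and a deliberately broken fusion.
def scale(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def shift(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


def fused_scale_shift(x: Vector) -> Vector:
    return [2.0 * v + 1.0 for v in x]


def broken_fused_scale_shift(x: Vector) -> Vector:
    return [2.0 * (v + 1.0) for v in x]  # wrong operation order


baseline_layers = [scale, shift, scale, shift]          # two unfused blocks
fused_layers = [fused_scale_shift, fused_scale_shift]   # two fused blocks
broken_layers = [broken_fused_scale_shift, fused_scale_shift]

x = [0.5, -1.0, 3.0]
# Compare only the first block of each variant instead of the full stack.
baseline_out = run_first_layers(baseline_layers, x, num_layers=2)
fused_out = run_first_layers(fused_layers, x, num_layers=1)
broken_out = run_first_layers(broken_layers, x, num_layers=1)

print(outputs_match(fused_out, baseline_out))   # True
print(outputs_match(broken_out, baseline_out))  # False
```

In a real test the baseline would be an unfused vLLM configuration or the Hugging Face model, and the tolerances would likely need loosening for low-precision dtypes.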

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
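The issue does not say why weight loading breaks. One plausible cause, assumed here purely for illustration, is that the checkpoint still contains weights for layers that no longer exist once `num_hidden_layers` is reduced. The sketch below filters weight names by layer index. The helper names, the `model.layers.<idx>.` naming pattern, and the example weight names are all hypothetical and are not taken from vLLM's loader.

```python
import json
import re
from typing import Iterable, List

# Assumed naming scheme: decoder-layer weights start with "model.layers.<idx>."
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def keep_weight(name: str, num_hidden_layers: int) -> bool:
    """Keep non-layer weights and layer weights whose index is still in range."""
    match = _LAYER_RE.match(name)
    if match is None:
        return True  # e.g. embeddings, final norm, output head
    return int(match.group(1)) < num_hidden_layers


def filter_weight_names(names: Iterable[str], num_hidden_layers: int) -> List[str]:
    """Drop weights that belong to layers removed by the override."""
    return [n for n in names if keep_weight(n, num_hidden_layers)]


# The override as a dict, plus its JSON form (for passing as a single argument).
overrides = {"num_hidden_layers": 2}
overrides_json = json.dumps(overrides)

# Illustrative checkpoint contents (made-up names following the assumed scheme).
checkpoint_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.mlp.gate_proj.weight",
    "model.layers.2.mlp.gate_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]

kept = filter_weight_names(checkpoint_names, overrides["num_hidden_layers"])
print(overrides_json)
print(kept)
```

With the override set to 2, only the `model.layers.2.` entry is dropped. Models with extra per-layer structure may need more than an index check, so this is a starting point rather than a fix.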

Contributor guide

Research direction: Investigate how to run only a few layers of a model to test the end-to-end correctness of torch.compile fusions, compare the outputs against the baseline vLLM configuration and the Hugging Face baseline, and fix weight loading for models such as DeepSeek when num_hidden_layers is overridden.
Tech stack: Python, PyTorch
Domain: backend
Issue type: Test
Difficulty: 3
Estimated time: 1-2 days
Task status: Active
Clarity: Fairly clear
Prerequisites: Python, PyTorch, torch.compile
Beginner-friendly: 45
