[torch.compile] E2E correctness testing for fusions · vllm-project/vllm#39428

8 comments · 0 reactions · 0 assignees · Python · 16,816 forks

help wanted · torch.compile

Repository metrics

Stars: 80,034
PR merge metrics: average merge time 9d 2h; 921 PRs merged in 30d

Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job preventing fusion regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing correctness for these fusion configurations.

It would be good to investigate an approach where we only run a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs to both a baseline vLLM configuration and the Hugging Face baseline.
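A minimal sketch of the comparison idea, not vLLM's actual test harness. `TinyBlock`, `build_truncated_model`, `manual_forward`, and `compare_outputs` are hypothetical names introduced here for illustration, and a hand-written forward pass stands in for a fused `torch.compile` configuration so the example runs on CPU without a compiler. The tolerances are assumptions for float32, not recommended values.

```python
import torch
from torch import nn


class TinyBlock(nn.Module):
    """Stand-in for one decoder layer: LayerNorm, MLP, residual add."""

    def __init__(self, hidden: int) -> None:
        super().__init__()
        self.norm = nn.LayerNorm(hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 4 * hidden),
            nn.GELU(),
            nn.Linear(4 * hidden, hidden),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.norm(x))


def build_truncated_model(num_layers: int, hidden: int = 32, seed: int = 0) -> nn.Sequential:
    """Build only the first `num_layers` blocks, seeded for reproducibility."""
    torch.manual_seed(seed)
    return nn.Sequential(*[TinyBlock(hidden) for _ in range(num_layers)]).eval()


def manual_forward(model: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    """Alternate implementation of the same math (placeholder for a fused path)."""
    for block in model:
        norm = block.norm
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        normed = (x - mean) * torch.rsqrt(var + norm.eps) * norm.weight + norm.bias
        x = x + block.mlp(normed)
    return x


def compare_outputs(candidate, baseline, inputs, rtol=1e-4, atol=1e-4) -> float:
    """Run both callables on the same inputs; raise AssertionError on mismatch."""
    with torch.no_grad():
        got = candidate(inputs)
        expected = baseline(inputs)
    torch.testing.assert_close(got, expected, rtol=rtol, atol=atol)
    return (got - expected).abs().max().item()


if __name__ == "__main__":
    model = build_truncated_model(num_layers=2)
    x = torch.randn(4, 32)
    diff = compare_outputs(lambda t: manual_forward(model, t), model, x)
    print(f"max abs diff: {diff:.2e}")
```

Because `compare_outputs` takes two plain callables, the same helper could in principle compare a fusion-enabled run against either a baseline vLLM run or a Hugging Face forward pass, provided both expose outputs of the same truncated layer stack.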

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
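One possible ingredient of such a fix, sketched under an assumption the issue does not confirm: that loading fails because the checkpoint still contains tensors for layers beyond the overridden count. The helper name `filter_truncated_weights` is hypothetical, and the `layers.<index>.` naming pattern is assumed from common Hugging Face checkpoint layouts; it is not taken from vLLM's loader code.

```python
import re
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

# Assumed naming convention: per-layer tensors contain "layers.<index>."
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def filter_truncated_weights(
    weights: Iterable[Tuple[str, T]], num_hidden_layers: int
) -> Iterator[Tuple[str, T]]:
    """Yield only weights that belong to layers below `num_hidden_layers`.

    Tensors without a layer index (embeddings, final norm, lm_head) pass through.
    """
    for name, tensor in weights:
        match = _LAYER_RE.search(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            continue  # this layer does not exist in the truncated model
        yield name, tensor


if __name__ == "__main__":
    checkpoint = [
        ("model.embed_tokens.weight", "E"),
        ("model.layers.0.self_attn.q_proj.weight", "W0"),
        ("model.layers.1.mlp.gate_proj.weight", "W1"),
        ("model.layers.2.mlp.gate_proj.weight", "W2"),
        ("model.norm.weight", "N"),
    ]
    for name, _ in filter_truncated_weights(checkpoint, num_hidden_layers=2):
        print(name)
```

Filtering by name keeps the truncated model's parameter set and the checkpoint in agreement, but models with layer-dependent structure may need more than this.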

Contributor guide

Research direction: Investigate how to run only a few layers of a model for end-to-end correctness testing of torch.compile fusions, compare the outputs against the baseline vLLM configuration and the Hugging Face baseline, and fix weight loading for models like DeepSeek when num_hidden_layers is overridden.
Tech stack: Python, PyTorch
Domain: backend
Issue type: Test
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, PyTorch, torch.compile
Newcomer friendliness: 45
