[torch.compile] E2E correctness testing for fusions · vllm-project/vllm#39428

8 comments · 0 reactions · 0 assignees · Python · 16,816 forks

Labels: help wanted, torch.compile

Repository metrics

Stars: (80,034 stars)
PR merge metrics: (Avg merge 3d 17h) (993 merged PRs in 30d)

Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job preventing fusion regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing correctness for these fusion configurations.

It would be good to investigate an approach where we run only a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs against both a baseline vLLM configuration and the Hugging Face baseline.
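The comparison step could look roughly like the sketch below. It is not existing vLLM test code: the helper names (`max_abs_diff`, `outputs_match`, `failing_references`) and the tolerance values are assumptions chosen for illustration, and the inputs are plain lists of per-token values (for example logprobs) that a test would first collect from the fused run and from each reference run.

```python
import math
from typing import Dict, List, Sequence


def max_abs_diff(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(baseline) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max((abs(b - c) for b, c in zip(baseline, candidate)), default=0.0)


def outputs_match(
    baseline: Sequence[float],
    candidate: Sequence[float],
    rtol: float = 1e-2,  # hypothetical tolerance, to be tuned per dtype and model
    atol: float = 1e-3,  # hypothetical tolerance, to be tuned per dtype and model
) -> bool:
    """True if every element agrees within the relative or absolute tolerance."""
    if len(baseline) != len(candidate):
        return False
    return all(
        math.isclose(b, c, rel_tol=rtol, abs_tol=atol)
        for b, c in zip(baseline, candidate)
    )


def failing_references(
    candidate: Sequence[float],
    references: Dict[str, Sequence[float]],
) -> List[str]:
    """Names of the reference runs (e.g. 'vllm_baseline', 'hf') that disagree."""
    return [
        name
        for name, reference in references.items()
        if not outputs_match(reference, candidate)
    ]
```

A test would then assert that `failing_references(fused_output, {"vllm_baseline": ..., "hf": ...})` is empty, and report `max_abs_diff` on failure to show how far the fused configuration drifted.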

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
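One possible shape for such a fix is to skip checkpoint tensors that belong to layers beyond the overridden count. The sketch below is an assumption-laden illustration, not vLLM's loader: `filter_truncated_weights` is a hypothetical helper, and it assumes checkpoint names contain a `layers.<index>.` segment. Models that store extra per-layer tensors under other naming schemes would need additional handling.

```python
import re
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

# Assumed naming convention: "...layers.<index>.<rest>" for per-layer tensors.
_LAYER_INDEX = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def filter_truncated_weights(
    weights: Iterable[Tuple[str, T]],
    num_hidden_layers: int,
) -> Iterator[Tuple[str, T]]:
    """Yield only the weights that still exist in a model truncated to
    `num_hidden_layers` layers. Non-layer weights (embeddings, final norm,
    output head) pass through unchanged."""
    for name, tensor in weights:
        match = _LAYER_INDEX.search(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            # This tensor belongs to a layer removed by the override.
            continue
        yield name, tensor
```

Because the helper is a generator over `(name, tensor)` pairs, it could wrap an existing weight iterator without materializing the full checkpoint in memory.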

Contributor guide

Research direction: Investigate how to run only a few layers of a model for E2E correctness testing of torch.compile fusions, compare outputs to baseline vLLM and Hugging Face configurations, and fix weight loading for models like DeepSeek when num_hidden_layers is overridden.
Tech stack: Python, PyTorch
Domain: backend
Issue type: Test
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, PyTorch, torch.compile
Newbie friendliness: 45
