[torch.compile] E2E correctness testing for fusions · vllm-project/vllm#39428

6 comments · 0 reactions · 0 assignees · Python · 80,034 stars · 16,816 forks

Labels: help wanted, torch.compile

Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job preventing fusion regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing correctness for these fusion configurations.

It would be good to investigate an approach where we run only a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs to both a baseline vLLM configuration and the Hugging Face baseline.
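As a rough illustration of the "compare the outputs" step, here is a minimal sketch of a tolerance check over two lists of floats (for example, per-token logprobs from a fused run and a baseline run). The function name `outputs_match`, the sample values, and the default tolerances are all hypothetical; the criterion is the same one `torch.allclose` uses, written in plain Python so it does not depend on how the outputs are produced.

```python
from typing import Sequence


def outputs_match(
    baseline: Sequence[float],
    candidate: Sequence[float],
    atol: float = 1e-2,  # assumed tolerance, not a project default
    rtol: float = 1e-2,  # assumed tolerance, not a project default
) -> bool:
    """Return True if every element satisfies |c - b| <= atol + rtol * |b|.

    A NaN on either side makes the comparison False, so NaNs count as a mismatch.
    """
    if len(baseline) != len(candidate):
        return False
    return all(
        abs(c - b) <= atol + rtol * abs(b)
        for b, c in zip(baseline, candidate)
    )


# Hypothetical per-token logprobs from three runs of the same prompt.
baseline_logprobs = [-0.105, -2.302, -0.693]
fused_logprobs = [-0.106, -2.299, -0.694]   # small numeric drift
broken_logprobs = [-0.105, -1.204, -0.693]  # one token clearly off

fused_ok = outputs_match(baseline_logprobs, fused_logprobs)
broken_ok = outputs_match(baseline_logprobs, broken_logprobs)
```

The same check could be run twice per fusion configuration, once against the baseline vLLM configuration and once against the Hugging Face reference, possibly with looser tolerances for the latter.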

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
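The issue does not say exactly what fails during weight loading. One plausible failure mode, assumed here purely for illustration, is that the checkpoint still contains weights for layers that no longer exist once the layer count is reduced. A minimal sketch of skipping such weights follows; the helper `filter_truncated_layers` is hypothetical and is not vLLM's loader, and the `model.layers.<index>.` naming pattern is an assumption based on common Hugging Face checkpoint layouts.

```python
import re
from typing import Any, Iterable, Iterator, Tuple

# Assumption: per-layer weights are named "...layers.<index>.<rest>".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def filter_truncated_layers(
    weights: Iterable[Tuple[str, Any]],
    num_hidden_layers: int,
) -> Iterator[Tuple[str, Any]]:
    """Yield only weights that belong to layers kept after truncation.

    Weights without a layer index (embeddings, final norm, ...) pass through.
    """
    for name, tensor in weights:
        match = _LAYER_RE.search(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            continue  # weight belongs to a layer that was truncated away
        yield name, tensor


# Hypothetical checkpoint entries; the strings stand in for real tensors.
checkpoint = [
    ("model.embed_tokens.weight", "E"),
    ("model.layers.0.self_attn.q_proj.weight", "W0"),
    ("model.layers.1.mlp.gate.weight", "W1"),
    ("model.layers.5.mlp.experts.0.down_proj.weight", "W5"),
    ("model.norm.weight", "N"),
]

kept_names = [name for name, _ in filter_truncated_layers(checkpoint, 2)]
```

With `num_hidden_layers` set to 2, only the layer 5 entry is dropped. Real models may need more than this (for example, extra modules that sit outside the indexed layers), which is part of what would need investigating.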

Contributor guide

Tech stack: Python, PyTorch
Area: backend, testing
Issue type: feature
Difficulty: 4
Estimated time: 3-5 days
Activity status: active
Clarity: needs investigation
Prerequisites: PyTorch compilation, vLLM fusion passes, model weight loading
Beginner friendliness: 35
Research direction: Start by examining the existing test suite at `tests/compile/fusions_e2e` to understand its structure. Consider running only a subset of layers by overriding `--hf-overrides.num_hidden_layers`, especially for models like DeepSeek. Develop a method to compare outputs against both a vLLM baseline and the Hugging Face reference, possibly using a small model configuration. Address any weight loading issues that arise from overriding the number of layers.