[torch.compile] E2E correctness testing for fusions · vllm-project/vllm#39428

(8 comments) (0 reactions) (0 assignees) Python (16,816 forks)

Labels: help wanted, torch.compile

Repository metrics

Stars: (80,034 stars)
PR merge metrics: (average merge time 3d 17h) (993 merged PRs in 30d)

Description

E2E tests for fusions (tests/compile/fusions_e2e) have done a great job preventing fusion regressions where model/forward code changes break a custom torch.compile fusion pass. However, we currently have no way of testing correctness for these fusion configurations.

It would be good to investigate an approach where we only run a few layers of a model and compare the outputs. This would be helpful for correctness testing in general, and we could compare the outputs to both a baseline vLLM configuration and the Hugging Face baseline.
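
As a rough illustration of the comparison step, the sketch below checks two per-token log-probability sequences against each other with a tolerance. The helper names (`max_abs_diff`, `outputs_match`), the tolerances, and the sample values are hypothetical and not part of vLLM. A real test would more likely compare tensors directly, for example with `torch.testing.assert_close`.

```python
import math
from typing import Sequence


def max_abs_diff(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(baseline) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max((abs(b - c) for b, c in zip(baseline, candidate)), default=0.0)


def outputs_match(
    baseline: Sequence[float],
    candidate: Sequence[float],
    rtol: float = 1e-2,  # hypothetical tolerance, to be tuned per dtype/model
    atol: float = 1e-3,  # hypothetical tolerance, to be tuned per dtype/model
) -> bool:
    """True if every candidate value is close to the matching baseline value."""
    if len(baseline) != len(candidate):
        return False
    return all(
        math.isclose(b, c, rel_tol=rtol, abs_tol=atol)
        for b, c in zip(baseline, candidate)
    )


# Made-up log-probs standing in for a baseline run and a fusion-enabled run.
baseline_logprobs = [-0.1, -2.3, -0.7]
fused_logprobs = [-0.1001, -2.3004, -0.7002]
print(outputs_match(baseline_logprobs, fused_logprobs))
print(max_abs_diff(baseline_logprobs, fused_logprobs))
```

Reporting the maximum difference alongside the pass/fail result makes it easier to tell a genuine fusion bug from ordinary numerical noise when a test fails.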

This would likely require some work to fix weight loading for models like DeepSeek when --hf-overrides.num_hidden_layers is overridden.
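
One possible shape for such a fix, sketched under assumptions: if the failure comes from checkpoint entries that belong to layers beyond the overridden count (the issue does not confirm this), the loader could skip them by name. The function `keep_truncated_layers` is hypothetical and is not vLLM's loader API. It assumes the common Hugging Face naming pattern `...layers.<index>....` for per-layer weights.

```python
import re
from typing import Iterable, Iterator, Tuple, TypeVar

W = TypeVar("W")

# Assumed naming convention: per-layer weights contain "layers.<index>.".
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def keep_truncated_layers(
    weights: Iterable[Tuple[str, W]], num_hidden_layers: int
) -> Iterator[Tuple[str, W]]:
    """Yield only weights that exist in a model truncated to num_hidden_layers.

    Weights without a layer index (embeddings, final norm, LM head) pass through.
    """
    for name, tensor in weights:
        match = _LAYER_RE.search(name)
        if match is not None and int(match.group(1)) >= num_hidden_layers:
            continue  # belongs to a layer that was dropped by the override
        yield name, tensor


# Placeholder integers stand in for tensors in this example.
checkpoint = [
    ("model.embed_tokens.weight", 0),
    ("model.layers.0.self_attn.q_proj.weight", 1),
    ("model.layers.1.mlp.gate_proj.weight", 2),
    ("model.layers.2.mlp.gate_proj.weight", 3),
    ("model.norm.weight", 4),
]
for name, _ in keep_truncated_layers(checkpoint, num_hidden_layers=2):
    print(name)
```

Filtering by name keeps the sketch independent of any particular model class, but a real fix may also need to handle model-specific extras that this pattern does not cover.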

Contributor guide

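Investigation approach: Investigate a way to run end-to-end correctness tests for torch.compile fusions by executing only a few layers of a model, compare the outputs against the baseline vLLM and Hugging Face configurations, and fix weight loading for models like DeepSeek when num_hidden_layers is overridden.
Tech stack: Python, PyTorch
Area: backend
Issue type: testing
Difficulty: 3
Estimated time: 1-2 days
Activity: active
Clarity: mostly clear
Prerequisites: Python, PyTorch, torch.compile
Beginner-friendliness: 45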
