[Feature]: Batch Invariant Feature and Performance Optimization
#27433 opened on Oct 23, 2025
Description
🚀 The feature, motivation and pitch
We now have basic support for batch invariance, based on https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
https://github.com/orgs/vllm-project/projects/29/views/1
Some work remains, so this issue tracks it.
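For readers new to the topic, here is a minimal pure-Python sketch of why results can depend on batch size. It is not vLLM code. The premise is that floating-point addition is not associative, so a kernel that picks its reduction strategy from the batch size can return different values for the same row. `reduce_in_splits`, `adaptive_num_splits`, and `FIXED_NUM_SPLITS` are hypothetical names invented for illustration, and the heuristic is an assumption rather than anything a real kernel does.

```python
def reduce_in_splits(values, num_splits):
    """Sum `values` by reducing `num_splits` equal chunks, then the partials.

    Assumes len(values) is divisible by num_splits.
    """
    chunk = len(values) // num_splits
    partials = []
    for i in range(num_splits):
        acc = 0.0
        for v in values[i * chunk:(i + 1) * chunk]:
            acc += v  # plain left-to-right float addition
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


def adaptive_num_splits(batch_size):
    # Hypothetical heuristic: a small batch splits the reduction to keep the
    # hardware busy, a large batch does not.
    return 2 if batch_size < 4 else 1


FIXED_NUM_SPLITS = 1  # batch-invariant choice: never depends on batch size

# The same row of numbers, reduced while "sitting in" batches of size 1 and 8.
row = [1e16, 1.0, -1e16, 1.0]

adaptive = {bs: reduce_in_splits(row, adaptive_num_splits(bs)) for bs in (1, 8)}
invariant = {bs: reduce_in_splits(row, FIXED_NUM_SPLITS) for bs in (1, 8)}

print("adaptive :", adaptive)   # results differ between batch sizes
print("invariant:", invariant)  # results match across batch sizes
```

The fixed strategy gives up some flexibility in how work is scheduled, which is one reason performance optimization is tracked separately below.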
TODOs:
- Basic framework https://github.com/vllm-project/vllm/pull/25603 @bwasti
- FlashInfer support https://github.com/vllm-project/vllm/pull/26373 @bwasti
- DeepSeek-V3 https://github.com/vllm-project/vllm/pull/26609 @bwasti
- DeepGEMM on Blackwell https://github.com/vllm-project/vllm/pull/27127 @yewentao256
- Batch Invariant for R1 TP 8 on Blackwell https://github.com/vllm-project/vllm/pull/27229 @yewentao256
- torch.compile & CUDA Graph support https://github.com/vllm-project/vllm/pull/27660 @PaulZhang12
- Usability & Documentation @bwasti https://github.com/vllm-project/vllm/pull/27839
- An RL example @bwasti https://github.com/bwasti/spirl
- Add batch invariant tests to CI https://github.com/vllm-project/vllm/pull/27842 @yewentao256
- TRITON_MLA support https://github.com/vllm-project/vllm/pull/29125 @yewentao256
- FLASHINFER_MLA support 🙋Help needed, context: https://github.com/flashinfer-ai/flashinfer/issues/2107
- Optimize the batch invariant performance
  - BMM optimization https://github.com/vllm-project/vllm/pull/29345 @yewentao256
  - https://github.com/vllm-project/vllm/pull/40413 @yewentao256
  - https://github.com/vllm-project/vllm/pull/40408 @yewentao256
  - 🙋Help needed
Nice to have:
- Prefix caching support
- NVFP4 support
- AMD testing/support
- Speculative decoding support (this might be hard)
- vLLM Support for Generic Model Definitions @bwasti https://github.com/vllm-project/vllm/issues/28326
- (Out of scope) DP Support https://github.com/vllm-project/vllm/issues/30321
Model coverage
https://docs.vllm.ai/en/latest/features/batch_invariance/#tested-models
🙋Help needed validating more models:
- Test a model using the script in https://github.com/vllm-project/vllm/tree/main/tests/v1/determinism
- Submit a PR adding the model to the tested-models list in the document
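For anyone unsure what "validating a model" means here, below is a rough sketch of the property being checked. It is not the script from the repository and it does not call vLLM. The check is that a prompt's output is bitwise identical whether it runs alone or inside larger batches, at different positions. `is_batch_invariant`, `GenerateFn`, and the two toy generators are hypothetical stand-ins; a real run would wrap an actual engine call that returns per-token logprobs.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: takes a batch of prompts and returns one list of
# per-token logprobs per prompt, in the same order.
GenerateFn = Callable[[List[str]], List[List[float]]]


def is_batch_invariant(
    generate: GenerateFn,
    needle: str,
    fillers: Sequence[str],
    batch_sizes: Sequence[int] = (2, 4, 8),
) -> bool:
    """Return True if `needle` gets identical output alone and in every batch."""
    baseline = generate([needle])[0]
    for bs in batch_sizes:
        for pos in (0, bs - 1):  # needle first, then needle last
            batch = [fillers[i % len(fillers)] for i in range(bs - 1)]
            batch.insert(pos, needle)
            if generate(batch)[pos] != baseline:  # exact comparison, no tolerance
                return False
    return True


def toy_invariant(prompts: List[str]) -> List[List[float]]:
    # Stand-in whose output depends only on the prompt itself.
    return [[float(len(p)), -0.5 * len(p)] for p in prompts]


def toy_batch_sensitive(prompts: List[str]) -> List[List[float]]:
    # Stand-in whose output drifts slightly with the batch size.
    scale = 1.0 + 1e-7 * len(prompts)
    return [[float(len(p)) * scale] for p in prompts]


fillers = ["Tell me a story.", "Summarize this paragraph."]
print(is_batch_invariant(toy_invariant, "What is 2+2?", fillers))
print(is_batch_invariant(toy_batch_sensitive, "What is 2+2?", fillers))
```

The comparison is exact on purpose: a tolerance-based check would hide the small numerical differences that batch invariance is meant to eliminate.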