sgl-project/sglang-jax
[Feature] Speed Up Scheduler Thread Performance
Open
#293 opened on Oct 30, 2025
good first issue, performance
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sgl-jax/discussions/new/choose. Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
Scheduler overlap is currently supported, but when the scheduler's per-step scheduling time exceeds the model's forward inference time, the scheduling cost cannot be fully hidden behind the forward pass. For example, with the Qwen3-8B model, concurrency set to 16, input length 1024, and output length 1024, scheduling takes 5.5 ms, while model run + sampler takes only 5 ms. We therefore need to speed up the scheduler thread.
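The gap can be made concrete with a small calculation. This is a minimal sketch under an idealized assumption (perfect overlap, so each step costs the larger of the two times); the function names are illustrative and are not part of sglang-jax.

```python
def overlapped_step_ms(schedule_ms: float, forward_ms: float) -> float:
    """Per-step time under idealized overlap: the slower side sets the pace."""
    return max(schedule_ms, forward_ms)


def exposed_scheduling_ms(schedule_ms: float, forward_ms: float) -> float:
    """Scheduling time per step that the forward pass cannot hide."""
    return max(0.0, schedule_ms - forward_ms)


# Numbers from the example above: 5.5 ms scheduling, 5 ms model run + sampler.
step_ms = overlapped_step_ms(5.5, 5.0)
exposed_ms = exposed_scheduling_ms(5.5, 5.0)
print(f"step: {step_ms} ms, exposed scheduling: {exposed_ms} ms")
```

Under this model, scheduling is fully hidden only once it drops to 5 ms or below for this workload.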
Related resources
Refer to the profile-with-jax-profiler document to profile the scheduler, then modify the code to speed it up.
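Alongside a full profiler trace, a lightweight per-phase timer can give a first indication of which part of the scheduler loop dominates. The following is a minimal sketch using only the Python standard library; `PhaseTimer`, `build_batch`, and `process_results` are hypothetical stand-ins, not functions from sglang-jax.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase."""

    def __init__(self):
        self.total_s = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the phase raises.
            self.total_s[name] += time.perf_counter() - start
            self.calls[name] += 1

    def mean_ms(self, name):
        return 1e3 * self.total_s[name] / self.calls[name]


def build_batch():
    # Hypothetical stand-in for the scheduler's batch construction work.
    return sum(range(1000))


def process_results():
    # Hypothetical stand-in for post-processing of the previous step's results.
    return sorted(range(1000), reverse=True)


timer = PhaseTimer()
for _ in range(100):
    with timer.phase("build_batch"):
        build_batch()
    with timer.phase("process_results"):
        process_results()

for name in timer.total_s:
    print(f"{name}: {timer.mean_ms(name):.4f} ms/step")
```

Wrapping the real phases of the scheduler loop this way shows where the per-step scheduling time goes, which helps decide what to inspect more closely in the profiler trace.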