sgl-project/sglang-jax

[Feature] Speed Up Scheduler Thread Performance

Open

#293 opened on Oct 30, 2025

Labels: good first issue, performance

Description

Checklist

Motivation

Scheduler overlap is already supported, but it can fully hide scheduling time only when scheduling takes no longer than the model forward pass. For example, with the Qwen3-8B model, concurrency set to 16, input size 1024, and output size 1024, scheduling takes 5.5 ms while model run + sampler takes only 5 ms, so roughly 0.5 ms of scheduling cannot be hidden behind the forward pass. We therefore need to speed up the scheduler thread.
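To make the arithmetic concrete, here is a minimal sketch of the overlap budget. It assumes an idealized model in which the scheduler and the forward pass run fully in parallel, so the time per iteration is simply the slower of the two. The function names are hypothetical and are not part of sglang-jax.

```python
def exposed_scheduling_ms(schedule_ms: float, forward_ms: float) -> float:
    """Scheduling time that overlap cannot hide behind the forward pass.

    Assumption: perfect overlap, i.e. scheduling runs concurrently with
    model run + sampler and nothing else is on the critical path.
    """
    return max(0.0, schedule_ms - forward_ms)


def overlapped_step_ms(schedule_ms: float, forward_ms: float) -> float:
    """Iteration time under the same idealized overlap: the slower side wins."""
    return max(schedule_ms, forward_ms)


# Numbers from the example above (Qwen3-8B, concurrency 16, 1024 in / 1024 out).
schedule_ms = 5.5
forward_ms = 5.0

exposed = exposed_scheduling_ms(schedule_ms, forward_ms)
step = overlapped_step_ms(schedule_ms, forward_ms)

print(f"exposed scheduling: {exposed:.1f} ms")
print(f"iteration time with overlap: {step:.1f} ms")
```

Under this model, any reduction in scheduling time down to 5 ms shortens the iteration directly; below that point the forward pass becomes the bottleneck and further scheduler speedups no longer show up.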

Related resources

Refer to the profile-with-jax-profiler document to profile the scheduler, then try modifying the code to speed it up.

