sgl-project/sglang-jax

[Feature] Speed Up Scheduler Thread Performance

Open

Opened on Oct 30, 2025

Labels: good first issue, performance

Description

Checklist

Motivation

Scheduler overlap is already supported, but when scheduling takes longer than the model forward pass, the scheduling time cannot be fully hidden behind it. For example, with Qwen3-8B at a concurrency of 16, an input length of 1024 and an output length of 1024, scheduling takes 5.5 ms while the model run plus sampler takes only 5 ms. We therefore need to speed up the scheduler thread.
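A rough way to see why 5.5 ms vs. 5 ms matters: under a simplified overlap model (an assumption, not a description of the actual sglang-jax scheduler), scheduling for the next step runs concurrently with the forward pass of the current step, so each decode step costs roughly the slower of the two. The helper names below are hypothetical, and the sketch assumes both figures from the issue are per-step times.

```python
def overlap_step_time(t_sched_ms: float, t_forward_ms: float) -> float:
    """Approximate per-step latency when scheduling overlaps the forward pass.

    Simplified model: both run concurrently, so the slower one bounds the step.
    """
    return max(t_sched_ms, t_forward_ms)


def exposed_sched_ms(t_sched_ms: float, t_forward_ms: float) -> float:
    """Scheduling time that the forward pass (model run + sampler) cannot hide."""
    return max(0.0, t_sched_ms - t_forward_ms)


# Figures from the issue: Qwen3-8B, concurrency 16, 1024 input / 1024 output tokens.
t_sched, t_fwd = 5.5, 5.0
output_len = 1024

print(overlap_step_time(t_sched, t_fwd))               # 5.5
print(exposed_sched_ms(t_sched, t_fwd))                # 0.5
print(output_len * overlap_step_time(t_sched, t_fwd))  # 5632.0 (scheduler-bound)
print(output_len * t_fwd)                              # 5120.0 (if fully hidden)
```

In this model the scheduler, not the model, sets the pace: every step pays an extra 0.5 ms, roughly 10% on top of the 5 ms forward time, and bringing scheduling below the forward time would remove that overhead entirely.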

Related resources

Refer to the profile-with-jax-profiler document to profile the scheduler, then modify the code to speed it up.
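Alongside the JAX profiler trace, a lightweight wall-clock breakdown can help decide which part of a scheduler step to optimize first. The sketch below is a hypothetical helper, not part of sglang-jax; the phase names (`build_batch`, `process_result`) are placeholders and `time.sleep` stands in for real scheduler work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals_ms = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def phase(self, name):
        # Time the enclosed block and record it even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals_ms[name] += (time.perf_counter() - start) * 1e3
            self.counts[name] += 1

    def mean_ms(self, name):
        # Average time per call; 0.0 for a phase that never ran.
        count = self.counts.get(name, 0)
        return self.totals_ms[name] / count if count else 0.0

    def report(self):
        # Print phases sorted by total time spent, most expensive first.
        for name in sorted(self.totals_ms, key=self.totals_ms.get, reverse=True):
            print(f"{name:<16} {self.mean_ms(name):8.3f} ms/call ({self.counts[name]} calls)")


timer = PhaseTimer()
for _ in range(3):
    with timer.phase("build_batch"):     # placeholder for a scheduler phase
        time.sleep(0.002)
    with timer.phase("process_result"):  # placeholder for a scheduler phase
        time.sleep(0.001)
timer.report()
```

Wall-clock timers like this only show where host time goes; the JAX profiler trace is still the tool for seeing how scheduler work lines up against device execution.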

