good first issue, performance
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sgl-jax/discussions/new/choose instead. Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
Scheduler overlap is currently supported, but when the scheduler's per-step time exceeds the model's forward time, the scheduling cost cannot be fully hidden behind the forward pass. For example, with the Qwen3-8B model at concurrency 16, input size 1024, and output size 1024, scheduling takes 5.5 ms, while model run plus sampler takes only 5 ms. We therefore need to speed up the scheduler thread.
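The numbers above can be made concrete with a small model. This is only a sketch under a simplifying assumption: with ideal overlap, each step costs the larger of the two stages. The function names `step_time_ms` and `exposed_schedule_ms` are hypothetical and are not part of sgl-jax.

```python
def step_time_ms(schedule_ms: float, forward_ms: float, overlap: bool = True) -> float:
    """Per-step latency under an idealized model (an assumption, not a measurement).

    With overlap, scheduling for the next step runs while the current forward
    pass executes, so the step costs the slower of the two stages.
    Without overlap, the two stages run back to back.
    """
    if overlap:
        return max(schedule_ms, forward_ms)
    return schedule_ms + forward_ms


def exposed_schedule_ms(schedule_ms: float, forward_ms: float) -> float:
    """Scheduling time that the forward pass cannot hide."""
    return max(0.0, schedule_ms - forward_ms)


# Figures quoted in this issue: 5.5 ms scheduling, 5 ms model run + sampler.
schedule_ms, forward_ms = 5.5, 5.0
print(step_time_ms(schedule_ms, forward_ms))         # 5.5
print(exposed_schedule_ms(schedule_ms, forward_ms))  # 0.5
```

Under this model the scheduler is the bottleneck: about 0.5 ms of every step is exposed, and any scheduler speedup beyond that point stops helping because the forward pass becomes the limit again.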
Related resources
Refer to the profile-with-jax-profiler document to profile the scheduler, then try modifying the code to speed it up.
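Alongside the JAX profiler, a coarse host-side timer can help narrow down which scheduler phase dominates before digging into a full trace. The sketch below uses only the Python standard library, not the JAX profiler itself; the `PhaseTimer` class and the phase names `build_batch` and `process_results` are hypothetical placeholders, not sgl-jax identifiers, and the loop bodies are stand-in work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.total_s = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the phase raises.
            self.total_s[name] += time.perf_counter() - start
            self.calls[name] += 1

    def mean_ms(self, name):
        return 1e3 * self.total_s[name] / self.calls[name]


timer = PhaseTimer()
for _ in range(100):  # stand-in for the scheduler loop
    with timer.phase("build_batch"):       # hypothetical phase name
        sorted(range(1000), reverse=True)  # placeholder work
    with timer.phase("process_results"):   # hypothetical phase name
        sum(range(1000))                   # placeholder work

for name in timer.total_s:
    print(f"{name}: {timer.mean_ms(name):.3f} ms/step")
```

Comparing the per-phase means against the 5 ms forward budget gives a first indication of where optimization effort is likely to pay off.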