[Feature] Detailed breakdown of time spent on launching SGLang Diffusion · sgl-project/sglang#19087

(10 comments) (0 reactions) (1 assignee) Python (6,216 forks)

good first issue

Repository metrics

Stars: (28,442 stars)
PR merge metrics: (Avg merge 2d 1h) (1,000 merged PRs in 30d)

Description

Checklist

If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
Please use English. Otherwise, it will be closed.

Motivation

Diffusion models and LLMs have very different compute characteristics. We want to optimize the launch time of SGLang Diffusion in detail.

To do that, we first need a detailed breakdown of where the time actually goes when we launch a model. Please use Qwen-Image as an example and break down the time spent in each step. Then we can decide whether optimizing the launch time is worth our effort.
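As a starting point, a breakdown like this could be collected with a small wall-clock stage timer. The following is a minimal sketch, not SGLang code: `LaunchTimer`, its methods, and the stage names are hypothetical, and the `time.sleep` calls are placeholders for the real launch steps.

```python
import time
from contextlib import contextmanager


class LaunchTimer:
    """Hypothetical helper: accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.durations = {}  # stage name -> accumulated seconds

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self):
        """Return a text table, slowest stage first, with each stage's share."""
        total = sum(self.durations.values())
        lines = []
        ordered = sorted(self.durations.items(), key=lambda kv: kv[1], reverse=True)
        for name, secs in ordered:
            share = 100.0 * secs / total if total > 0 else 0.0
            lines.append(f"{name:<24}{secs:>10.3f} s {share:>6.1f} %")
        lines.append(f"{'total':<24}{total:>10.3f} s")
        return "\n".join(lines)


if __name__ == "__main__":
    timer = LaunchTimer()
    # Placeholder stages: replace each sleep with the real launch step.
    with timer.stage("model_loading"):
        time.sleep(0.02)
    with timer.stage("weight_initialization"):
        time.sleep(0.01)
    with timer.stage("compilation"):
        time.sleep(0.03)
    print(timer.report())
```

One caveat if the timed steps launch GPU work: CUDA kernels run asynchronously, so a stage that ends without a synchronization point (for example `torch.cuda.synchronize()`) may appear faster than it is, with the cost showing up in a later stage.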

Related resources

No response

Contributor guide

Research direction: Profile the launch time of SGLang Diffusion using Qwen-Image as an example. Break down the time into components such as model loading, weight initialization, compilation, and any other initialization steps. Identify bottlenecks and report findings.
Tech stack: Python
Domain: backend, machine learning, performance
Issue type: Feature
Difficulty: 3
Estimated time: Half day
Activity status: Active
Clarity: Clear
Prerequisites: Python
Newbie friendliness: 60
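
For the "identify bottlenecks" part of the research direction, one option is Python's built-in `cProfile`, sorted by cumulative time. This is only a sketch under assumptions: `fake_launch`, `load_weights`, and `compile_graph` are hypothetical stand-ins, not SGLang functions, and `profile_call` is a helper invented for illustration.

```python
import cProfile
import io
import pstats
import time


def load_weights():
    """Hypothetical stand-in for reading weights from disk."""
    time.sleep(0.02)


def compile_graph():
    """Hypothetical stand-in for a compilation step."""
    time.sleep(0.01)


def fake_launch():
    """Hypothetical stand-in for the real launch entry point."""
    load_weights()
    compile_graph()


def profile_call(func, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_call(fake_launch))
```

`cProfile` only sees Python-level calls in the profiled process, so time spent in worker subprocesses or inside asynchronous GPU kernels would need a separate tool.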