[Tracking] Improve Multimodal CI coverage · sgl-project/sglang#8496

2025-07-29T04:39:17.000Z

### Checklist - [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed. - [ ] 2. Please use English, otherwise it will be closed. ### Motivation 1. All existing Multi-modal CI operations are executed on a single GPU, without considering the scenario of Tensor Parallelism (TP). There is a requirement to introduce test cases for TP-2/4 configurations. 2. The payload for VLM CI is relatively low. Stress tests are necessary, and simultaneously, it is crucial to investigate whether there are any memory leaks. 3. At present, the majority of the VLM CI is only to vision. It is essential to expand its scope to audio. 4. Welcome to add more. ### Related resources The test is under `test/srt/test_vision_openai_server_x` Related PRs: - https://github.com/sgl-project/sglang/pull/8428 - https://github.com/sgl-project/sglang/pull/7519

(3 comments) (5 reactions) (0 assignees)Python (6,216 forks)auto 404

Multi-modalcigood first issueperformance

Repository metrics

Stars: (28,442 stars)
PR merge metrics: (Avg merge 2d 1h) (1,000 merged PRs in 30d)

Description

Checklist

1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
2. Please use English, otherwise it will be closed.

Motivation

All existing Multi-modal CI operations are executed on a single GPU, without considering the scenario of Tensor Parallelism (TP). There is a requirement to introduce test cases for TP-2/4 configurations.
The payload for VLM CI is relatively low. Stress tests are necessary, and simultaneously, it is crucial to investigate whether there are any memory leaks.
At present, the majority of the VLM CI is only to vision. It is essential to expand its scope to audio.
Welcome to add more.

Related resources

The test is under test/srt/test_vision_openai_server_x

Related PRs:

Contributor guide

Research direction: Look at existing VLM CI tests under test/srt/test vision openai server x, identify gaps for TP configurations, stress tests, audio tests, and propose or implement test cases. Refer to related PRs #8428 and #7519.
Tech stack: python
Domain: devops
Issue type: Test
Difficulty: 3
Estimated time: Over 1 week
Activity status: Active
Clarity: Clear
Prerequisites: Git
Newbie friendliness: 65