vllm-project/vllm-ascend

[RFC]: support sequence parallelism by pass

Open

#5712 opened on Jan 8, 2026

Labels: RFC, help wanted

Description

Motivation.

Flash Comm V1 (FC1) is a feature similar to sequence parallelism. In vllm-ascend, FC1 is implemented as a custom op. However, it does not yet support VL models. When extending FC1 to VL models, we ran into two problems:

1. The VL model lacks a reduce-scatter operation at the embedding layer, which results in a redundant all-gather during the first step.

2. In Qwen3-VL, deepstack_input_embeds is added after the computation at each layer, but its shape does not match that of the hidden states. A chunk operation must be added before the layernorm.
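To illustrate the second problem, here is a minimal sketch in plain Python (no vllm-ascend or torch APIs). It assumes the token dimension is split evenly across tensor-parallel ranks, so each rank holds only its own slice of the hidden states while deepstack_input_embeds still covers every token. The helper names `chunk_for_rank` and `add_deepstack` are hypothetical and are not part of vllm-ascend; the actual implementation may pad or split differently.

```python
from typing import List

# A [num_tokens][hidden_size] matrix stored as nested lists (illustrative only).
Matrix = List[List[float]]


def chunk_for_rank(full: Matrix, tp_rank: int, tp_size: int) -> Matrix:
    """Return the token rows owned by `tp_rank`.

    Assumption: tokens are split evenly and contiguously across ranks.
    A real implementation would likely pad when the count is not divisible.
    """
    num_tokens = len(full)
    if num_tokens % tp_size != 0:
        raise ValueError("sketch assumes num_tokens is divisible by tp_size")
    per_rank = num_tokens // tp_size
    start = tp_rank * per_rank
    return full[start:start + per_rank]


def add_deepstack(local_hidden: Matrix, deepstack_full: Matrix,
                  tp_rank: int, tp_size: int) -> Matrix:
    """Add the matching slice of the full-length deepstack embeddings
    to this rank's sequence-parallel hidden states."""
    # Without this chunk, the row counts differ (num_tokens vs. num_tokens / tp_size).
    local_deepstack = chunk_for_rank(deepstack_full, tp_rank, tp_size)
    if len(local_deepstack) != len(local_hidden):
        raise ValueError("deepstack slice and local hidden states differ in length")
    return [
        [h + d for h, d in zip(hidden_row, deep_row)]
        for hidden_row, deep_row in zip(local_hidden, local_deepstack)
    ]


if __name__ == "__main__":
    num_tokens, tp_size = 8, 4
    hidden_full = [[float(t), float(t) + 0.5] for t in range(num_tokens)]
    deepstack_full = [[10.0 * t, 1.0] for t in range(num_tokens)]
    for rank in range(tp_size):
        local_hidden = chunk_for_rank(hidden_full, rank, tp_size)
        out = add_deepstack(local_hidden, deepstack_full, rank, tp_size)
        print(rank, len(out))
```

Concatenating the per-rank results in rank order reproduces the result of adding the two full-length tensors, which is why chunking before the add keeps the sequence-parallel path equivalent under these assumptions.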

Proposed Change.

Implement sequence parallelism via a pass:

Feedback Period.

No response

CC List.

@wxsIcey @ApsarasX

Any Other Things.

No response
