[Feature]: Implement Concurrent Partial Prefills In V1 Engine · vllm-project/vllm#14003


Labels: feature request, help wanted, unstale


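In V0, we support concurrent partial prefills: several requests can be prefilled in chunks within the same step, so a long prompt does not inflate TTFT (time to first token) for the requests queued behind it. Implement the same feature in the V1 engine.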
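To make the request concrete, here is a minimal toy sketch of the idea, not vLLM's V0 or V1 scheduler. It assumes a fixed per-step token budget that is split evenly across up to `max_partial_prefills` requests. `Request`, `schedule_step`, `steps_to_finish_prefill`, and both parameter names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Toy request: only tracks how many prompt tokens still need prefill."""
    name: str
    prompt_len: int
    done: int = 0  # prompt tokens already prefilled

    @property
    def remaining(self) -> int:
        return self.prompt_len - self.done


def schedule_step(waiting, token_budget, max_partial_prefills):
    """Pick (request, num_tokens) chunks for one step.

    Assumption for this sketch: at most `max_partial_prefills` requests are
    prefilled in the same step, in arrival order, and each one gets at most
    an equal share of `token_budget`.
    """
    active = [r for r in waiting if r.remaining > 0][:max_partial_prefills]
    if not active:
        return []
    share = max(1, token_budget // len(active))
    plan = []
    budget = token_budget
    for req in active:
        n = min(req.remaining, share, budget)
        if n <= 0:
            break
        plan.append((req, n))
        budget -= n
    return plan


def steps_to_finish_prefill(requests, token_budget, max_partial_prefills):
    """Return {name: step at which that request's prefill completes}."""
    if token_budget <= 0 or max_partial_prefills <= 0:
        raise ValueError("token_budget and max_partial_prefills must be positive")
    finished = {}
    step = 0
    while len(finished) < len(requests):
        step += 1
        for req, n in schedule_step(requests, token_budget, max_partial_prefills):
            req.done += n
            if req.remaining == 0:
                finished[req.name] = step
    return finished


def demo(max_partial_prefills):
    # A 4000-token prompt arrives just before a 100-token prompt.
    reqs = [Request("long", 4000), Request("short", 100)]
    return steps_to_finish_prefill(reqs, 1000, max_partial_prefills)
```

In this toy model, `demo(1)` finishes the short prompt's prefill at step 5, behind the long prompt, while `demo(2)` finishes it at step 1 and delays the long prompt from step 4 to step 5. That trade-off is the TTFT benefit the issue is asking for.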

cc @WoosukKwon


Research direction: Study the V0 implementation of concurrent partial prefills and understand the V1 engine structure to port the feature.
Tech stack: python
Domain: backend, infrastructure
Issue type: Feature
Difficulty: 4
Estimated time: Over 1 week
Activity status: Active
Clarity: Needs investigation
Prerequisites: Python, CUDA
Newbie friendliness: 15