vllm-project/vllm
[Feature]: Run performance benchmarks for multi-modal models in CI
Open
#16353 opened on Apr 9, 2025
Labels: feature request, help wanted, keep-open, multi-modality
Description
🚀 The feature, motivation and pitch
We currently only have benchmarks for text-only models such as Llama. With the increasing importance of multi-modality and related optimizations such as the processor cache, we should add performance benchmarks for multi-modal models to catch regressions (e.g. memory leaks, slow batching).
We can measure the peak memory usage based on this code:
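import resource
# On Linux, ru_maxrss is reported in KiB, so dividing by 2**20 yields GiB.
max_self_usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1 << 20)
# RUSAGE_CHILDREN covers only child processes that have terminated and been waited for.
max_children_usage = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / (1 << 20)
print(f"Peak memory usage: {max_self_usage} (self) + {max_children_usage} (children) GiB")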
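If this measurement ends up in a benchmark script, it may be convenient to wrap it in a small helper. The following is only a sketch: the names `ru_maxrss_to_gib` and `peak_memory_gib` are hypothetical and not part of vLLM. It assumes a Unix host and handles the fact that `ru_maxrss` is reported in KiB on Linux but in bytes on macOS. Note that this tracks host resident memory only, not GPU memory.

```python
import resource
import sys


def ru_maxrss_to_gib(ru_maxrss: int, platform: str = sys.platform) -> float:
    """Convert a raw ru_maxrss value to GiB.

    Assumption: Linux reports ru_maxrss in KiB and macOS ("darwin") in bytes.
    Other Unix platforms are treated like Linux here.
    """
    if platform == "darwin":
        return ru_maxrss / (1 << 30)
    return ru_maxrss / (1 << 20)


def peak_memory_gib() -> dict:
    """Return peak resident memory (GiB) for this process and its children.

    The "children" value only reflects child processes that have already
    terminated and been waited for.
    """
    self_raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    children_raw = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "self": ru_maxrss_to_gib(self_raw),
        "children": ru_maxrss_to_gib(children_raw),
    }


if __name__ == "__main__":
    usage = peak_memory_gib()
    print(
        f"Peak memory usage: {usage['self']:.2f} (self) + "
        f"{usage['children']:.2f} (children) GiB"
    )
```

A benchmark would call `peak_memory_gib()` once at the end of the run, after any worker processes have exited, and report both values alongside the throughput and latency numbers.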
Alternatives
No response
Additional context
cc @mgoin @ywang96