[Feature]: Expand test coverage for Qwen2VL vision-language model · kornia/kornia#3556

7 comments · 0 reactions · 2 assignees · Python · 8,677 stars · 892 forks

Labels: help wanted, triage

Description

🚀 Feature Description

Expand test coverage for the Qwen2VL vision-language model. It currently has only 33 lines of tests for 205 lines of implementation (a ratio of 0.16), and those tests amount to a single basic smoke test.

📂 Feature Category

VLM/VLA Models (Vision Language Models/Agents) - Priority

💡 Motivation

Current situation:

Qwen2VL has only 33 lines of tests for 205 lines of implementation
The current tests consist of a single basic smoke test
Missing critical tests:
- No gradient checks (gradcheck)
- No component-level tests
- No torch.compile/dynamo tests
- No integration tests with actual vision-language tasks
- No pretrained weight loading verification (if applicable)
- No batch consistency tests
- No exception handling tests
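
To make three of the missing checks concrete (gradcheck, batch consistency, exception handling), here is a rough sketch. It deliberately uses a tiny stand-in module, `TinyVisionStub`, which is hypothetical and is NOT kornia's Qwen2VL; the real tests would swap in the actual model components and whatever input shapes they expect.

```python
import torch
from torch import nn
from torch.autograd import gradcheck


class TinyVisionStub(nn.Module):
    """Hypothetical stand-in for a vision component; not kornia's Qwen2VL."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 4) -> None:
        super().__init__()
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=2, stride=2)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        if images.dim() != 4:
            raise ValueError(f"expected (B, C, H, W), got {tuple(images.shape)}")
        feats = self.proj(images)                # (B, D, H/2, W/2)
        return feats.flatten(2).transpose(1, 2)  # (B, N, D)


def check_gradients(model: nn.Module) -> bool:
    """Compare analytical and numerical gradients; gradcheck needs float64."""
    model = model.double()
    x = torch.randn(2, 3, 4, 4, dtype=torch.float64, requires_grad=True)
    return gradcheck(model, (x,), eps=1e-6, atol=1e-4)


def check_batch_consistency(model: nn.Module) -> None:
    """A batched forward pass should match per-sample forward passes."""
    dtype = next(model.parameters()).dtype
    x = torch.randn(3, 3, 4, 4, dtype=dtype)
    with torch.no_grad():
        batched = model(x)
        single = torch.cat([model(x[i : i + 1]) for i in range(x.shape[0])])
    torch.testing.assert_close(batched, single)


def check_rejects_bad_input(model: nn.Module) -> bool:
    """A tensor without a batch dimension should raise ValueError."""
    try:
        model(torch.randn(3, 4, 4))
    except ValueError:
        return True
    return False
```

Inputs are kept very small because gradcheck perturbs every input element, which gets slow quickly on realistic image sizes.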

Why expanded testing is needed:

Verify vision-language model functionality beyond basic instantiation
Ensure compatibility with PyTorch optimization features (torch.compile)
Test gradient flow through vision and language components
Prevent regressions in model architecture changes
Provide comprehensive usage examples
Match testing standards of other VLMs (e.g., KimiVL: 0.64 ratio)

💭 Proposed Solution

Migrate to BaseTester pattern:

✅ Smoke tests: Multiple configurations, batch sizes, input formats
✅ Exception tests: Invalid inputs, edge cases, error handling
✅ Cardinality tests: Output shape verification for various inputs
✅ Component tests: Individual module testing (vision encoder, language decoder, etc.)
✅ Feature tests: Vision-language alignment, attention mechanisms
✅ Gradient checks: Verify backpropagation correctness using gradcheck
✅ torch.compile compatibility: Test with the torch_optimizer fixture
✅ Integration tests: End-to-end vision-language tasks
✅ Batch consistency: Verify batch vs. individual processing
✅ Pretrained weights: Verify weight loading if pretrained models available

Additional Documentation

Jupyter notebook demonstrating:

Model instantiation and configuration
Vision-language inference examples
Image-text alignment capabilities
Comparison with official Qwen2VL implementation
Performance benchmarks
Integration with Hugging Face models (if applicable)

🔄 Alternatives Considered

No response

🎯 Use Cases

For developers:

Ensure vision-language model works correctly after changes
Test gradient flow through model components
Verify compiler optimization compatibility
Understand model architecture through tests

For users:

Learn proper usage patterns for vision-language tasks
See practical examples with images and text
Understand model capabilities and limitations
Get copy-paste examples for inference

For maintainers:

Maintain code quality standards across VLMs
Catch regressions before deployment
Ensure consistency with other VLM implementations

📝 Additional Context

Gap identified: Qwen2VL has the lowest test coverage (0.16) among VLMs in the repository.

🤝 Contribution Intent

I plan to submit a PR to implement this feature
I'm requesting this feature but not planning to implement it

Contributor Guide

Tech stack: python, pytorch
Domain: machine learning, testing
Issue type: test
Difficulty: 3
Estimated time: 3-5 days
Activity status: blocked
Clarity: clear
Prerequisites: Python, PyTorch, pytest, VLM basics
Beginner friendliness: 65
Research direction: The issue proposes expanding test coverage for Qwen2VL in the kornia library. The existing Qwen2VL test file is likely under tests/... and the new tests should follow the BaseTester pattern used elsewhere in the repository. Check the issue comments and existing test files to understand that pattern. The specific tests requested include gradient checks, torch.compile compatibility, and integration tests. There are no linked PRs or maintainer questions yet.