[Feature]: Edge-aware & structure-preserving augmentations for VLM/VLA training pipelines
#3,676 opened on March 27, 2026
Description
🚀 Feature Description
Hello. In many VLM/VLA settings (e.g., segmentation-assisted captioning), structural cues such as edges and boundaries play a critical role in aligning visual tokens with language or actions.
However, current augmentations (blur, noise, color jitter, etc.) often:
- Distort object boundaries
- Degrade fine-grained spatial structure
- Introduce mismatch between visual features and downstream alignment tasks
📂 Feature Category
VLM/VLA Models (Vision Language Models/Agents) - Priority
💡 Motivation
While working on segmentation and satellite imagery, I noticed that standard augmentations (e.g., blur, noise) can degrade important edge details, which hurts model performance. This becomes especially important in:
- VLM grounding tasks (region-text alignment depends on boundaries)
- VLA / robotics pipelines (edges define actionable regions)
- Segmentation-assisted multimodal models
- Remote sensing / medical VLMs
Standard augmentations are structure-agnostic, which can hurt:
- Cross-modal alignment quality
- Spatial reasoning performance
- Robustness in downstream tasks
💭 Proposed Solution
Introduce a structure-aware / edge-aware augmentation module designed for modern multimodal pipelines.
Key idea:
Use edge/structure priors to modulate augmentation strength.
Suggested API:

    kornia.augmentation.EdgeAwareAugmentation(
        base_aug,               # any existing Kornia augmentation
        edge_detector="sobel",  # or "canny"
        mode="soft",            # soft weighting or hard masking
        edge_weight=0.3,        # strength reduction near edges
        detach_edges=True       # optional for efficiency
    )
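
To make the idea concrete, here is a minimal sketch of how such a wrapper might behave. It is not an existing Kornia class. It uses plain PyTorch, and several details are my own assumptions: `edge_weight` is read as the fraction of augmentation strength kept on the strongest edges, the edge map is an L1 Sobel magnitude normalized per image (`kornia.filters.sobel` could probably take its place), `"hard"` mode thresholds the edge map at 0.5, and only the Sobel detector is covered. The blend only makes sense for intensity-level augmentations (noise, blur, color jitter), not geometric ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_magnitude(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """L1 Sobel gradient magnitude, normalized per image to [0, 1].

    x: (B, C, H, W) float tensor. Returns (B, 1, H, W).
    """
    gray = x.mean(dim=1, keepdim=True)
    kx = torch.tensor(
        [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]],
        dtype=x.dtype,
        device=x.device,
    )
    kernel = torch.stack([kx, kx.t()]).unsqueeze(1)  # (2, 1, 3, 3): d/dx and d/dy
    grad = F.conv2d(F.pad(gray, (1, 1, 1, 1), mode="replicate"), kernel)
    mag = grad.abs().sum(dim=1, keepdim=True)
    return mag / mag.amax(dim=(2, 3), keepdim=True).clamp_min(eps)


class EdgeAwareAugmentation(nn.Module):
    """Hypothetical wrapper: weakens `base_aug` where the input has strong edges."""

    def __init__(self, base_aug, mode="soft", edge_weight=0.3, detach_edges=True):
        super().__init__()
        if mode not in ("soft", "hard"):
            raise ValueError(f"unknown mode: {mode}")
        self.base_aug = base_aug
        self.mode = mode
        self.edge_weight = edge_weight
        self.detach_edges = detach_edges

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        edges = sobel_magnitude(x)
        if self.detach_edges:
            edges = edges.detach()  # no gradient through the edge map
        if self.mode == "hard":
            edges = (edges > 0.5).to(x.dtype)  # assumed threshold
        # alpha is the local augmentation strength:
        # 1.0 in flat regions, edge_weight on the strongest edges.
        alpha = 1.0 - (1.0 - self.edge_weight) * edges
        return alpha * self.base_aug(x) + (1.0 - alpha) * x
```

With this sketch, something like `EdgeAwareAugmentation(kornia.augmentation.RandomGaussianNoise(std=0.1, p=1.0))` would apply full-strength noise in flat regions and roughly 30% of it on the sharpest boundaries.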
🔄 Alternatives Considered
No response
🎯 Use Cases
No response
📝 Additional Context
No response
🤝 Contribution Intent
- I plan to submit a PR to implement this feature
- I'm requesting this feature but not planning to implement it