[Feature]: Edge-aware & structure-preserving augmentations for VLM/VLA training pipelines · kornia/kornia#3676

1 comment · 0 reactions · 0 assignees · Python · 8,677 stars · 892 forks

help wanted, triage

Description

🚀 Feature Description

Hello. In many VLM/VLA settings (e.g., segmentation-assisted captioning), structural cues such as edges and boundaries play a critical role in aligning visual tokens with language or actions.

However, current augmentations (blur, noise, color jitter, etc.) often:

- Distort object boundaries
- Degrade fine-grained spatial structure
- Introduce mismatch between visual features and downstream alignment tasks

📂 Feature Category

VLM/VLA Models (Vision Language Models/Agents) - Priority

💡 Motivation

While working on segmentation and satellite imagery, I noticed that standard augmentations (e.g., blur and noise) can degrade important edge details, which hurts model performance. This becomes especially important in:

- VLM grounding tasks (region-text alignment depends on boundaries)
- VLA / robotics pipelines (edges define actionable regions)
- Segmentation-assisted multimodal models
- Remote sensing / medical VLMs

Standard augmentations are structure-agnostic, which can hurt:

- Cross-modal alignment quality
- Spatial reasoning performance
- Robustness in downstream tasks

💭 Proposed Solution

Introduce a structure-aware / edge-aware augmentation module designed for modern multimodal pipelines.

Key idea:

Use edge/structure priors to modulate augmentation strength.
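One possible way to make this concrete, written as a per-pixel blend. The symbols are introduced here for illustration and are not part of the proposal: $x$ is the input image, $A$ the base augmentation, $E(x) \in [0, 1]$ a normalized edge map, and $\lambda \in [0, 1]$ the reduction applied on the strongest edges (assumed here to correspond to `edge_weight`).

$$
s = 1 - \lambda \, E(x), \qquad y = s \odot A(x) + (1 - s) \odot x
$$

With this reading, flat regions ($E \approx 0$) receive the full augmentation, while pixels on strong edges stay closer to the original image. Hard masking would be the special case where $E(x)$ is thresholded to $\{0, 1\}$.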

Suggested API:

```python
kornia.augmentation.EdgeAwareAugmentation(
    base_aug,               # any existing Kornia augmentation
    edge_detector="sobel",  # or "canny"
    mode="soft",            # soft weighting or hard masking
    edge_weight=0.3,        # strength reduction near edges
    detach_edges=True,      # optional for efficiency
)
```
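To make the proposal easier to discuss, here is a rough, self-contained sketch of how such a wrapper might behave. It is not an existing Kornia class and makes several assumptions: the Sobel filter is written in plain PyTorch instead of reusing Kornia's filters, `edge_weight` is read as "how much the augmentation is reduced on the strongest edges", `"canny"` is left unimplemented, and `threshold` is a hypothetical extra parameter used only for hard masking.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_edge_map(x: torch.Tensor) -> torch.Tensor:
    """Return an edge map in [0, 1] with shape (B, 1, H, W) for input (B, C, H, W)."""
    gray = x.mean(dim=1, keepdim=True)  # crude grayscale: channel average
    kx = torch.tensor(
        [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]],
        dtype=x.dtype, device=x.device,
    )
    kernel = torch.stack([kx, kx.t()]).unsqueeze(1)  # (2, 1, 3, 3): x and y gradients
    padded = F.pad(gray, (1, 1, 1, 1), mode="replicate")
    grad = F.conv2d(padded, kernel)  # (B, 2, H, W)
    mag = torch.sqrt(grad.pow(2).sum(dim=1, keepdim=True) + 1e-12)
    # Normalize per image; the clamp keeps flat images close to 0 instead of dividing by ~0.
    peak = mag.amax(dim=(2, 3), keepdim=True).clamp_min(1e-3)
    return mag / peak


class EdgeAwareAugmentation(nn.Module):
    """Illustrative wrapper: blends base_aug(x) with x, keeping more of x near edges."""

    def __init__(self, base_aug, edge_detector="sobel", mode="soft",
                 edge_weight=0.3, detach_edges=True, threshold=0.5):
        super().__init__()
        if edge_detector != "sobel":
            raise NotImplementedError("only 'sobel' is sketched here")
        if mode not in ("soft", "hard"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.base_aug = base_aug          # callable mapping (B, C, H, W) -> same shape
        self.mode = mode
        self.edge_weight = edge_weight    # assumed: reduction on the strongest edges
        self.detach_edges = detach_edges
        self.threshold = threshold        # hypothetical; only used when mode == "hard"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        edges = sobel_edge_map(x)
        if self.detach_edges:
            edges = edges.detach()        # no gradient flows through the edge map
        if self.mode == "hard":
            edges = (edges > self.threshold).to(x.dtype)
        # Strength is 1 in flat regions and (1 - edge_weight) on the strongest edges.
        strength = 1.0 - self.edge_weight * edges
        return strength * self.base_aug(x) + (1.0 - strength) * x


def add_noise(t: torch.Tensor) -> torch.Tensor:
    """Stand-in photometric augmentation: additive Gaussian noise."""
    return t + 0.1 * torch.randn_like(t)


# Usage: flat regions get the full noise, edge pixels get a reduced amount.
images = torch.rand(4, 3, 32, 32)
aug = EdgeAwareAugmentation(add_noise, mode="soft", edge_weight=0.3)
out = aug(images)
```

Note that this blending only makes sense when `base_aug` preserves shape and pixel alignment (blur, noise, color jitter). Geometric augmentations such as rotation or cropping would need a different treatment, since the augmented image no longer lines up with the edge map of the input.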

🔄 Alternatives Considered

No response

🎯 Use Cases

No response

📝 Additional Context

No response

🤝 Contribution Intent

I plan to submit a PR to implement this feature
I'm requesting this feature but not planning to implement it

Contributor guide

Tech stack: python, pytorch
Domain: machine learning, data
Issue type: feature
Difficulty: 3
Estimated time: 3-5 days
Activity status: fresh
Clarity: clear
Prerequisites: Python, PyTorch, kornia basics, computer vision fundamentals
Beginner-friendliness: 50
Suggested approach: Examine the existing augmentation modules in `kornia.augmentation` to understand the API pattern. Implement an `EdgeAwareAugmentation` class that wraps a base augmentation and uses edge detection (e.g., Sobel or Canny) to modulate the augmentation strength near edges. Consider using kornia's own edge detection filters (e.g., `kornia.filters.sobel` or `kornia.filters.canny`). Make sure the module integrates with the existing augmentation pipeline. The contributor has indicated intent to submit a PR, so coordinate with the maintainers on the design before implementing.