kornia/kornia

[Feature]: Edge-aware & structure-preserving augmentations for VLM/VLA training pipelines

Open

#3,676 opened on March 27, 2026

(1 comment) (0 reactions) (0 assignees) Python (8,677 stars) (892 forks)
help wanted, triage

Description

🚀 Feature Description

Hello. In many VLM/VLA settings (e.g., segmentation-assisted captioning), structural cues such as edges and boundaries play a critical role in aligning visual tokens with language or actions.

However, current augmentations (blur, noise, color jitter, etc.) often:

  • Distort object boundaries
  • Degrade fine-grained spatial structure
  • Introduce mismatch between visual features and downstream alignment tasks

📂 Feature Category

VLM/VLA Models (Vision Language Models/Agents) - Priority

💡 Motivation

While working on segmentation and satellite imagery, I noticed that standard augmentations (e.g., blur and noise) can degrade important edge details, which hurts model performance. This matters most in:

  • VLM grounding tasks (region-text alignment depends on boundaries)
  • VLA / robotics pipelines (edges define actionable regions)
  • Segmentation-assisted multimodal models
  • Remote sensing / medical VLMs

Standard augmentations are structure-agnostic, which can hurt:

  • Cross-modal alignment quality
  • Spatial reasoning performance
  • Robustness in downstream tasks

💭 Proposed Solution

Introduce a structure-aware / edge-aware augmentation module designed for modern multimodal pipelines.

Key idea:

Use edge/structure priors to modulate augmentation strength.

Suggested API:

    kornia.augmentation.EdgeAwareAugmentation(
        base_aug,               # any existing Kornia augmentation
        edge_detector="sobel",  # or "canny"
        mode="soft",            # soft weighting or hard masking
        edge_weight=0.3,        # strength reduction near edges
        detach_edges=True,      # optional, for efficiency
    )
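To make the idea concrete, here is a minimal sketch of how such a wrapper could behave. It is not an existing Kornia API. It uses plain PyTorch and makes several assumptions: `edge_weight` is read as the fraction of augmentation strength kept on the strongest edges, the `threshold` argument for hard masking is hypothetical, only a Sobel detector is covered, and `base_aug` is assumed to be an intensity-only augmentation (blur, noise, color jitter) that keeps image geometry unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_magnitude(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Return a per-image edge map in [0, 1] with shape (B, 1, H, W)."""
    gray = x.mean(dim=1, keepdim=True)
    kx = torch.tensor(
        [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]],
        dtype=x.dtype,
        device=x.device,
    )
    kernels = torch.stack([kx, kx.t()]).unsqueeze(1)  # (2, 1, 3, 3): d/dx and d/dy
    padded = F.pad(gray, (1, 1, 1, 1), mode="replicate")
    grads = F.conv2d(padded, kernels)  # (B, 2, H, W)
    sq = grads.pow(2).sum(dim=1, keepdim=True)
    # eps keeps sqrt differentiable at zero; subtracting sqrt(eps) maps flat regions back to ~0.
    mag = (torch.sqrt(sq + eps) - eps ** 0.5).clamp_min(0.0)
    return mag / mag.amax(dim=(2, 3), keepdim=True).clamp_min(eps)


class EdgeAwareAugmentation(nn.Module):
    """Hypothetical wrapper: blend an augmented image with the original near edges."""

    def __init__(self, base_aug, mode="soft", edge_weight=0.3,
                 threshold=0.5, detach_edges=True):
        super().__init__()
        if mode not in ("soft", "hard"):
            raise ValueError("mode must be 'soft' or 'hard'")
        self.base_aug = base_aug
        self.mode = mode
        self.edge_weight = edge_weight      # assumed: strength kept on the strongest edges
        self.threshold = threshold          # hypothetical: cutoff used in hard mode
        self.detach_edges = detach_edges

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        edges = sobel_magnitude(x)
        if self.detach_edges:
            edges = edges.detach()  # no gradient flows through the edge prior
        if self.mode == "hard":
            edges = (edges > self.threshold).to(x.dtype)
        # Strength is 1 in flat regions and falls to edge_weight on the strongest edges.
        strength = 1.0 - (1.0 - self.edge_weight) * edges
        augmented = self.base_aug(x)
        return strength * augmented + (1.0 - strength) * x


if __name__ == "__main__":
    torch.manual_seed(0)
    images = torch.rand(4, 3, 64, 64)
    # Stand-in base augmentation: additive Gaussian noise.
    noisy = EdgeAwareAugmentation(lambda t: t + 0.1 * torch.randn_like(t))
    print(noisy(images).shape)
```

Because `base_aug` is just a callable on `(B, C, H, W)` tensors, an existing module such as `kornia.augmentation.RandomGaussianBlur` could be passed in its place. Geometric augmentations would need different handling, since the pixel-wise blend assumes the augmented and original images stay aligned.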

🔄 Alternatives Considered

No response

🎯 Use Cases

No response

📝 Additional Context

No response

🤝 Contribution Intent

  • I plan to submit a PR to implement this feature
  • I'm requesting this feature but not planning to implement it
