key_padding_mask is not used in transformer decoder layer? · facebookresearch/fairseq#537

(1 comment) (0 reactions) (0 assignees) Python (29,107 stars) (6,224 forks)

bug, help wanted

Description

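When reading the source code, I found that `key_padding_mask` is not used when computing self-attention in the transformer decoder layer. This causes no problem when the target is right-padded (the default), because `attn_mask` has the same effect: the causal mask already stops every real token from attending to the padding positions that come after it. But what about left padding on the target? In that case the padding positions come before the real tokens, so the causal mask alone does not hide them.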
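The difference between the two padding layouts can be illustrated with a small, framework-free sketch. It is not fairseq code: `causal_mask` and `leaked_pad_positions` are hypothetical helpers written only for this illustration, and the padding layouts are made-up examples. The sketch lists every (query, key) pair in which a real token is allowed to attend to a padding position.

```python
def causal_mask(seq_len):
    """allowed[i][j] is True when query position i may attend to key position j."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def leaked_pad_positions(is_pad, use_key_padding_mask=False):
    """Return (query, key) pairs where a real token can attend to a padding key.

    is_pad[j] is True when position j holds a padding token.
    Hypothetical helper for illustration; not part of fairseq.
    """
    n = len(is_pad)
    allowed = causal_mask(n)
    leaks = []
    for i in range(n):
        if is_pad[i]:
            continue  # skip padding queries; only real-token queries are checked
        for j in range(n):
            visible = allowed[i][j]
            if use_key_padding_mask and is_pad[j]:
                visible = False  # a key padding mask hides padding keys
            if visible and is_pad[j]:
                leaks.append((i, j))
    return leaks


# Made-up example layouts: three real tokens and two padding tokens.
right_padded = [False, False, False, True, True]
left_padded = [True, True, False, False, False]

right_leaks = leaked_pad_positions(right_padded)
left_leaks = leaked_pad_positions(left_padded)
left_fixed = leaked_pad_positions(left_padded, use_key_padding_mask=True)

print(right_leaks)  # []
print(left_leaks)   # [(2, 0), (2, 1), (3, 0), (3, 1), (4, 0), (4, 1)]
print(left_fixed)   # []
```

With right padding, the causal mask alone produces no leaks. With left padding, every real token can see the two leading padding positions unless a key padding mask is applied as well.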

Contributor guide

Tech stack: Python, PyTorch
Domain: machine learning
Issue type: bug
Difficulty: 4
Estimated time: half day
Activity status: stale
Clarity: clear
Prerequisites: Python, PyTorch, Transformer attention, fairseq internals
Beginner friendliness: 40
Investigation approach: Investigate the fairseq transformer decoder layer implementation, likely in `fairseq/models/transformer.py`. Find the self-attention computation and check how `key_padding_mask` is handled. Compare with the encoder layer or other transformer implementations to identify where the mask should be applied. Consider testing with left-padded targets to confirm the issue. Propose a fix that modifies the attention call to incorporate `key_padding_mask`.