facebookresearch/fairseq
key_padding_mask is not used in transformer decoder layer?
Open
#537 opened on February 27, 2019
bug, help wanted
Description
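While reading the source code, I found that key_padding_mask is not used when computing self-attention in the transformer decoder layer. This is not a problem when the target is right-padded, which is the default, because attn_mask already has the same effect: the causal mask keeps every real token from attending to the padding positions that come after it. But what about left padding on the target?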
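
To make the concern concrete, here is a minimal pure-Python sketch (not fairseq code) that lists which padding positions a real target token could attend to under a causal mask alone. The padding index `PAD`, the helper names `causal_mask` and `pad_leaks`, and the toy token IDs are all hypothetical and chosen only for illustration.

```python
PAD = 0  # hypothetical padding index, for illustration only


def causal_mask(n):
    # allowed[i][j] is True when query position i may attend to key position j,
    # i.e. only to itself and to earlier positions (j <= i).
    return [[j <= i for j in range(n)] for i in range(n)]


def pad_leaks(tokens, use_key_padding_mask=False):
    """Return (query, key) pairs where a real token can attend to a pad token."""
    n = len(tokens)
    allowed = causal_mask(n)
    leaks = []
    for i in range(n):
        if tokens[i] == PAD:
            continue  # only real query positions are of interest here
        for j in range(n):
            visible = allowed[i][j]
            # A key padding mask additionally hides every pad key from all queries.
            if use_key_padding_mask and tokens[j] == PAD:
                visible = False
            if visible and tokens[j] == PAD:
                leaks.append((i, j))
    return leaks


right_padded = [5, 6, 7, PAD, PAD]
left_padded = [PAD, PAD, 5, 6, 7]

print(pad_leaks(right_padded))                            # no leaks
print(pad_leaks(left_padded))                             # real tokens see the leading pads
print(pad_leaks(left_padded, use_key_padding_mask=True))  # no leaks
```

With right padding, the pads sit after every real token, so the causal mask alone already hides them. With left padding, the pads sit before the real tokens, so in this toy setup only an additional key padding mask keeps them out of the attention.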