facebookresearch/fairseq
key_padding_mask is not used in transformer decoder layer?
Open
#537 opened on Feb 27, 2019
bug, help wanted
Description
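While reading the source code, I found that `key_padding_mask` is not used when computing self-attention in the transformer decoder layer. This causes no problem when the target is right-padded, which is the default, because the causal `attn_mask` already keeps real tokens from attending to the padding positions that come after them. But what about left padding on the target?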
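
To make the concern concrete, here is a minimal pure-Python sketch of the masking logic. It is not fairseq code: the helper names (`causal_mask`, `key_padding_mask`, `attended_pad_positions`) and the pad id `0` are assumptions made up for illustration. It lists every (query, key) pair in which a real token is allowed to attend to a padding token.

```python
PAD = 0  # hypothetical pad token id, chosen only for this illustration


def causal_mask(n):
    # allowed[i][j] is True when query position i may attend to key position j.
    # A causal (future-blocking) mask allows only j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def key_padding_mask(tokens, pad=PAD):
    # True at every position that holds a padding token.
    return [t == pad for t in tokens]


def attended_pad_positions(tokens, use_key_padding_mask, pad=PAD):
    # Collect (query, key) pairs where a non-pad query can see a pad key.
    n = len(tokens)
    allowed = causal_mask(n)
    is_pad = key_padding_mask(tokens, pad)
    leaks = []
    for i in range(n):
        if is_pad[i]:
            continue  # outputs at pad positions are ignored anyway
        for j in range(n):
            visible = allowed[i][j]
            if use_key_padding_mask and is_pad[j]:
                visible = False  # padding keys are masked out explicitly
            if visible and is_pad[j]:
                leaks.append((i, j))
    return leaks


right_padded = [5, 6, 7, PAD, PAD]
left_padded = [PAD, PAD, 5, 6, 7]

# Right padding: the causal mask alone already hides the trailing pads.
print(attended_pad_positions(right_padded, use_key_padding_mask=False))

# Left padding: every real token can see the leading pads.
print(attended_pad_positions(left_padded, use_key_padding_mask=False))

# Left padding with an explicit key padding mask: no pad is visible.
print(attended_pad_positions(left_padded, use_key_padding_mask=True))
```

Under these assumptions, the causal mask alone is enough for right padding, but with left padding the padding keys stay visible unless a key padding mask is applied as well.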