facebookresearch/fairseq
key_padding_mask is not used in transformer decoder layer?
Open
#537 opened on Feb 27, 2019
bug, help wanted
Description
While reading the source code, I found that key_padding_mask is not used when computing self-attention in the transformer decoder layer. This causes no problem when the target is right-padded (the default), because the causal attn_mask already keeps real tokens from attending to the trailing padding. But what about left padding on the target?
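
To make the concern concrete, here is a minimal sketch in plain Python (not fairseq code). It lists, for each real target token, which padding positions remain visible under a causal mask alone. The PAD index, the helper name attended_pad_positions, and the use_key_padding_mask flag are hypothetical and exist only for this illustration; it also assumes that outputs at padding positions are ignored by the loss.

```python
PAD = 0  # hypothetical padding index, chosen only for this sketch


def attended_pad_positions(tokens, use_key_padding_mask=False):
    """Map each non-pad query position to the pad key positions it can see.

    Only a causal (future) mask is applied unless use_key_padding_mask is
    True, in which case pad keys are hidden as well.
    """
    leaks = {}
    for q, q_tok in enumerate(tokens):
        if q_tok == PAD:
            # Assumption: outputs at pad positions do not contribute to the loss.
            continue
        visible = range(q + 1)  # causal mask: query q sees keys 0..q
        leaks[q] = [
            k for k in visible
            if tokens[k] == PAD and not use_key_padding_mask
        ]
    return leaks


right_padded = [5, 6, 7, PAD, PAD]
left_padded = [PAD, PAD, 5, 6, 7]

right_leaks = attended_pad_positions(right_padded)
left_leaks = attended_pad_positions(left_padded)
left_fixed = attended_pad_positions(left_padded, use_key_padding_mask=True)

print(right_leaks)  # {0: [], 1: [], 2: []}
print(left_leaks)   # {2: [0, 1], 3: [0, 1], 4: [0, 1]}
print(left_fixed)   # {2: [], 3: [], 4: []}
```

With right padding, the causal mask alone hides every pad key from the real tokens. With left padding, every real token still sees the leading pads unless a key padding mask is applied too.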