facebookresearch/fairseq

key_padding_mask is not used in transformer decoder layer?

Open

#537 opened on Feb 27, 2019

1 comment · 0 reactions · 0 assignees · Python · 29,107 stars · 6,224 forks
bug, help wanted

Description

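While reading the source code, I found that key_padding_mask is not used when computing self-attention in the transformer decoder layer. This is not a problem when the target is right-padded (the default), because the causal attn_mask already has the same effect: every padding position comes after the real tokens, so no real token can attend to it. But what about left padding on the target? In that case the padding positions come before the real tokens, and attn_mask alone does not hide them.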
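To make the question concrete, here is a minimal sketch in plain Python. It is not fairseq code: `PAD`, `causal_mask`, and `pad_leaks` are hypothetical names made up for this illustration, and the token ids are arbitrary. It only models which (query, key) pairs stay visible under a causal mask, with and without an additional key padding mask.

```python
PAD = 0  # hypothetical padding index, chosen only for this sketch


def causal_mask(n):
    """allowed[i][j] is True when query i may attend to key j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def pad_leaks(tokens, use_key_padding_mask=False):
    """Return (query, key) pairs where a real token attends to a pad key."""
    n = len(tokens)
    allowed = causal_mask(n)
    leaks = []
    for i in range(n):
        if tokens[i] == PAD:
            continue  # only queries at real-token positions are checked
        for j in range(n):
            visible = allowed[i][j]
            # A key padding mask hides pad keys from every query.
            if use_key_padding_mask and tokens[j] == PAD:
                visible = False
            if visible and tokens[j] == PAD:
                leaks.append((i, j))
    return leaks


right_padded = [5, 6, 7, PAD, PAD]
left_padded = [PAD, PAD, 5, 6, 7]

right_leaks = pad_leaks(right_padded)
left_leaks = pad_leaks(left_padded)
left_fixed = pad_leaks(left_padded, use_key_padding_mask=True)

print("right padding, causal mask only:", right_leaks)
print("left padding, causal mask only:", left_leaks)
print("left padding, causal + key padding mask:", left_fixed)
```

Under these assumptions, the right-padded sequence shows no leaks with the causal mask alone, while the left-padded one lets every real token attend to the leading pad positions unless a key padding mask is also applied.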
