facebookresearch/fairseq

key_padding_mask is not used in transformer decoder layer?

Open

#537 opened on February 27, 2019

(1 comment) (0 reactions) (0 assignees) Python (29,107 stars) (6,224 forks)
Labels: bug, help wanted

Description

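While reading the source code, I found that key_padding_mask is not used when computing self-attention in the transformer decoder layer. This is not a problem when the target is right-padded (the default), because attn_mask already does the same job: it is causal, so a real token can only attend to positions at or before its own, and with right padding all pad tokens come after the real ones. But what about left padding on the target? In that case the pad tokens come before the real tokens, and the causal mask alone does not hide them.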
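To make the concern concrete, here is a minimal pure-Python sketch of the masking logic only. It does not use fairseq code; `PAD`, `causal_mask`, and `pad_leaks` are hypothetical names made up for this illustration. It lists which (query, key) pairs let a real token attend to a pad token, with and without a key padding mask.

```python
PAD = 0  # hypothetical padding index, chosen only for this sketch


def causal_mask(n):
    """allowed[i][j] is True when query position i may attend to key j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def pad_leaks(tokens, use_key_padding_mask):
    """Return (query, key) pairs where a real token can attend to a pad key."""
    n = len(tokens)
    allowed = causal_mask(n)
    leaks = []
    for i in range(n):
        if tokens[i] == PAD:
            continue  # assumption: outputs at pad positions are ignored by the loss
        for j in range(n):
            visible = allowed[i][j]
            if use_key_padding_mask and tokens[j] == PAD:
                visible = False  # key padding mask hides pad keys from every query
            if visible and tokens[j] == PAD:
                leaks.append((i, j))
    return leaks


right_padded = [5, 6, 7, PAD, PAD]
left_padded = [PAD, PAD, 5, 6, 7]

# Right padding: the causal mask alone is enough.
print(pad_leaks(right_padded, use_key_padding_mask=False))  # []
# Left padding: every real token can see the leading pads.
print(pad_leaks(left_padded, use_key_padding_mask=False))
# [(2, 0), (2, 1), (3, 0), (3, 1), (4, 0), (4, 1)]
# Left padding with a key padding mask: no leaks.
print(pad_leaks(left_padded, use_key_padding_mask=True))  # []
```

If this reasoning is right, left-padded targets would need the key padding mask combined with the causal mask in decoder self-attention.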
