key_padding_mask is not used in transformer decoder layer? · facebookresearch/fairseq#537

1 comment · 0 reactions · 0 assignees · Python · 6,224 forks

Labels: bug, help wanted

Repository metrics

Stars: 29,107
PR merge metrics: no PRs merged in the last 30 days

Description

While reading the source code, I found that key_padding_mask is not used when calculating self-attention in the transformer decoder layer. This is not a problem when the target is right-padded, which is the default, because attn_mask has the same effect. But what about left padding on the target?
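The padding argument in the description can be checked with a small sketch. It uses plain PyTorch and does not call fairseq. The helper names `causal_mask` and `pad_leaks`, and the pad index of 1, are illustrative assumptions, not names taken from the fairseq source. The idea is that with right padding every pad position lies in the "future" of every real token, so the causal mask already hides it. With left padding the pads lie in the "past" and remain visible.

```python
import torch


def causal_mask(T):
    # True marks (query, key) pairs that are blocked: key lies in the future.
    return torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)


def pad_leaks(tokens, pad_idx):
    """Return True if, with the causal mask alone, some real (non-pad)
    query position can still attend to a pad key position."""
    is_pad = tokens.eq(pad_idx)                  # (B, T)
    allowed = ~causal_mask(tokens.size(1))       # (T, T), query x key
    leak = (
        allowed.unsqueeze(0)                     # (1, T, T)
        & is_pad.unsqueeze(1)                    # (B, 1, T): key is a pad
        & (~is_pad).unsqueeze(2)                 # (B, T, 1): query is real
    )
    return bool(leak.any())


PAD = 1  # assumed pad index, for illustration only
right_padded = torch.tensor([[5, 6, 7, PAD, PAD]])
left_padded = torch.tensor([[PAD, PAD, 5, 6, 7]])

print(pad_leaks(right_padded, PAD))  # causal mask alone is enough
print(pad_leaks(left_padded, PAD))   # real tokens can see the leading pads
```

In the left-padded case, the query at the first real token is allowed to attend to both leading pad positions, which is the situation the issue asks about.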

Contributor guide

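Research direction: Inspect the source code of the transformer decoder layer to confirm whether the key padding mask is ignored in self-attention; then propose a patch that incorporates it, taking left-padded targets into account.
Tech stack: Python
Domain: backend, machine learning
Issue type: bug
Difficulty: 3
Estimated time: 1-3 hours
Activity: active
Clarity: needs investigation first
Prerequisites: Python, PyTorch, Transformer architecture
Beginner friendliness: 45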
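One possible direction for such a patch is to combine the causal mask with the key padding mask before the softmax. The following is a minimal sketch under stated assumptions, not the fairseq implementation: it works on raw attention logits in plain PyTorch, and `combined_self_attn_mask` and `masked_attention_weights` are hypothetical helper names. It also shows a side effect to watch for: under left padding, a pad query has every key blocked, and a softmax over all `-inf` values yields NaN, so those rows are zeroed here.

```python
import torch


def combined_self_attn_mask(key_padding_mask):
    """key_padding_mask: (B, T) bool, True at pad positions.
    Returns (B, T, T) bool, True where query q must not attend to key k."""
    T = key_padding_mask.size(1)
    future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    return future.unsqueeze(0) | key_padding_mask.unsqueeze(1)


def masked_attention_weights(scores, key_padding_mask):
    """scores: (B, T, T) raw query-by-key attention logits."""
    blocked = combined_self_attn_mask(key_padding_mask)
    weights = torch.softmax(scores.masked_fill(blocked, float("-inf")), dim=-1)
    # Rows with every key blocked (pad queries) come out as NaN; zero them.
    return torch.nan_to_num(weights, nan=0.0)


torch.manual_seed(0)
PAD = 1  # assumed pad index, for illustration only
tokens = torch.tensor([[PAD, PAD, 5, 6, 7]])   # left-padded target
scores = torch.randn(1, 5, 5)                  # stand-in for Q @ K^T / sqrt(d)
weights = masked_attention_weights(scores, tokens.eq(PAD))

print(weights[0, :, :2].sum())   # total attention mass placed on pad keys
print(weights[0, 2:].sum(-1))    # each real query's weights sum to 1
```

Whether zeroing the fully masked rows is the right choice inside a real decoder layer depends on how the outputs at pad positions are used downstream, so that step should be treated as a placeholder.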
