key_padding_mask is not used in transformer decoder layer? · facebookresearch/fairseq#537

1 comment · 0 reactions · 0 assignees · Python · 6,224 forks

bug, help wanted

Repository metrics

Stars: 29,107
PR merge metrics: No merged PRs in 30d

Description

When reading the source code, I found that key_padding_mask is not used when computing self-attention in the transformer decoder layer. This causes no problem when the target is right-padded (the default), because the causal attn_mask already keeps real tokens from attending to the trailing padding. But what about left padding on the target?
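To make the question concrete, here is a minimal sketch in plain Python (no fairseq or PyTorch) that lists which padding positions a real token can still see under a causal mask alone. The names `causal_mask` and `leaked_pad_keys` are hypothetical helpers written for this illustration, not fairseq functions, and the boolean lists stand in for the tensors the real layer would use.

```python
def causal_mask(seq_len):
    """allowed[i][j] is True when query i may attend to key j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def leaked_pad_keys(is_pad, use_key_padding_mask):
    """Return (query, key) pairs where a real token attends to a pad token."""
    n = len(is_pad)
    allowed = causal_mask(n)
    leaks = []
    for i in range(n):
        if is_pad[i]:
            continue  # only queries at real-token positions are checked
        for j in range(n):
            visible = allowed[i][j]
            if use_key_padding_mask and is_pad[j]:
                visible = False  # a key padding mask hides pad keys from every query
            if visible and is_pad[j]:
                leaks.append((i, j))
    return leaks


# True marks a padding position (illustrative 5-token targets).
right_padded = [False, False, False, True, True]
left_padded = [True, True, False, False, False]

print(leaked_pad_keys(right_padded, use_key_padding_mask=False))
print(leaked_pad_keys(left_padded, use_key_padding_mask=False))
print(leaked_pad_keys(left_padded, use_key_padding_mask=True))
```

With right padding the causal mask alone leaves no pad key visible to a real token. With left padding every real token can see the leading pads unless a key padding mask is applied as well.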

Contributor guide

Research direction: Examine the transformer decoder layer source code to verify whether key_padding_mask is ignored in self-attention, then propose a patch that incorporates it, taking left-padded targets into account.
Tech stack: Python
Domain: backend, machine learning
Issue type: Bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: Active
Clarity: Needs investigation
Prerequisites: Python, PyTorch, Transformer architecture
Newbie friendliness: 45
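One possible shape for such a patch, sketched in plain Python under stated assumptions: merge the causal mask and the key padding mask into a single additive mask before the softmax. This is not fairseq's implementation. `combined_mask` and `masked_softmax` are hypothetical names, and nested lists stand in for tensors. The sketch also shows a caveat that is specific to left padding: the rows for pad queries end up fully masked, and a naive softmax over a row of all `-inf` would produce NaN, so that case is handled explicitly here.

```python
import math

NEG_INF = float("-inf")


def combined_mask(is_pad):
    """Additive mask: -inf where key j is in the future of query i or is padding."""
    n = len(is_pad)
    return [
        [NEG_INF if (j > i or is_pad[j]) else 0.0 for j in range(n)]
        for i in range(n)
    ]


def masked_softmax(scores, mask):
    """Row-wise softmax of scores + mask; fully masked rows become all zeros."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        shifted = [s + m for s, m in zip(score_row, mask_row)]
        top = max(shifted)
        if top == NEG_INF:
            # Every key is masked (a pad query under left padding).
            # Emit zeros here; an unguarded softmax would yield NaN.
            weights.append([0.0] * len(shifted))
            continue
        exps = [math.exp(v - top) for v in shifted]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Illustrative left-padded target: two pads followed by three real tokens.
is_pad = [True, True, False, False, False]
scores = [[0.0] * 5 for _ in range(5)]  # uniform scores, for readability

attn = masked_softmax(scores, combined_mask(is_pad))
for row in attn:
    print([round(w, 3) for w in row])
```

In this sketch the real-token rows put zero weight on the two leading pads and still sum to one, while the pad-query rows are all zeros. Whether the pad-query rows should be zeroed, or simply ignored downstream, is a design choice the actual patch would have to make.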
