[BUG] Implementation errors on `TransformerModel`. · unit8co/darts#672


Labels: bug, good first issue

Description

I looked into the [Transformer](https://github.com/unit8co/darts/blame/master/darts/models/forecasting/transformer_model.py) implementation and found what appear to be some errors.

First,

In lines 167 and 170,

src = self.encoder(src) * math.sqrt(self.input_size)

tgt = self.encoder(tgt) * math.sqrt(self.input_size)

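I don't think we need to multiply the inputs (src or tgt) by math.sqrt(self.input_size), because torch.nn.MultiheadAttention already takes care of this normalization: it scales the attention scores by the inverse square root of the per-head dimension.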
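To make the scaling inside attention concrete, here is a minimal sketch (not the Darts code). It reproduces the output of a single-head torch.nn.MultiheadAttention by hand, assuming self-attention, no bias, and the default batch_first=False. The sizes and variable names are illustrative. It only shows that the division by sqrt(head_dim) happens inside the module; it does not settle whether the extra multiplication in lines 167 and 170 is needed.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

embed_dim, seq_len = 4, 3  # illustrative sizes, not taken from Darts
mha = nn.MultiheadAttention(embed_dim, num_heads=1, bias=False, dropout=0.0)
mha.eval()

x = torch.randn(seq_len, 1, embed_dim)  # (sequence, batch, embedding)

with torch.no_grad():
    out, _ = mha(x, x, x, need_weights=False)

    # The packed input projection stacks the query, key and value weights.
    w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
    q, k, v = x[:, 0] @ w_q.T, x[:, 0] @ w_k.T, x[:, 0] @ w_v.T

    # With one head, head_dim equals embed_dim; scores are divided by its square root.
    scaled_scores = (q @ k.T) / math.sqrt(embed_dim)
    manual = torch.softmax(scaled_scores, dim=-1) @ v @ mha.out_proj.weight.T

    # The same computation without the scaling, for comparison.
    unscaled = torch.softmax(q @ k.T, dim=-1) @ v @ mha.out_proj.weight.T

matches_scaled = torch.allclose(out[:, 0], manual, atol=1e-5)
matches_unscaled = torch.allclose(out[:, 0], unscaled, atol=1e-5)
print(matches_scaled, matches_unscaled)
```

If the module output agrees with `manual` but not with `unscaled`, the score scaling is applied inside the attention layer itself.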

Second,

In lines 173-174,

        x = self.transformer(src=src,
                             tgt=tgt)

No tgt_mask is passed in this call. To use teacher forcing during training, a tgt_mask must be passed to the forward function (specifically the square subsequent mask defined below). Otherwise the decoder position at time t can attend to future decoder inputs (e.g., t+1, t+2, ...), which do not exist at inference time.

    @staticmethod
    def generate_square_subsequent_mask(sz: int) -> Tensor:
        r"""Generate a square mask for the sequence. The masked positions are filled with float('-inf').
            Unmasked positions are filled with float(0.0).
        """
        return torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)
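The leak described above can be checked with a small experiment. This is a sketch on a toy torch.nn.Transformer, not on the Darts model, and all sizes are assumptions chosen for illustration. It perturbs only the last decoder input and checks whether the outputs at earlier positions change, once with the causal tgt_mask and once without it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, src_len, tgt_len, batch = 8, 6, 5, 2  # illustrative sizes

model = nn.Transformer(
    d_model=d_model, nhead=2,
    num_encoder_layers=1, num_decoder_layers=1,
    dim_feedforward=16, dropout=0.0,
)
model.eval()

src = torch.randn(src_len, batch, d_model)  # (sequence, batch, feature)
tgt = torch.randn(tgt_len, batch, d_model)

# Same construction as generate_square_subsequent_mask above.
tgt_mask = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

# Replace only the last (future-most) decoder input.
tgt_changed = tgt.clone()
tgt_changed[-1] = torch.randn(batch, d_model)

with torch.no_grad():
    masked_a = model(src=src, tgt=tgt, tgt_mask=tgt_mask)
    masked_b = model(src=src, tgt=tgt_changed, tgt_mask=tgt_mask)
    unmasked_a = model(src=src, tgt=tgt)
    unmasked_b = model(src=src, tgt=tgt_changed)

# Compare outputs at every position except the perturbed one.
masked_leak = not torch.allclose(masked_a[:-1], masked_b[:-1], atol=1e-5)
unmasked_leak = not torch.allclose(unmasked_a[:-1], unmasked_b[:-1], atol=1e-5)
print(f"leak with tgt_mask: {masked_leak}, leak without tgt_mask: {unmasked_leak}")
```

With the mask, earlier positions should be unaffected by the change. Without it, they depend on the future input, which is the behavior the issue is concerned about.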

I'm not sure these are actually errors, but in my opinion the current behavior does not look correct.

Thank you!

Contributor guide

Tech stack: python, pytorch
Area: machine learning
Issue type: bug
Difficulty: 3
Estimated time: 1-3 hours
Activity: fresh
Clarity: clear
Prerequisites: PyTorch, Transformer architecture, Darts library basics
Beginner friendliness: 70
Investigation approach: The issue identifies two potential bugs in darts/models/forecasting/transformer_model.py. First, lines 167 and 170 multiply the encoder outputs by sqrt(input_size), which may be redundant. Second, lines 173-174 pass src and tgt to self.transformer without a tgt_mask, which could let teacher forcing leak future information. The contributor should verify these concerns and propose a fix, potentially removing the multiplication and adding a tgt_mask built with generate_square_subsequent_mask.