PositionalEmbedding · codertimo/BERT-pytorch#53

3 comments · 4 reactions · 0 assignees · Python · 5,757 stars · 1,252 forks

good first issue

Description

The positional embedding in BERT is not the same as the one in the original Transformer. Why not use the BERT form here?

Tech stack: Python, PyTorch
Domain: machine learning
Issue type: research
Difficulty: 3
Estimated time: 1-3 hours
Activity status: stale
Clarity: mostly clear
Prerequisites: understanding of Transformer architecture, familiarity with BERT
Newbie friendliness: 45
Research direction: Investigate the difference between BERT's learned positional embeddings and the original Transformer's sinusoidal positional encoding. The issue asks why this implementation does not use the form that BERT uses. Review the BERT paper (Devlin et al., 2018) and the original Transformer paper (Vaswani et al., 2017) for the rationale behind each choice. Check the current code in the repository (e.g., model/bert.py) to see how positional embeddings are implemented, then give a concise explanation with references to the relevant paper sections.
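
To make the comparison concrete, here is a minimal PyTorch sketch of the two forms the issue contrasts. It is not taken from the repository. The class names SinusoidalPositionalEncoding and LearnedPositionalEmbedding and the default max_len of 512 are hypothetical choices for illustration, and d_model is assumed to be even.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sin/cos encoding in the style of Vaswani et al. (2017).

    Hypothetical illustration, not the repository's code.
    Assumes d_model is even. Has no trainable parameters.
    """

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # Frequencies 1 / 10000^(2i / d_model), computed in log space.
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float)
            * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        # A buffer moves with the module across devices but is never trained.
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); returns (1, seq_len, d_model)
        return self.pe[:, : token_ids.size(1)]


class LearnedPositionalEmbedding(nn.Module):
    """Trainable lookup table with one vector per position.

    Hypothetical illustration of the learned form the issue attributes to BERT.
    """

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(max_len, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.embed(positions).unsqueeze(0)  # (1, seq_len, d_model)
```

Both modules return a tensor of shape (1, seq_len, d_model) that broadcasts over the batch when added to token embeddings, so one can be swapped for the other. The practical difference is that the learned table adds max_len × d_model trainable parameters and cannot index positions beyond max_len, while the sinusoidal table is fixed and adds none.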