3 comments · 4 reactions · 0 assignees · Python · 5,757 stars · 1,252 forks
good first issue
Description
The position embedding in BERT is not the same as in the Transformer. Why not use the form in BERT?
Contributor guide
- Tech stack
- python, pytorch
- Domain
- machine learning
- Issue type
- research
- Difficulty: estimated implementation difficulty for a new contributor, from 1 for very small changes to 5 for expert-level work.
- 3
- Estimated time: a rough time range for an experienced contributor to investigate, implement, test, and prepare a pull request.
- 1-3 hours
- Activity status: how available the issue appears right now (fresh, active, stale, blocked, or waiting on maintainer input).
- stale
- Clarity: how clearly the issue explains the expected change, acceptance criteria, and next step.
- mostly clear
- Prerequisites
- understanding of Transformer architecture, familiarity with BERT
- Newbie friendliness: a 1-100 score estimating how approachable this issue is for first-time contributors.
- 45
- Research direction
- Investigate the difference between BERT's learned positional embeddings and the original Transformer's sinusoidal positional encoding. The issue asks why BERT uses learned embeddings instead of the sinusoidal form. Review the BERT paper (Devlin et al., 2018) and the original Transformer paper (Vaswani et al., 2017) for rationale. Check the current code in the repository (e.g., model/bert.py) to see how positional embeddings are implemented. Provide a concise explanation with references to the relevant sections and academic papers.
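To make the comparison concrete, here is a minimal PyTorch sketch of the two approaches side by side. It is not taken from this repository: the class names `SinusoidalPositionalEncoding` and `LearnedPositionalEmbedding` are hypothetical, `max_len=512` is an assumed default chosen to match the maximum sequence length BERT was trained with, and the sinusoidal variant assumes an even `d_model`. The sinusoidal form follows the formula in Section 3.5 of Vaswani et al. (2017); the learned form is simply a trainable lookup table indexed by position.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Fixed encoding in the style of Vaswani et al. (2017); nothing is trained.

    Hypothetical helper for illustration; assumes d_model is even.
    """

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model) for each even dimension index 2i
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32)
            * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        # A buffer moves with the module (e.g. .to(device)) but is not a parameter.
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); returns the encoding for the first seq_len positions
        return self.pe[:, : x.size(1)]


class LearnedPositionalEmbedding(nn.Module):
    """Trainable position table, the approach described for BERT (Devlin et al., 2018).

    Hypothetical helper for illustration.
    """

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(max_len, d_model)  # one learned vector per position

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(x.size(1), device=x.device)  # 0 .. seq_len - 1
        return self.embed(positions).unsqueeze(0)  # (1, seq_len, d_model)


x = torch.zeros(2, 10, 16)  # dummy batch: 2 sequences, 10 tokens, d_model = 16
fixed = SinusoidalPositionalEncoding(d_model=16)
learned = LearnedPositionalEmbedding(d_model=16)

print(fixed(x).shape, learned(x).shape)
print(sum(p.numel() for p in fixed.parameters()))    # no trainable parameters
print(sum(p.numel() for p in learned.parameters()))  # max_len * d_model parameters
```

Both modules return a tensor that broadcasts over the batch dimension and can be added to the token embeddings. The practical differences are that the learned table adds `max_len * d_model` parameters and cannot represent positions beyond `max_len`, while the sinusoidal table is fixed and can be computed for any length.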