Description
🚀 The feature, motivation and pitch
I am unable to find a clean implementation of local multi-head self-attention in PyTorch Geometric. I found three layers that offer multi-head attention. The first is TransformerConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.TransformerConv.html#torch_geometric.nn.conv.TransformerConv), but it computes a linear combination of all features with different attention weights, as opposed to dividing the features into multiple heads and taking their linear combination. The second, RGATConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.RGATConv.html), works along similar lines. The third, GPSConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.GPSConv.html), does perform multi-head attention, but globally rather than over local neighborhoods.
Alternatives
It would be nice to have an implementation of local self-attention with multiple heads, where each head attends over its own slice of the feature dimension.
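
A minimal sketch of the requested behavior, written in plain Python so the arithmetic is explicit. It is not PyTorch Geometric code: the function name `local_multihead_attention` and its arguments are hypothetical, the inputs `q`, `k`, `v` are assumed to be already-projected per-node features, and the scaled dot-product score is an assumption about which attention form is wanted.

```python
import math


def local_multihead_attention(q, k, v, edges, num_heads):
    """Hypothetical reference for local multi-head self-attention.

    q, k, v: lists of per-node feature vectors (lists of floats), each of
             length d, assumed to be already linearly projected.
    edges:   list of (source, target) pairs; messages flow source -> target.
    Returns one output vector of length d per node.
    """
    num_nodes, dim = len(q), len(q[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads

    # Group incoming neighbours by target node, so attention stays local.
    neighbours = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        neighbours[dst].append(src)

    out = [[0.0] * dim for _ in range(num_nodes)]
    for i in range(num_nodes):
        if not neighbours[i]:
            continue  # node without incoming edges: output stays zero
        for h in range(num_heads):
            # Each head only sees its own slice of the feature dimension.
            lo, hi = h * head_dim, (h + 1) * head_dim
            scores = [
                sum(a * b for a, b in zip(q[i][lo:hi], k[j][lo:hi]))
                / math.sqrt(head_dim)
                for j in neighbours[i]
            ]
            # Numerically stable softmax over the neighbours of node i.
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            for j, e in zip(neighbours[i], exps):
                weight = e / total
                for c in range(lo, hi):
                    out[i][c] += weight * v[j][c]
    return out


# Usage: 3 nodes, 4 features, 2 heads of 2 features each.
q = [[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
k = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
v = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
edges = [(1, 0), (2, 0), (0, 1)]
result = local_multihead_attention(q, k, v, edges, num_heads=2)
```

In this example node 0 attends over nodes 1 and 2. The first head strongly prefers node 1 because only that head's query and key slices overlap, while the second head sees equal scores and averages both neighbours. The heads therefore produce different attention weights over the same neighbourhood, each restricted to its own part of the features.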
Additional context
No response