Local multi-headed self-attention · pyg-team/pytorch_geometric#8972


I am unable to find a clean implementation of local multi-headed self-attention in PyTorch Geometric. I found three layers that offer multi-head attention. The first is TransformerConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.TransformerConv.html#torch_geometric.nn.conv.TransformerConv), but each of its heads computes a linear combination of all features with its own attention weights, as opposed to dividing the features among multiple heads and taking a linear combination within each head. The second is RGATConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.RGATConv.html), which works along similar lines. The third is GPSConv (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.GPSConv.html), which does multi-head attention, but globally.

It would be nice to have an implementation of local self-attention with multiple heads, where each head attends over its own part of the feature dimension.
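
To make the request concrete, here is a rough sketch of what such a layer could look like, written in plain PyTorch rather than against the PyG `MessagePassing` API. The class name `LocalMultiHeadSelfAttention` and its arguments are hypothetical and not part of PyTorch Geometric. It also assumes that the split into heads happens after the query/key/value projections (as in the standard Transformer) and that attention for a node is normalized over its incoming edges in `edge_index` only.

```python
import math

import torch
import torch.nn as nn


class LocalMultiHeadSelfAttention(nn.Module):
    """Hypothetical layer: attention restricted to graph neighbors,
    with the feature dimension divided among the heads."""

    def __init__(self, channels: int, heads: int):
        super().__init__()
        if channels % heads != 0:
            raise ValueError("channels must be divisible by heads")
        self.heads = heads
        self.head_dim = channels // heads
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.out = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: [N, channels]; edge_index: [2, E] with rows (source, target).
        src, dst = edge_index
        n, h, d = x.size(0), self.heads, self.head_dim

        # Each head only sees its own slice of size head_dim.
        q = self.q(x).view(n, h, d)
        k = self.k(x).view(n, h, d)
        v = self.v(x).view(n, h, d)

        # One score per edge and head: target query against source key.
        score = (q[dst] * k[src]).sum(dim=-1) / math.sqrt(d)  # [E, H]

        # Softmax over the incoming edges of each target node.
        # The per-node max is only a numerical shift, so it is detached.
        idx = dst.unsqueeze(-1).expand(-1, h)
        node_max = score.new_full((n, h), float("-inf"))
        node_max = node_max.scatter_reduce(0, idx, score.detach(), reduce="amax")
        exp = (score - node_max[dst]).exp()
        denom = score.new_zeros(n, h).index_add(0, dst, exp)
        alpha = exp / denom[dst]  # [E, H]

        # Weighted sum of neighbor values, then merge the heads again.
        msg = alpha.unsqueeze(-1) * v[src]  # [E, H, head_dim]
        agg = x.new_zeros(n, h, d).index_add(0, dst, msg)
        return self.out(agg.reshape(n, h * d))


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = LocalMultiHeadSelfAttention(channels=8, heads=2)
    x = torch.randn(4, 8)
    # Two disconnected pairs: 0 <-> 1 and 2 <-> 3.
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])
    print(layer(x, edge_index).shape)
```

Nodes without incoming edges receive a zero aggregate in this sketch (only the bias of the output projection remains), so adding self-loops to `edge_index` beforehand is probably what one wants in practice.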


Research direction: Implement a local multi-headed self-attention mechanism that splits feature dimensions across heads, similar to TransformerConv but with per-head feature subspaces. Study existing implementations such as TransformerConv, RGATConv, and GPSConv in PyTorch Geometric to understand their patterns and extend them.
Tech stack: Python, PyTorch
Domain: machine learning
Issue type: Feature
Difficulty: 3
Estimated time: 1-2 days
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, PyTorch, Graph Neural Networks basics
Newbie friendliness: 30