pyg-team/pytorch_geometric

[Roadmap] GraphBolt Integration 🚀

Open

#9349 opened on May 22, 2024

Labels: 1 - Priority P1, feature, help wanted, roadmap

Description

🚀 The feature, motivation and pitch

GraphBolt is a new data loading framework for GNNs that is agnostic to the GNN library. In particular, it provides feature-store and sampling routines for scalable data loading across CPU and GPU devices. The GraphBolt repository also contains PyG examples.

This issue tracks progress towards in-house GraphBolt support in PyG, i.e., by providing a backend option in the NeighborLoader and LinkNeighborLoader classes.

FeatureStore

  • Implement torch_geometric.data.CUDAFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.GPUCachedFeature features.
  • Implement torch_geometric.data.OnDiskFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.OnDiskFeature features. (TBD)
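The delegation pattern behind both feature store items can be sketched in plain Python. This is only an illustration under stated assumptions: InMemoryFeatureBackend and DelegatingFeatureStore are hypothetical stand-ins (neither PyG nor GraphBolt is imported), the read(domain, type_name, feature_name, ids) shape is an assumption meant to resemble a GraphBolt feature-store read call, and Python lists stand in for tensors.

```python
class InMemoryFeatureBackend:
    """Hypothetical stand-in for an internal GraphBolt feature store."""

    def __init__(self, features):
        # features maps (domain, type_name, feature_name) -> list of rows.
        self._features = features

    def read(self, domain, type_name, feature_name, ids=None):
        rows = self._features[(domain, type_name, feature_name)]
        # Return every row when no ids are given, else only the requested rows.
        return rows if ids is None else [rows[i] for i in ids]


class DelegatingFeatureStore:
    """Hypothetical PyG-facing wrapper that forwards lookups to the backend."""

    def __init__(self, backend):
        self._backend = backend

    def get_tensor(self, group_name, attr_name, index=None):
        # Assumption: node features only; the group name maps to the node type.
        return self._backend.read("node", group_name, attr_name, index)


backend = InMemoryFeatureBackend({("node", "paper", "x"): [[1.0], [2.0], [3.0]]})
store = DelegatingFeatureStore(backend)
selected = store.get_tensor("paper", "x", index=[2, 0])
```

Under this design the wrapper holds no feature data itself, so switching between a GPU-cached and an on-disk backend would only change how the internal store is constructed.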

Samplers

  • Implement torch_geometric.sampler.GraphBoltNeighborSampler(NeighborSampler), which uses GraphBolt as the backend for sample_from_nodes and sample_from_links.
  • Support temporal sampling in GraphBoltNeighborSampler

Data Loaders

  • Implement a backend option in NeighborLoader and LinkNeighborLoader that creates NeighborSampler instances based on the chosen backend (backend="default" -> sampler.NeighborSampler, backend="graphbolt" -> sampler.GraphBoltNeighborSampler)
  • Test GPU-based sampling via backend="graphbolt"
  • Integrate graphbolt.ItemSampler and datapipe.fetch_feature routines into NeighborLoader and LinkNeighborLoader when the chosen backend is "graphbolt"
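The backend selection described in the first item could look roughly like the following. It is a minimal sketch, not the PyG implementation: both sampler classes are empty stand-ins defined locally, and make_neighbor_sampler and _SAMPLER_BACKENDS are hypothetical names introduced only for illustration.

```python
class NeighborSampler:
    """Stand-in for torch_geometric.sampler.NeighborSampler (stub only)."""

    def __init__(self, data, num_neighbors):
        self.data = data
        self.num_neighbors = num_neighbors


class GraphBoltNeighborSampler(NeighborSampler):
    """Stand-in for the proposed GraphBolt-backed sampler (stub only)."""


# Maps the proposed `backend` argument to the sampler class to instantiate.
_SAMPLER_BACKENDS = {
    "default": NeighborSampler,
    "graphbolt": GraphBoltNeighborSampler,
}


def make_neighbor_sampler(data, num_neighbors, backend="default"):
    """Hypothetical helper a loader could call to build its sampler."""
    try:
        sampler_cls = _SAMPLER_BACKENDS[backend]
    except KeyError:
        valid = ", ".join(sorted(_SAMPLER_BACKENDS))
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of: {valid}"
        ) from None
    return sampler_cls(data, num_neighbors)


default_sampler = make_neighbor_sampler(None, [10, 5])
graphbolt_sampler = make_neighbor_sampler(None, [10, 5], backend="graphbolt")
```

Because the GraphBolt sampler subclasses NeighborSampler in this sketch, loader code that only relies on the base sampler interface would not need to branch on the backend after construction.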

Examples

  • Provide an e2e example for GPU-based sampling via backend="graphbolt"
