pyg-team/pytorch_geometric

[Roadmap] GraphBolt Integration 🚀

Open

#9349 opened on May 22, 2024

Labels: 1 - Priority P1, feature, help wanted, roadmap


Description

🚀 The feature, motivation and pitch

GraphBolt is a new data loading framework for GNNs that is agnostic to the GNN library. In particular, it provides feature-store and sampling routines that enable scalable data loading across CPU and GPU devices. The GraphBolt repository also contains PyG examples.

This issue tracks progress toward native GraphBolt support in PyG, i.e., via a backend option in the NeighborLoader and LinkNeighborLoader classes.

FeatureStore

  • Implement torch_geometric.data.CUDAFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.GPUCachedFeature features.
  • Implement torch_geometric.data.OnDiskFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.OnDiskFeature features. (TBD)
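To make the caching idea behind the CUDAFeatureStore bullet concrete, here is a minimal pure-Python sketch that assumes nothing about GraphBolt's internals. A hypothetical `CachedFeature` keeps recently used rows in a small LRU cache (standing in for GPU memory) in front of a hypothetical `BackingStore` (standing in for host memory or disk). It is not graphbolt.GPUCachedFeature, and both class names are illustrative only.

```python
from collections import OrderedDict
from typing import Dict, List, Sequence


class BackingStore:
    """Hypothetical slow tier (stands in for host memory or disk)."""

    def __init__(self, rows: Dict[int, List[float]]):
        self.rows = rows
        self.reads = 0  # number of rows fetched from the slow tier

    def read(self, ids: Sequence[int]) -> List[List[float]]:
        self.reads += len(ids)
        return [self.rows[i] for i in ids]


class CachedFeature:
    """Hypothetical LRU cache in front of a BackingStore (illustration only)."""

    def __init__(self, backing: BackingStore, capacity: int):
        self.backing = backing
        self.capacity = capacity
        self.cache: "OrderedDict[int, List[float]]" = OrderedDict()

    def read(self, ids: Sequence[int]) -> List[List[float]]:
        # Fetch each missing ID once, even if it appears several times in `ids`.
        misses = [i for i in dict.fromkeys(ids) if i not in self.cache]
        for i, row in zip(misses, self.backing.read(misses)):
            self.cache[i] = row
        out = []
        for i in ids:
            self.cache.move_to_end(i)  # mark as most recently used
            out.append(self.cache[i])
        # Evict least recently used rows once the batch has been served.
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
        return out


store = BackingStore({i: [float(i), 2.0 * i] for i in range(10)})
feat = CachedFeature(store, capacity=4)
feat.read([1, 2, 3])  # three misses
feat.read([2, 3, 3])  # all hits
print(store.reads)  # 3
```

Repeated reads of hot node IDs are served from the cache and never touch the slower tier, which is what makes device-side caching pay off when feature accesses are skewed.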

Samplers

  • Implement torch_geometric.sampler.GraphBoltNeighborSampler(NeighborSampler), which uses GraphBolt as the backend for sample_from_nodes and sample_from_links.
  • Support temporal sampling in GraphBoltNeighborSampler
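The sampling contract described in the Samplers bullets can be sketched as follows. This is a minimal pure-Python illustration, not GraphBolt's or PyG's implementation: `sample_from_nodes` here is a hypothetical standalone function that walks an adjacency dict hop by hop and draws at most `fanouts[k]` neighbors per node (-1 meaning all, as with PyG's `num_neighbors`). When `node_time` is given, it applies a temporal constraint that only admits neighbors whose timestamp does not exceed the timestamp of the originating seed.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Edge = Tuple[int, int]  # (src, dst): message flows from sampled neighbor to frontier node


def sample_from_nodes(
    adj: Dict[int, List[int]],
    seeds: Sequence[int],
    fanouts: Sequence[int],
    rng: random.Random,
    node_time: Optional[Dict[int, int]] = None,
) -> List[List[Edge]]:
    """Hypothetical multi-hop neighbor sampler (illustration only)."""
    # Each frontier entry carries the timestamp of the seed it originates from.
    frontier = [(s, node_time[s] if node_time is not None else None) for s in seeds]
    hops: List[List[Edge]] = []
    for fanout in fanouts:
        edges: List[Edge] = []
        next_frontier = []
        for dst, seed_time in frontier:
            nbrs = adj.get(dst, [])
            if seed_time is not None:
                # Temporal constraint: never sample a neighbor from the seed's future.
                nbrs = [n for n in nbrs if node_time[n] <= seed_time]
            if fanout < 0 or len(nbrs) <= fanout:
                picked = list(nbrs)  # -1 (or a small neighborhood) keeps every neighbor
            else:
                picked = rng.sample(nbrs, fanout)  # uniform sampling without replacement
            for src in picked:
                edges.append((src, dst))
                next_frontier.append((src, seed_time))
        hops.append(edges)
        # Deduplicate the next frontier while preserving order.
        frontier = list(dict.fromkeys(next_frontier))
    return hops


adj = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: [], 4: [], 5: []}
node_time = {0: 5, 1: 3, 2: 7, 3: 5, 4: 1, 5: 9}
print(sample_from_nodes(adj, [0], [-1, -1], random.Random(0), node_time))
# [[(1, 0), (3, 0)], [(4, 1)]]
```

This sketch carries the seed's timestamp through every hop, so no sampled node lies in the future of the seed; other temporal strategies are possible, and link-level sampling (sample_from_links) would start from the endpoints of the seed edges instead.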

Data Loaders

  • Implement a backend option in NeighborLoader and LinkNeighborLoader that creates the NeighborSampler instance based on the chosen backend (backend="default" -> sampler.NeighborSampler, backend="graphbolt" -> sampler.GraphBoltNeighborSampler)
  • Test GPU-based sampling via backend="graphbolt"
  • Integrate graphbolt.ItemSampler and datapipe.fetch_feature routines into NeighborLoader and LinkNeighborLoader when the chosen backend is "graphbolt"
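The backend option in the Data Loaders bullets amounts to a dispatch from a string to a sampler class. The following is a minimal sketch built on hypothetical placeholders: `NeighborSampler` and `GraphBoltNeighborSampler` below are stand-ins with a simplified constructor, not the real `torch_geometric.sampler` classes, and `make_sampler` is an illustrative helper rather than existing PyG API.

```python
from typing import Any, Dict, List, Type


class NeighborSampler:
    """Placeholder for torch_geometric.sampler.NeighborSampler (simplified)."""

    def __init__(self, data: Any, num_neighbors: List[int]):
        self.data = data
        self.num_neighbors = num_neighbors


class GraphBoltNeighborSampler(NeighborSampler):
    """Placeholder for the proposed GraphBolt-backed sampler."""


# Hypothetical registry mapping the `backend` string to a sampler class.
SAMPLER_BACKENDS: Dict[str, Type[NeighborSampler]] = {
    "default": NeighborSampler,
    "graphbolt": GraphBoltNeighborSampler,
}


def make_sampler(data: Any, num_neighbors: List[int], backend: str = "default") -> NeighborSampler:
    """Hypothetical helper a loader could call to build its sampler."""
    try:
        sampler_cls = SAMPLER_BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {sorted(SAMPLER_BACKENDS)}"
        ) from None
    return sampler_cls(data, num_neighbors)


sampler = make_sampler(data=None, num_neighbors=[10, 5], backend="graphbolt")
print(type(sampler).__name__)  # GraphBoltNeighborSampler
```

Raising a `ValueError` for an unknown backend surfaces typos immediately instead of silently falling back to the default sampler.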

Examples

  • Provide an e2e example for GPU-based sampling via backend="graphbolt"
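An end-to-end data loading loop chains the three stages named in the Data Loaders bullets: batching seed items (the role of graphbolt.ItemSampler), neighbor sampling, and feature fetching (the role of fetch_feature). The sketch below runs on CPU in pure Python and performs no GPU sampling. `item_batches`, `sample_one_hop`, and `mini_batches` are hypothetical helpers that only illustrate how the stages compose; they are not GraphBolt APIs.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def item_batches(items: Sequence[int], batch_size: int, rng: random.Random,
                 shuffle: bool = True, drop_last: bool = False) -> Iterator[List[int]]:
    """Hypothetical stand-in for an item sampler: shuffle and chunk seed IDs."""
    order = list(items)
    if shuffle:
        rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break
        yield batch


def sample_one_hop(adj: Dict[int, List[int]], seeds: Sequence[int], fanout: int,
                   rng: random.Random) -> List[Tuple[int, int]]:
    """Uniformly sample at most `fanout` neighbors per seed; edges are (src, dst)."""
    edges = []
    for dst in seeds:
        nbrs = adj.get(dst, [])
        picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
        edges.extend((src, dst) for src in picked)
    return edges


def mini_batches(adj, features, train_ids, batch_size, fanout, seed=0):
    """Chain the three stages: item batching -> sampling -> feature fetching."""
    rng = random.Random(seed)
    for seeds in item_batches(train_ids, batch_size, rng):
        edges = sample_one_hop(adj, seeds, fanout, rng)
        # Unique nodes touched by this batch, seeds first.
        nodes = list(dict.fromkeys(list(seeds) + [src for src, _ in edges]))
        x = [features[n] for n in nodes]  # feature fetching step
        yield {"seeds": seeds, "nodes": nodes, "edges": edges, "x": x}


adj = {0: [1, 2], 1: [0], 2: [0, 1], 3: [2]}
features = {n: [float(n)] for n in adj}
for batch in mini_batches(adj, features, train_ids=[0, 1, 2, 3], batch_size=2, fanout=1):
    print(batch["seeds"], batch["edges"], batch["x"])
```

In a GPU pipeline each stage would presumably operate on device-resident tensors and GraphBolt's own routines would replace these helpers; the sketch only fixes the order of the stages and the shape of a mini-batch.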

Contributor Guide