pyg-team/pytorch_geometric

[Roadmap] GraphBolt Integration 🚀

Open

#9349 opened on May 22, 2024

Labels: 1 - Priority P1, feature, help wanted, roadmap

Description

🚀 The feature, motivation and pitch

GraphBolt is a new data-loading framework for GNNs that is agnostic to the GNN library in use. In particular, it provides feature-store and sampling routines for scalable data loading across CPU and GPU devices. The GraphBolt repository also contains PyG examples.

This issue tracks progress toward native GraphBolt support in PyG, i.e., via a backend option in the NeighborLoader and LinkNeighborLoader classes.

FeatureStore

  • Implement torch_geometric.data.CUDAFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.GPUCachedFeature features.
  • Implement torch_geometric.data.OnDiskFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.OnDiskFeature features. (TBD)
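Both feature-store items rely on GraphBolt features that sit in front of a slower backing store; GPUCachedFeature, for example, keeps frequently accessed rows in a fast GPU cache and falls back to the wrapped feature on a miss. The plain-Python sketch below illustrates only that read-through idea. CachedFeature and DictFeatureStore are hypothetical names, the LRU eviction policy is an assumption, and none of this is the GraphBolt or PyG API. Features are addressed by (group name, attribute name, index), loosely mirroring how PyG's TensorAttr identifies tensors.

```python
# Hypothetical sketch of a read-through feature cache; not the PyG/GraphBolt API.
from collections import OrderedDict
from typing import Callable, Dict, List, Sequence, Tuple


class CachedFeature:
    """Serves hot rows from a small cache, misses from a slower fallback source."""

    def __init__(self, fallback: Callable[[int], List[float]], capacity: int):
        self.fallback = fallback          # slow source (stands in for host memory / disk)
        self.capacity = capacity          # max number of cached rows
        self.cache: "OrderedDict[int, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, ids: Sequence[int]) -> List[List[float]]:
        rows = []
        for i in ids:
            if i in self.cache:
                self.cache.move_to_end(i)  # mark as most recently used
                self.hits += 1
                row = self.cache[i]
            else:
                self.misses += 1
                row = self.fallback(i)     # load from the slow source
                self.cache[i] = row
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict least recently used (assumed policy)
            rows.append(row)
        return rows


class DictFeatureStore:
    """Maps (group_name, attr_name) to a cached feature; index selects rows."""

    def __init__(self) -> None:
        self._features: Dict[Tuple[str, str], CachedFeature] = {}

    def put(self, group_name: str, attr_name: str, feature: CachedFeature) -> None:
        self._features[(group_name, attr_name)] = feature

    def get(self, group_name: str, attr_name: str, index: Sequence[int]) -> List[List[float]]:
        return self._features[(group_name, attr_name)].read(index)
```

With a capacity of 2, reading node IDs [0, 1, 0, 2, 0] produces two cache hits (the repeated reads of node 0) and three misses; node 1 is evicted when node 2 is loaded.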

Samplers

  • Implement torch_geometric.sampler.GraphBoltNeighborSampler(NeighborSampler), which uses GraphBolt as the backend for sample_from_nodes and sample_from_links.
  • Support temporal sampling in GraphBoltNeighborSampler
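Both sampler items come down to multi-hop neighbor sampling starting from seed nodes. A minimal sketch follows, assuming a CSC layout (indptr/indices, so the in-neighbors of node v are indices[indptr[v]:indptr[v + 1]]), uniform sampling without replacement, and a temporal rule in which an edge is eligible only if its timestamp does not exceed the time inherited from its seed. sample_from_nodes here is a hypothetical standalone function, not the signature of PyG's or GraphBolt's samplers.

```python
# Hypothetical sketch of (optionally temporal) multi-hop neighbor sampling.
import random
from typing import List, Optional, Sequence, Tuple


def sample_from_nodes(
    indptr: Sequence[int],
    indices: Sequence[int],
    seeds: Sequence[int],
    fanouts: Sequence[int],
    edge_time: Optional[Sequence[int]] = None,
    seed_time: Optional[Sequence[int]] = None,
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Returns sampled (src, dst) edges, hop by hop."""
    rng = rng or random.Random(0)
    temporal = edge_time is not None and seed_time is not None
    # Each frontier entry is (node, time inherited from the seed it was reached from).
    frontier = [(v, seed_time[k] if temporal else None) for k, v in enumerate(seeds)]
    visited = set(seeds)
    edges: List[Tuple[int, int]] = []
    for fanout in fanouts:
        next_frontier = []
        for dst, t in frontier:
            eids = list(range(indptr[dst], indptr[dst + 1]))
            if temporal:
                # Temporal constraint: drop edges that happen after the seed time.
                eids = [e for e in eids if edge_time[e] <= t]
            # Uniformly pick up to `fanout` edges without replacement.
            picked = eids if len(eids) <= fanout else rng.sample(eids, fanout)
            for e in picked:
                src = indices[e]
                edges.append((src, dst))
                if src not in visited:
                    visited.add(src)
                    next_frontier.append((src, t))  # sampled node inherits the seed time
        frontier = next_frontier
    return edges
```

For a node-level batch, the seed times would typically come from the seed nodes themselves; with edge timestamps [5, 1, 9] on node 0's three in-edges and a seed time of 4, only the edge with timestamp 1 survives the first hop.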

Data Loaders

  • Implement a backend option in NeighborLoader and LinkNeighborLoader that creates the sampler instance based on the chosen backend (backend="default" → sampler.NeighborSampler, backend="graphbolt" → sampler.GraphBoltNeighborSampler)
  • Test GPU-based sampling via backend="graphbolt"
  • Integrate the graphbolt.ItemSampler and datapipe.fetch_feature routines into NeighborLoader and LinkNeighborLoader when the chosen backend is "graphbolt"
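The data-loader items combine a backend switch with a three-stage pipeline: batch the seed items (the ItemSampler role), sample a subgraph around each batch, then fetch features for the sampled nodes (the fetch_feature role). The sketch below assumes a registry keyed by the backend string. Every class and function in it is hypothetical; the toy sampler takes the first k neighbors rather than sampling randomly so the output is deterministic, and the GraphBolt entry is only a placeholder that a real implementation would route to GraphBolt.

```python
# Hypothetical sketch of backend dispatch plus a batch -> sample -> fetch pipeline.
from typing import Callable, Dict, Iterator, List, Sequence, Tuple, Type


class NeighborSamplerSketch:
    """Toy 1-hop sampler standing in for sampler.NeighborSampler."""

    backend = "default"

    def __init__(self, adj: Dict[int, List[int]], num_neighbors: int):
        self.adj = adj
        self.num_neighbors = num_neighbors

    def sample_from_nodes(self, seeds: Sequence[int]) -> List[int]:
        # Seeds come first, then up to num_neighbors previously unseen neighbors per seed.
        nodes = list(dict.fromkeys(seeds))
        seen = set(nodes)
        for s in seeds:
            for n in self.adj.get(s, [])[: self.num_neighbors]:
                if n not in seen:
                    seen.add(n)
                    nodes.append(n)
        return nodes


class GraphBoltNeighborSamplerSketch(NeighborSamplerSketch):
    """Placeholder: a real version would delegate sampling to GraphBolt."""

    backend = "graphbolt"


SAMPLERS: Dict[str, Type[NeighborSamplerSketch]] = {
    "default": NeighborSamplerSketch,
    "graphbolt": GraphBoltNeighborSamplerSketch,
}


def make_sampler(backend: str, **kwargs) -> NeighborSamplerSketch:
    """Select the sampler class from the backend string."""
    try:
        cls = SAMPLERS[backend]
    except KeyError:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(SAMPLERS)}") from None
    return cls(**kwargs)


def neighbor_loader(
    seeds: Sequence[int],
    batch_size: int,
    sampler: NeighborSamplerSketch,
    fetch: Callable[[int], List[float]],
) -> Iterator[Tuple[List[int], List[List[float]]]]:
    for start in range(0, len(seeds), batch_size):
        batch = seeds[start:start + batch_size]     # item-sampler stage
        nodes = sampler.sample_from_nodes(batch)    # sampling stage
        yield nodes, [fetch(n) for n in nodes]      # feature-fetching stage
```

Keeping the backend choice in a single registry means the loader code stays identical across backends; only the sampler object it receives changes.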

Examples

  • Provide an e2e example for GPU-based sampling via backend="graphbolt"
