pyg-team/pytorch_geometric

[Roadmap] GraphBolt Integration 🚀

Open

#9,349 created on May 22, 2024

(0 comments) (3 reactions) (1 assignee) Python (19,985 stars) (3,514 forks)
Labels: 1 - Priority P1, feature, help wanted, roadmap

Description

🚀 The feature, motivation and pitch

GraphBolt is a new data loading framework for GNNs that is agnostic to the GNN library in use. In particular, it provides feature stores and sampling routines that allow for scalable data loading across CPU and GPU devices. The GraphBolt repository also contains PyG examples.

This issue tracks progress towards in-house GraphBolt support in PyG, i.e., by providing a backend option in the NeighborLoader and LinkNeighborLoader classes.

FeatureStore

  • Implement torch_geometric.data.CUDAFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.GPUCachedFeature features.
  • Implement torch_geometric.data.OnDiskFeatureStore by maintaining an internal graphbolt.TorchBasedFeatureStore with graphbolt.OnDiskFeature features. (TBD)
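
The delegation pattern described in the two items above can be sketched without either library installed. This is only an illustration under stated assumptions: `InnerStore` is a hypothetical stand-in for the internal GraphBolt store, `FeatureStoreAdapter` and its `get_tensor` signature are made up for this sketch, and the mapping of a PyG-style `(group_name, attr_name, index)` lookup onto a `read(domain, type_name, feature_name, ids)` call is an assumption about how the two interfaces might line up, not a confirmed design.

```python
class InnerStore:
    """Hypothetical stand-in for the internal GraphBolt feature store."""

    def __init__(self, features):
        # features: {(domain, type_name, feature_name): list of feature rows}
        self._features = features

    def read(self, domain, type_name, feature_name, ids=None):
        rows = self._features[(domain, type_name, feature_name)]
        # Return all rows when no ids are given, else gather the requested rows.
        return rows if ids is None else [rows[i] for i in ids]


class FeatureStoreAdapter:
    """Hypothetical PyG-facing wrapper that forwards lookups to the inner store."""

    def __init__(self, inner):
        self._inner = inner

    def get_tensor(self, group_name, attr_name, index=None):
        # Assumption: node features only; group_name plays the role of the node type.
        return self._inner.read("node", group_name, attr_name, index)


inner = InnerStore({("node", "paper", "x"): [[1.0], [2.0], [3.0]]})
store = FeatureStoreAdapter(inner)
selected = store.get_tensor("paper", "x", index=[2, 0])
```

In an actual implementation the rows would be tensors and the inner store would decide where they live (GPU cache or disk), so the wrapper itself could stay this thin.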

Samplers

  • Implement torch_geometric.sampler.GraphBoltNeighborSampler(NeighborSampler), which uses GraphBolt as the backend for sample_from_nodes and sample_from_links.
  • Support temporal sampling in GraphBoltNeighborSampler

Data Loaders

  • Implement a backend option in NeighborLoader and LinkNeighborLoader that creates NeighborSampler instances based on the chosen backend (backend="default" -> sampler.NeighborSampler, backend="graphbolt" -> sampler.GraphBoltNeighborSampler)
  • Test GPU-based sampling via backend="graphbolt"
  • Integrate graphbolt.ItemSampler and datapipe.fetch_feature routines into NeighborLoader and LinkNeighborLoader when the chosen backend is "graphbolt"
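
A minimal sketch of the backend dispatch described in the first item above, assuming a simple name-to-class registry. Both sampler classes here are placeholders defined locally so the snippet runs on its own; `make_neighbor_sampler` and `_SAMPLER_BACKENDS` are hypothetical names, and the real loaders may resolve the backend differently.

```python
class NeighborSampler:
    """Placeholder for the existing default sampler."""

    def __init__(self, num_neighbors):
        self.num_neighbors = num_neighbors


class GraphBoltNeighborSampler(NeighborSampler):
    """Placeholder for the proposed GraphBolt-backed sampler."""


# Hypothetical registry mapping the `backend` argument to a sampler class.
_SAMPLER_BACKENDS = {
    "default": NeighborSampler,
    "graphbolt": GraphBoltNeighborSampler,
}


def make_neighbor_sampler(num_neighbors, backend="default"):
    """Create the sampler that a loader would use for the given backend."""
    try:
        sampler_cls = _SAMPLER_BACKENDS[backend]
    except KeyError:
        supported = ", ".join(sorted(_SAMPLER_BACKENDS))
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of: {supported}"
        ) from None
    return sampler_cls(num_neighbors)


default_sampler = make_neighbor_sampler([10, 5])
graphbolt_sampler = make_neighbor_sampler([10, 5], backend="graphbolt")
```

Because the GraphBolt sampler subclasses the default one in this sketch, the loader code that consumes the sampler would not need to branch on the backend after construction.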

Examples

  • Provide an e2e example for GPU-based sampling via backend="graphbolt"
