Labels: 1 - Priority P1, feature, help wanted, roadmap
Description
🚀 The feature, motivation and pitch
GraphBolt is a new data loading framework for GNNs that is GNN-library agnostic. In particular, it provides feature stores and sampling routines that allow for scalable data loading across CPU/GPU devices. The GraphBolt repository also contains PyG examples.
This issue tracks progress towards in-house GraphBolt support in PyG, i.e., by providing a `backend` option in the `NeighborLoader` and `LinkNeighborLoader` classes.
FeatureStore
- Implement `torch_geometric.data.CUDAFeatureStore` by maintaining an internal `graphbolt.TorchBasedFeatureStore` with `graphbolt.GPUCachedFeature` features.
- Implement `torch_geometric.data.OnDiskFeatureStore` by maintaining an internal `graphbolt.TorchBasedFeatureStore` with `graphbolt.OnDiskFeature` features. (TBD)
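
To make the "maintain an internal store" idea above concrete, here is a minimal, library-free sketch of the delegation pattern. Everything in it is a stand-in: `InnerFeatureStore`, `WrappedFeatureStore`, the `(group_name, attr_name)` key layout, and the use of nested lists instead of tensors are assumptions for illustration, not the PyG or GraphBolt APIs.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Rows = List[List[float]]


class InnerFeatureStore:
    """Hypothetical stand-in for the internal GraphBolt-side feature store."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Rows] = {}

    def update(self, type_name: str, feature_name: str, rows: Rows) -> None:
        # Store a private copy so callers cannot mutate the stored rows.
        self._data[(type_name, feature_name)] = [list(r) for r in rows]

    def read(
        self,
        type_name: str,
        feature_name: str,
        ids: Optional[Sequence[int]] = None,
    ) -> Rows:
        rows = self._data[(type_name, feature_name)]
        if ids is None:
            return [list(r) for r in rows]
        # Gather only the requested rows, in the requested order.
        return [list(rows[i]) for i in ids]


class WrappedFeatureStore:
    """Hypothetical outer store that owns an inner store and forwards to it."""

    def __init__(self) -> None:
        self._inner = InnerFeatureStore()

    def put_tensor(self, rows: Rows, group_name: str, attr_name: str) -> bool:
        self._inner.update(group_name, attr_name, rows)
        return True

    def get_tensor(
        self,
        group_name: str,
        attr_name: str,
        index: Optional[Sequence[int]] = None,
    ) -> Rows:
        return self._inner.read(group_name, attr_name, index)


store = WrappedFeatureStore()
store.put_tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], "paper", "x")
selected = store.get_tensor("paper", "x", index=[2, 0])
```

In a real implementation the inner object would be the GraphBolt store and the outer class would follow PyG's feature store interface; the sketch only shows how the outer class translates its own keys into calls on the store it owns.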
Samplers
- Implement `torch_geometric.sampler.GraphBoltNeighborSampler(NeighborSampler)`, which uses GraphBolt as the backend for performing `sample_from_nodes` and `sample_from_links`.
- Support temporal sampling in `GraphBoltNeighborSampler`
Data Loaders
- Implement a `backend` option in `NeighborLoader` and `LinkNeighborLoader` that creates `NeighborSampler` instances based on the chosen backend (`backend="default"` -> `sampler.NeighborSampler`, `backend="graphbolt"` -> `sampler.GraphBoltNeighborSampler`)
- Test GPU-based sampling via `backend="graphbolt"`
- Integrate `graphbolt.ItemSampler` and `datapipe.fetch_feature` routines into `NeighborLoader` and `LinkNeighborLoader` in case the chosen backend is set to `"graphbolt"`
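
A rough sketch of how the `backend` dispatch described above might look. The two sampler classes here are empty stand-ins rather than the real PyG classes, and `make_neighbor_sampler` and `_SAMPLER_BACKENDS` are hypothetical names; only the `"default"` and `"graphbolt"` backend strings come from this issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class NeighborSampler:
    """Stand-in for the default sampler (hypothetical, not the PyG class)."""

    num_neighbors: List[int]


@dataclass
class GraphBoltNeighborSampler(NeighborSampler):
    """Stand-in for the proposed GraphBolt-backed sampler (hypothetical)."""


# Maps the proposed `backend` argument to the sampler class it selects.
_SAMPLER_BACKENDS: Dict[str, Type[NeighborSampler]] = {
    "default": NeighborSampler,
    "graphbolt": GraphBoltNeighborSampler,
}


def make_neighbor_sampler(
    num_neighbors: List[int],
    backend: str = "default",
) -> NeighborSampler:
    """Create the sampler selected by `backend`, failing early on typos."""
    try:
        sampler_cls = _SAMPLER_BACKENDS[backend]
    except KeyError:
        supported = ", ".join(sorted(_SAMPLER_BACKENDS))
        raise ValueError(
            f"Unknown backend '{backend}' (supported: {supported})"
        ) from None
    return sampler_cls(num_neighbors=num_neighbors)


default_sampler = make_neighbor_sampler([10, 5])
graphbolt_sampler = make_neighbor_sampler([10, 5], backend="graphbolt")
```

Because the GraphBolt sampler subclasses `NeighborSampler`, the loaders could keep treating the result as a `NeighborSampler` and would only need to branch on the backend at construction time.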
Examples
- Provide an e2e example for GPU-based sampling via `backend="graphbolt"`