dmlc/dgl
Calling `to_block()` results in duplicate node types -- breaking several parts of heterograph
Open
#2986 opened on Jun 4, 2021
help wanted
Description
🐛 Bug
Calling `to_block()` results in duplicate node types. This breaks things like edge functions, because assigning features via `ndata` only operates on the source nodes (`srcnodes`) and not on the destination nodes (`dstnodes`).
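To make the failure mode concrete, here is a small pure-Python model of a block. `ToyBlock`, `set_ndata` and `apply_edges_dot` are hypothetical stand-ins, not DGL classes or methods. The model assumes, as described above, that a block stores source and destination features in separate frames and that a write through `ndata` only fills the source frame, so an edge function that reads destination-side features finds nothing.

```python
class ToyBlock:
    """Hypothetical model of a block: separate feature frames per side."""

    def __init__(self, num_src, num_dst, edges):
        self.num_src = num_src
        self.num_dst = num_dst
        self.edges = edges   # list of (src_id, dst_id) pairs in local IDs
        self.srcdata = {}    # features of source nodes
        self.dstdata = {}    # features of destination nodes

    def set_ndata(self, name, values):
        # Models the reported behaviour: an ndata write lands in the
        # source frame only; the destination frame is left untouched.
        assert len(values) == self.num_src
        self.srcdata[name] = values

    def apply_edges_dot(self, name):
        # Edge function in the spirit of a u-dot-v score: it reads the
        # source-side feature of u and the destination-side feature of v.
        return [
            sum(a * b for a, b in zip(self.srcdata[name][u], self.dstdata[name][v]))
            for u, v in self.edges
        ]


block = ToyBlock(num_src=3, num_dst=3, edges=[(0, 1), (1, 0), (0, 2)])
block.set_ndata("h", [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(sorted(block.srcdata), sorted(block.dstdata))  # ['h'] []
try:
    block.apply_edges_dot("h")
except KeyError as err:
    print("missing destination feature:", err)  # missing destination feature: 'h'
```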
At minimum, the implementation of `DGLHeteroGraph.get_ntype_id()` is broken by the duplicate node types:
```python
def get_ntype_id(self, ntype):
    ...
    ntid = self._srctypes_invmap.get(ntype, self._dsttypes_invmap.get(ntype, None))
```
Because the source map is checked first, this lookup will only ever return the source node type ID and never the destination node type ID. Other parts of the heterograph implementation may be broken in the same way.
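The lookup problem can be reproduced without DGL. The sketch below models the two inverse maps as plain dicts. The IDs (0 for the source-side `user`, 1 for the destination-side `user`) are an assumption consistent with the `['user', 'user']` output, and `get_ntype_id_for_side` is a hypothetical illustration of a lookup that takes the side explicitly, not an existing DGL function.

```python
# Plain-dict stand-ins for the block's inverse maps (node type name -> ID).
# Assumed IDs: 0 for the source-side "user", 1 for the destination-side "user".
srctypes_invmap = {"user": 0}
dsttypes_invmap = {"user": 1}


def get_ntype_id(ntype):
    # Same expression as the quoted line. The destination lookup is evaluated,
    # but its result is only used when the source map lacks the key.
    return srctypes_invmap.get(ntype, dsttypes_invmap.get(ntype, None))


def get_ntype_id_for_side(ntype, side):
    # Hypothetical side-aware lookup (not a DGL API): the caller states
    # which side of the block it means.
    if side not in ("src", "dst"):
        raise ValueError("side must be 'src' or 'dst'")
    invmap = srctypes_invmap if side == "src" else dsttypes_invmap
    if ntype not in invmap:
        raise KeyError("node type {!r} not on the {} side".format(ntype, side))
    return invmap[ntype]


print(get_ntype_id("user"))                  # 0 (the destination ID 1 is unreachable)
print(get_ntype_id_for_side("user", "dst"))  # 1
```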
To Reproduce
```python
import dgl
import torch as th
g = dgl.heterograph({('user','follows','user'): [(0,1), (1,0), (0,2)]})
print("graph ntypes = {}".format(g.ntypes))
b = dgl.to_block(g)
print("block ntypes = {}".format(b.ntypes))
```
Results in:
```text
Using backend: pytorch
graph ntypes = ['user']
block ntypes = ['user', 'user']
```
Expected behavior
Based on my understanding of how heterograph is implemented, for it to work properly we would expect a single node type in the block:
```text
Using backend: pytorch
graph ntypes = ['user']
block ntypes = ['user']
```
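For intuition about where the second `'user'` comes from, here is a simplified pure-Python sketch of what `to_block()` produces for the reproduction graph. `toy_to_block` is hypothetical and makes several assumptions: destination nodes are the nodes with incoming edges (sorted here), they are placed first among the source nodes, and the block keeps one type list per side whose concatenation is what `ntypes` reports.

```python
def toy_to_block(edges, ntype="user"):
    # Destination nodes: every node with at least one incoming edge
    # (sorted here; the exact order DGL uses is an assumption).
    dst_nodes = sorted({v for _, v in edges})
    # Source nodes: the destination nodes first, then any remaining sources.
    extra_src = sorted({u for u, _ in edges} - set(dst_nodes))
    src_nodes = dst_nodes + extra_src
    # Relabel edges into the block's local source and destination ID spaces.
    src_local = {n: i for i, n in enumerate(src_nodes)}
    dst_local = {n: i for i, n in enumerate(dst_nodes)}
    block_edges = [(src_local[u], dst_local[v]) for u, v in edges]
    # The block keeps a separate type list for each side.
    srctypes, dsttypes = [ntype], [ntype]
    return src_nodes, dst_nodes, block_edges, srctypes, dsttypes


src_nodes, dst_nodes, block_edges, srctypes, dsttypes = toy_to_block([(0, 1), (1, 0), (0, 2)])
observed_ntypes = srctypes + dsttypes                       # concatenated, as reported
expected_ntypes = list(dict.fromkeys(srctypes + dsttypes))  # deduplicated, as expected
print(observed_ntypes, expected_ntypes)  # ['user', 'user'] ['user']
```

The deduplicated list matches the expected output, but on its own it would not say which side a feature belongs to; that is the same ambiguity `get_ntype_id()` runs into.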
Environment
- DGL Version (e.g., 1.0): master branch
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): pytorch 1.8
- OS (e.g., Linux): ubuntu 18.04
- How you installed DGL (`conda`, `pip`, source): source
- Build command you used (if compiling from source): `cmake .. -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Debug`
- Python version: 3.6
- CUDA/cuDNN version (if applicable): 11.3
- GPU models and configuration (e.g. V100): TitanV
- Any other relevant information:
Additional context
This is currently a blocker for edge property prediction on a homogeneous graph with sampling.
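As a possible workaround in the meantime, edges on a block can be scored without going through `ndata` by keeping one source-side feature list and taking the destination features as its prefix. This relies on the convention, described in DGL's user guide for blocks created by `to_block()`, that the first `num_dst_nodes()` source nodes are the destination nodes. `score_block_edges` is a hypothetical helper working on plain Python lists rather than DGL tensors.

```python
def score_block_edges(block_edges, src_feats, num_dst):
    """Hypothetical helper: dot-product score for every edge of a block.

    Assumes the first num_dst source nodes are the destination nodes,
    so destination features are a prefix of the source features.
    """
    dst_feats = src_feats[:num_dst]
    return [
        sum(a * b for a, b in zip(src_feats[u], dst_feats[v]))
        for u, v in block_edges
    ]


src_feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
scores = score_block_edges([(0, 1), (1, 0), (0, 2)], src_feats, num_dst=3)
print(scores)  # [11.0, 11.0, 17.0]
```

With DGL tensors, the user guide's stochastic-training examples apply the same idea as `h[:block.num_dst_nodes()]`, assigning `block.srcdata` and `block.dstdata` separately instead of writing through `ndata`.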