dmlc/dgl
[Bug] Segfault in new sampling pipeline when Debug mode is on
Open
#3755 opened on Feb 19, 2022
help wanted
Description
After merging #3665, the following code crashes when DGL is built in DEBUG mode:
import torch
import dgl
from ogb.nodeproppred import DglNodePropPredDataset
dataset = DglNodePropPredDataset('ogbn-products')
graph, labels = dataset[0]
graph.ndata['label'] = labels.squeeze()
graph.create_formats_()
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(1)
dataloader = dgl.dataloading.NodeDataLoader(
    graph, torch.arange(graph.num_nodes()), sampler, device='cuda',
    batch_size=4096, shuffle=False, drop_last=False, num_workers=1,
    pin_memory=False, use_prefetch_thread=False, use_alternate_streams=False)
for it, _ in enumerate(dataloader):
    print(it)
The worker crashes with a segfault and the following backtrace:
#0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f09c4cc4408 in handler_SIGSEGV(int, siginfo_t*, void*) () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#2 <signal handler called>
#3 _int_malloc (av=av@entry=0x7f09c7eafc40 <main_arena>, bytes=bytes@entry=5765296) at malloc.c:3924
#4 0x00007f09c7b5967b in _int_memalign (av=0x7f09c7eafc40 <main_arena>, alignment=64, bytes=<optimized out>) at malloc.c:4683
#5 0x00007f09c7b5f0fa in _mid_memalign (address=<optimized out>, bytes=5765192, alignment=<optimized out>) at malloc.c:3324
#6 __posix_memalign (memptr=0x7ffccfe12fd0, alignment=<optimized out>, size=5765192) at malloc.c:5361
#7 0x00007f0967bffcde in c10::alloc_cpu(unsigned long) () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libc10.so
#8 0x00007f0967c01452 in c10::DefaultCPUAllocator::allocate(unsigned long) const () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libc10.so
#9 0x00007f09aeabf7e9 in at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKey, c10::ScalarType, c10::Device, c10::optional<c10::MemoryFormat>) () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007f09aeac0547 in at::detail::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) ()
from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007f09aef02af9 in at::native::empty_cpu(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) ()
from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007f09af5e617a in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_memory_format_empty_memory_format>, at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat> > >, at::Tensor (c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) ()
from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#13 0x00007f09af5c462e in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>), &at::(anonymous namespace)::empty_memory_format>, at::Tensor, c10::guts::typelist::typelist<c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat> > >, at::Tensor (c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#14 0x00007f09af2cca90 in at::_ops::empty_memory_format::call(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) ()
from /home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#15 0x00007f07b809d92a in at::empty(c10::ArrayRef<long>, c10::TensorOptions, c10::optional<c10::MemoryFormat>) () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/dgl-0.8-py3.8-linux-x86_64.egg/dgl/tensoradapter/pytorch/libtensoradapter_pytorch_1.10.0.so
#16 0x00007f07b809f033 in torch::empty(c10::ArrayRef<long>, c10::TensorOptions, c10::optional<c10::MemoryFormat>) () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/dgl-0.8-py3.8-linux-x86_64.egg/dgl/tensoradapter/pytorch/libtensoradapter_pytorch_1.10.0.so
#17 0x00007f07b809ba3a in TAempty () from /home/ubuntu/miniconda3/lib/python3.8/site-packages/dgl-0.8-py3.8-linux-x86_64.egg/dgl/tensoradapter/pytorch/libtensoradapter_pytorch_1.10.0.so
#18 0x00007f07cc6ad0e3 in dgl::runtime::TensorDispatcher::Empty (this=0x7f07cef559e0 <dgl::runtime::TensorDispatcher::Global()::inst>, shape=std::vector of length 1, capacity 1 = {...}, dtype=..., ctx=...) at /home/ubuntu/dgl/include/dgl/runtime/tensordispatch.h:77
#19 0x00007f07cc6aa9bf in dgl::runtime::NDArray::Empty (shape=std::vector of length 1, capacity 1 = {...}, dtype=..., ctx=...) at /home/ubuntu/dgl/src/runtime/ndarray.cc:214
#20 0x00007f07cc4e6f4a in dgl::aten::impl::CSRToCOO<(DLDeviceType)1, long> (csr=...) at /home/ubuntu/dgl/src/array/cpu/spmat_op_impl_csr.cc:291
#21 0x00007f07cc2537a7 in dgl::aten::CSRToCOO (csr=..., data_as_order=false) at /home/ubuntu/dgl/src/array/array.cc:469
#22 0x00007f07cc8760ca in dgl::UnitGraph::CSR::OutEdges (this=0x55609089d9f0, etype=0, vids=...) at /home/ubuntu/dgl/src/graph/unit_graph.cc:693
#23 0x00007f07cc8631c1 in dgl::UnitGraph::InEdges (this=0x556090499190, etype=0, vids=...) at /home/ubuntu/dgl/src/graph/unit_graph.cc:1017
#24 0x00007f07cc72250c in dgl::HeteroGraph::InEdges (this=0x556090896650, etype=0, vids=...) at /home/ubuntu/dgl/src/graph/./heterograph.h:135
#25 0x00007f07cc7b9691 in dgl::sampling::SampleNeighbors (hg=std::shared_ptr<dgl::BaseHeteroGraph> (use count 3, weak count 0) = {...}, nodes=std::vector of length 1, capacity 1 = {...}, fanouts=std::vector of length 1, capacity 1 = {...}, dir=dgl::EdgeDir::kIn,
prob=std::vector of length 1, capacity 1 = {...}, exclude_edges=std::vector of length 0, capacity 0, replace=false) at /home/ubuntu/dgl/src/graph/sampling/neighbor/neighbor.cc:101
...
Building with BUILD_TORCH=OFF still segfaults in the same call, at frame #19 (NDArray::Empty).
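For anyone reading the trace without the DGL source at hand: frames #20 and #21 show the failing allocation being requested from inside `CSRToCOO`. Below is a rough sketch of what a CSR-to-COO conversion computes in general, written in plain Python. It is not DGL's C++ implementation, and the function name `csr_to_coo` and the toy graph are made up for illustration.

```python
from typing import List, Tuple


def csr_to_coo(indptr: List[int], indices: List[int]) -> Tuple[List[int], List[int]]:
    """Expand CSR row pointers into explicit (row, col) pairs.

    Illustrative only: a generic CSR -> COO conversion, not DGL's code.
    """
    num_rows = len(indptr) - 1
    rows: List[int] = []
    for r in range(num_rows):
        # Row r owns the entries indices[indptr[r]:indptr[r + 1]],
        # so its id is repeated once per entry.
        rows.extend([r] * (indptr[r + 1] - indptr[r]))
    # Column ids are already stored one per entry in CSR.
    return rows, list(indices)


# Toy graph with 3 nodes and edges 0->1, 0->2, 2->0.
rows, cols = csr_to_coo([0, 2, 2, 3], [1, 2, 0])
print(rows, cols)
```

The row array has one entry per stored edge, so a conversion like this has to allocate a new one-dimensional array of that length. That would be consistent with the `NDArray::Empty` call in frame #19, whose shape is a vector of length 1, although the trace alone does not show why the allocation faults.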