CPU omp thread affinity becomes imbalanced after backpass · dmlc/dgl#1428


Labels: help wanted, topic: system performance

Description

🐛 Bug

I am running the GraphSAGE app with DGL and PyTorch. In the minigun code that spawns OpenMP threads to execute kernels such as copyReduce, I see multiple OpenMP threads running on the same CPU, so thread affinity appears to be broken. This happens only after the first backward pass, which creates extra threads in addition to those created in the forward pass. For example, if I set OMP_NUM_THREADS=10 (for 10 CPUs), there are 20 OS threads after the first forward and backward pass. The next forward pass then uses a subset of 10 threads from that pool of 20, which leads to the affinity problem.
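One way to confirm the thread counts and CPU placement described above is to snapshot the process's OS threads before and after the backward pass. The sketch below is a diagnostic aid only, not part of DGL or minigun. It assumes Linux (it reads `/proc/self/task`), and the helper names `thread_snapshot` and `report` are hypothetical.

```python
import os


def thread_snapshot():
    """Return {tid: (allowed_cpus, last_cpu)} for every OS thread of this process.

    Linux only: relies on /proc/self/task and os.sched_getaffinity.
    """
    snapshot = {}
    for name in os.listdir("/proc/self/task"):
        tid = int(name)
        try:
            allowed = sorted(os.sched_getaffinity(tid))
            with open(f"/proc/self/task/{tid}/stat") as f:
                stat = f.read()
        except OSError:
            # The thread exited between listing and querying; skip it.
            continue
        # The comm field (2) may contain spaces, so split after the last ')'.
        rest = stat.rsplit(")", 1)[1].split()
        # 'rest' starts at field 3 (state); field 39 is the CPU last run on.
        last_cpu = int(rest[36])
        snapshot[tid] = (allowed, last_cpu)
    return snapshot


def report(label):
    """Print one line per OS thread and return the thread count."""
    snap = thread_snapshot()
    print(f"{label}: {len(snap)} OS threads")
    for tid, (allowed, last_cpu) in sorted(snap.items()):
        print(f"  tid={tid} last_cpu={last_cpu} allowed={allowed}")
    return len(snap)
```

Calling `report("after forward")` once the forward pass finishes and `report("after backward")` after the backward call should show whether the thread count doubles, and whether several threads report the same `last_cpu`. The count includes every thread in the process, including the Python main thread, not only OpenMP workers.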

Environment

DGL Version (e.g., 1.0): latest commit
Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): pytorch
OS (e.g., Linux): CentOS
How you installed DGL (conda, pip, source): compiled and installed from source
Build command you used (if compiling from source): make -j
Python version: 3.6/3.7
CUDA/cuDNN version (if applicable):
GPU models and configuration (e.g. V100):
Any other relevant information: Intel CPUs

Additional context

Contributor guide

Research direction: Investigate the minigun OpenMP thread spawning logic to understand why additional threads are created after the backward pass and how thread affinity is affected.
Tech stack: python, pytorch, c
Domain: machine learning, backend
Issue type: Bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: Active
Clarity: Mostly clear
Prerequisites: C++, OpenMP
Newbie friendliness: 40