Dictionary learning is slower with n_jobs > 1 · scikit-learn/scikit-learn#4769

(10 comments) (1 reaction) (0 assignees) Python (27,020 forks)

Labels: Performance, help wanted, module:decomposition

Repository metrics

Stars: (66,084 stars)
PR merge metrics: (average merge time 10d) (90 merged PRs in 30d)

Description

Setting n_jobs > 1 in MiniBatchDictionaryLearning (and in the function dict_learning_online) makes dictionary learning much slower instead of faster.

Multiprocessing is handled in sklearn.decomposition, module dict_learning, line 249:

    code_views = Parallel(n_jobs=n_jobs)(
        delayed(_sparse_encode)(
            X[this_slice], dictionary, gram, cov[:, this_slice], algorithm,
            regularization=regularization, copy_cov=copy_cov,
            init=init[this_slice] if init is not None else None,
            max_iter=max_iter)
        for this_slice in slices)
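
The snippet above does not show how slices is built. As a rough sketch, assuming the rows of X are split into contiguous, near-equal slices with at most one slice per job, the hypothetical helper even_slices below (written for illustration, not taken from scikit-learn) shows what each worker would receive. If that assumption holds, a mini-batch of only a few rows hands each parallel task a single row, which leaves very little work to amortize the cost of dispatching it.

```python
def even_slices(n_samples, n_jobs):
    """Split range(n_samples) into up to n_jobs contiguous slices.

    Hypothetical helper: sizes differ by at most one and empty
    slices are skipped.
    """
    slices = []
    start = 0
    for i in range(n_jobs):
        # The first (n_samples % n_jobs) slices get one extra row.
        size = n_samples // n_jobs + (1 if i < n_samples % n_jobs else 0)
        if size > 0:
            slices.append(slice(start, start + size))
            start += size
    return slices


# A 3-row mini-batch across 4 jobs: three one-row tasks, one idle worker.
print(even_slices(3, 4))
# A 10-row batch across 4 jobs: slices of 3, 3, 2 and 2 rows.
print(even_slices(10, 4))
```

The batch sizes 3 and 10 are arbitrary values chosen for the illustration.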

Minimal example: https://gist.github.com/arthurmensch/091d16c135f4a3ba5580

Output n_jobs == 1

Distorting image...
Extracting reference patches...
done in 0.05s.
Learning the dictionary...
done in 5.12s.
Extracting noisy patches... 
done in 0.02s.
Lasso LARS...
done in 10.24s.

Output n_jobs == 2

Distorting image...
Extracting reference patches...
done in 0.05s.
Learning the dictionary...
done in 78.98s.
Extracting noisy patches... 
done in 0.02s.
Lasso LARS...
done in 6.15s.

Output n_jobs == 4

Distorting image...
Extracting reference patches...
done in 0.05s.
Learning the dictionary...
done in 83.24s.
Extracting noisy patches... 
done in 0.02s.
Lasso LARS...
done in 3.82s.

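The transform function of MiniBatchDictionaryLearning (which relies on sparse_encode) benefits from multiprocessing as expected: the Lasso LARS step drops from 10.24s to 6.15s and 3.82s with 1, 2 and 4 jobs. Learning the dictionary, on the other hand, goes from 5.12s to 78.98s and 83.24s.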

Dictionary learning relies on successive calls to sparse_encode, so the slowdown may come from the parallel overhead paid on each of these calls.
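
One way to check this hypothesis is to time many small calls run inline against the same calls sent through a freshly created worker pool each time. The sketch below is only an illustration under stated assumptions: encode is a hypothetical stand-in for a cheap sparse coding call, and a standard-library ThreadPoolExecutor stands in for the parallel backend, so the absolute timings say nothing about scikit-learn or joblib themselves.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def encode(rows):
    # Hypothetical stand-in for one cheap sparse coding call on a few rows.
    return [sum(v * v for v in row) for row in rows]


def fit_inline(batches):
    # One direct call per mini-batch, as with n_jobs == 1.
    return [encode(batch) for batch in batches]


def fit_pooled(batches, n_jobs):
    # Assumption: a new pool is set up for every mini-batch, so the fixed
    # setup and dispatch cost is paid once per iteration.
    results = []
    for batch in batches:
        step = max(1, -(-len(batch) // n_jobs))  # ceil(len / n_jobs)
        chunks = [batch[i:i + step] for i in range(0, len(batch), step)]
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            parts = list(pool.map(encode, chunks))
        results.append([x for part in parts for x in part])
    return results


def timed(fn, *args):
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


# 200 mini-batches of 3 rows each (arbitrary sizes for the illustration).
batches = [[[float(i + j), 1.0, 2.0] for j in range(3)] for i in range(200)]
inline_out, inline_s = timed(fit_inline, batches)
pooled_out, pooled_s = timed(fit_pooled, batches, 2)
print(f"inline: {inline_s:.4f}s  pooled: {pooled_s:.4f}s")
```

Both paths return the same values, so any gap between the two timings is overhead rather than useful work. Running the same comparison around the real sparse_encode call would show whether the overhead explains the slowdown reported here.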

Contributor guide

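Investigation approach: Investigate the overhead of Parallel inside dict learning, and check whether the repeated calls to sparse encode cause excessive data transfer or pickling.
Tech stack: python
Area: machine learning, performance
Issue type: bug
Difficulty: 2
Estimated time: 1-3 hours
Activity: stale
Clarity: clear
Prerequisites: Python, scikit-learn
Beginner-friendliness: 60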
