scikit-learn/scikit-learn

Dictionary learning is slower with n_jobs > 1

Open

#4,769 opened on 2015年5月26日

GitHub で見る
 (10 comments) (1 reaction) (0 assignees)Python (66,084 stars) (27,020 forks)batch import
Performancehelp wantedmodule:decomposition

説明

Setting n_jobs > 1 in MiniBatchDictionaryLearning (and in function dictionary_learning_online) leads to worse performance.

Multi processing is handled in sklearn.decompositions, function dict_learning, l 249

    code_views = Parallel(n_jobs=n_jobs)(
        delayed(_sparse_encode)(
            X[this_slice], dictionary, gram, cov[:, this_slice], algorithm,
            regularization=regularization, copy_cov=copy_cov,
            init=init[this_slice] if init is not None else None,
            max_iter=max_iter)
        for this_slice in slices)

Minimal example : https://gist.github.com/arthurmensch/091d16c135f4a3ba5580

Output n_jobs = 1

Distorting image...
Extracting reference patches...
done in 0.05s.
Learning the dictionary...
done in 5.12s.
Extracting noisy patches... 
done in 0.02s.
Lasso LARS...
done in 10.24s.

Output n_jobs == 2

Distorting image...
Extracting reference patches...
done in 0.05s.
Learning the dictionary...
done in 78.98s.
Extracting noisy patches... 
done in 0.02s.
Lasso LARS...
done in 6.15s.

Output n_jobs == 4

Distorting image...
Extracting reference patches...
done in 0.05s.
Learning the dictionary...
done in 83.24s.
Extracting noisy patches... 
done in 0.02s.
Lasso LARS...
done in 3.82s.

We can see that transform function of MiniBatchDictionaryLearning (relying on sparse_encode function) benefits from multi-processing as expected.

Dictionary learning relies on successive calls of sparse_encode function : slowness may come from this.

コントリビューターガイド