Bug in bfgs gradient computation of MLPRegressor with multiple output neurons · scikit-learn/scikit-learn#8349

(2 comments) (0 reactions) (0 assignees) Python (66,084 stars) (27,020 forks)

Labels: Bug, help wanted, module:neural_network

Description

When implementing a special Neural Network based on MLPRegressor, I found the following problem when using bfgs training and multiple output neurons (I did not look into the other training methods):

The 'squared_loss' implementation uses np.mean to compute the overall loss. The loss is therefore divided by both the number of samples and the number of output neurons/features, since both are dimensions of y_true - y_pred.
The gradient computations do not include the number of output neurons. Gradients are only divided by the number of samples (_compute_loss_grad). As a result, the gradient is off by a factor equal to the number of output neurons. The search direction is still correct, so this does not cause too much pain. Still, it should be fixed.
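The scaling mismatch described above can be reproduced outside scikit-learn. The following is a minimal sketch, not the library's code: it assumes a single linear layer in place of the full network, a factor of 1/2 in the loss (chosen here so that the derivative of the squared error is the plain residual), and hypothetical helper names (mean_squared_loss, grad_per_sample, numerical_grad). It compares a gradient divided by the number of samples only against a finite-difference gradient of the np.mean-based loss.

```python
import numpy as np

def mean_squared_loss(W, X, y):
    """Half mean squared error; np.mean averages over samples AND outputs."""
    residual = X @ W - y
    return 0.5 * np.mean(residual ** 2)

def grad_per_sample(W, X, y):
    """Gradient divided by the number of samples only (the scaling reported in the issue)."""
    n_samples = X.shape[0]
    return X.T @ (X @ W - y) / n_samples

def numerical_grad(W, X, y, eps=1e-6):
    """Central finite differences of mean_squared_loss with respect to each entry of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        loss_plus = mean_squared_loss(W + step, X, y)
        loss_minus = mean_squared_loss(W - step, X, y)
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Illustrative random data; the sizes are arbitrary assumptions.
rng = np.random.RandomState(0)
n_samples, n_features, n_outputs = 20, 4, 3
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples, n_outputs)
W = rng.randn(n_features, n_outputs)

analytic = grad_per_sample(W, X, y)
numeric = numerical_grad(W, X, y)

# Ratio of gradient norms; it should be close to n_outputs if the claim holds.
ratio = np.linalg.norm(analytic) / np.linalg.norm(numeric)
print("norm ratio:", ratio, "n_outputs:", n_outputs)
```

If the two gradients differ only by the constant factor n_outputs, they point in the same direction, which is consistent with the observation that the search direction is unaffected. With a single output neuron the two scalings coincide.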

In case this is not clear, I can put together a minimal example.

Cheers!

Versions

import platform; print(platform.platform())
Linux-3.16.0-4-amd64-x86_64-with-debian-8.5
import sys; print("Python", sys.version)
('Python', '2.7.9 (default, Mar 1 2015, 12:57:24) \n[GCC 4.9.2]')
import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.10.4')
import scipy; print("SciPy", scipy.__version__)
('SciPy', '0.14.0')
import sklearn; print("Scikit-Learn", sklearn.__version__)
('Scikit-Learn', '0.18.1')

Contributor guide

Tech stack: python, numpy
Domain: machine learning
Issue type: bug
Difficulty: 3
Estimated time: 1-3 hours
Activity status: stale
Clarity: clear
Prerequisites: MLPRegressor architecture, gradient scaling, numpy basics
Beginner friendliness: 40
Research direction: Investigate the loss and gradient computation in MLPRegressor. In particular, examine the _compute_loss_grad function and the squared_loss implementation. Ensure that the gradient is scaled consistently with the loss, that is, that it also accounts for the number of output neurons when multiple outputs are present. Also verify whether other training methods (e.g., Adam) have a similar discrepancy. The relevant file is likely sklearn/neural_network/multilayer_perceptron.py. Comment on the issue with findings and propose a fix.