Unexpected behavior of sklearn.feature_selection.mutual_info_regression if copy=False
#28793 opened on Apr 9, 2024
Description
Describe the bug
The parameter copy of the function mutual_info_regression is described as follows https://github.com/scikit-learn/scikit-learn/blob/d1d1596fac19d688a637690134d71fc460f5f0dd/sklearn/feature_selection/_mutual_info.py#L381-L383
I read this as saying that both X and y may be modified in place if copy=False and X has continuous features. However, y is always copied. I think the lines
https://github.com/scikit-learn/scikit-learn/blob/d1d1596fac19d688a637690134d71fc460f5f0dd/sklearn/feature_selection/_mutual_info.py#L309-L310 should be
    if not discrete_target:
        y = y.astype(np.float64, copy=copy)
        y = scale(y, with_mean=False, copy=False)
This would mirror the treatment of X: https://github.com/scikit-learn/scikit-learn/blob/d1d1596fac19d688a637690134d71fc460f5f0dd/sklearn/feature_selection/_mutual_info.py#L295-L299
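To illustrate why the copy argument of astype decides whether the caller's y is touched, here is a minimal NumPy-only sketch. It is not scikit-learn code: scale_in_place is a hypothetical stand-in for scale(y, with_mean=False, copy=False) and only mimics the in-place division by the standard deviation.

```python
import numpy as np


def scale_in_place(a):
    # Hypothetical stand-in for scale(a, with_mean=False, copy=False):
    # divide by the standard deviation, writing into the same buffer.
    a /= a.std()
    return a


y = np.array([1.0, 2.0, 4.0])  # already float64
y_orig = y.copy()

# copy=False: astype returns the input array itself when the dtype matches.
y_same = y.astype(np.float64, copy=False)
print(y_same is y)  # True

# copy=True (the default): a new buffer is allocated.
y_new = y.astype(np.float64, copy=True)
print(np.shares_memory(y_new, y))  # False

# Scaling the copy leaves the caller's array untouched.
scale_in_place(y_new)
print(np.array_equal(y, y_orig))  # True

# Scaling the array returned with copy=False modifies the caller's array.
scale_in_place(y_same)
print(np.array_equal(y, y_orig))  # False
```

Note that astype(np.float64, copy=False) still allocates a new array when the input has a different dtype (for example an integer y), so in-place modification would only be observable for float64 targets.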
Steps/Code to Reproduce
import numpy as np
from sklearn.feature_selection import mutual_info_regression
n_samples_, n_feats = 30, 2
X = np.random.randn(n_samples_, n_feats)
y = np.random.randn(n_samples_, )
y_copy = y.copy()
X_copy = X.copy()
mutual_info_regression(X, y, copy=False)
print(np.allclose(y, y_copy), np.allclose(X, X_copy))
Expected Results
The result should be
False, False
since both X and y should be modified in place by the function mutual_info_regression.
Actual Results
True, False
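The same check can be wrapped in a small helper that reports, per input, whether a call changed it. This is only a sketch: mutated_inputs and toy_estimator are hypothetical names, and toy_estimator is a stand-in that mimics the behavior reported above (X rescaled in place, y copied), not the scikit-learn implementation.

```python
import numpy as np


def mutated_inputs(func, *arrays):
    """Call func(*arrays); return one bool per array, True if its contents changed."""
    snapshots = [a.copy() for a in arrays]
    func(*arrays)
    return [not np.array_equal(a, s) for a, s in zip(arrays, snapshots)]


def toy_estimator(X, y):
    # Hypothetical stand-in mimicking the reported behavior:
    # X is rescaled in place, while y is rescaled on a fresh array.
    X /= X.std(axis=0)
    y = y / y.std()
    return X.shape[1]


rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))
y = rng.standard_normal(30)
print(mutated_inputs(toy_estimator, X, y))  # [True, False]
```

With scikit-learn installed, mutated_inputs(lambda X, y: mutual_info_regression(X, y, copy=False), X, y) would run the same check against the real function; the outcome depends on the installed version.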
Versions
System:
    python: 3.11.7 (main, Dec 5 2023, 19:13:35) [GCC 10.2.1 20210110]
    executable: /usr/local/bin/python
    machine: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.31

Python dependencies:
    sklearn: 1.3.0
    pip: 23.3.1
    setuptools: 69.0.2
    numpy: 1.23.2
    scipy: 1.12.0
    Cython: 3.0.9
    pandas: 2.1.4
    matplotlib: 3.8.2
    joblib: 1.3.2
    threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    num_threads: 12
    prefix: libopenblas
    filepath: /home/user/.local/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so
    version: 0.3.20
    threading_layer: pthreads
    architecture: Prescott

    user_api: blas
    internal_api: openblas
    num_threads: 12
    prefix: libopenblas
    filepath: /home/user/.local/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
    version: 0.3.21.dev
    threading_layer: pthreads
    architecture: Prescott

    user_api: openmp
    internal_api: openmp
    num_threads: 12
    prefix: libgomp
    filepath: /home/user/.local/lib/python3.11/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
    version: None