scikit-learn/scikit-learn

Unexpected behavior of sklearn.feature_selection.mutual_info_regression if copy=False

Open

#28793 opened on Apr 9, 2024

Labels: Documentation, help wanted

Description

Describe the bug

The copy parameter of mutual_info_regression is documented here: https://github.com/scikit-learn/scikit-learn/blob/d1d1596fac19d688a637690134d71fc460f5f0dd/sklearn/feature_selection/_mutual_info.py#L381-L383

I read this as saying that both X and y are modified in place if copy=False and X has continuous features. However, y is always copied. I think the lines at https://github.com/scikit-learn/scikit-learn/blob/d1d1596fac19d688a637690134d71fc460f5f0dd/sklearn/feature_selection/_mutual_info.py#L309-L310 should be

    if not discrete_target:
        y = y.astype(np.float64, copy=copy)
        y = scale(y, with_mean=False, copy=False)

This would mirror the treatment of X: https://github.com/scikit-learn/scikit-learn/blob/d1d1596fac19d688a637690134d71fc460f5f0dd/sklearn/feature_selection/_mutual_info.py#L295-L299
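To illustrate why a change of this kind would let copy=False reach y, here is a minimal NumPy-only sketch. It does not call scikit-learn: the helper scale_in_place is a hypothetical stand-in for scale(y, with_mean=False, copy=False) and only mimics the division by the standard deviation. The sample values are made up for the illustration.

```python
import numpy as np


def scale_in_place(arr):
    """Hypothetical stand-in for scale(arr, with_mean=False, copy=False)."""
    arr /= arr.std()  # in-place division, writes into the existing buffer
    return arr


y = np.array([1.0, 2.0, 4.0, 8.0])  # already float64
y_before = y.copy()

# astype(..., copy=False) returns the input array itself when no
# dtype conversion is needed, so no new buffer is allocated here.
y_cast = y.astype(np.float64, copy=False)
same_buffer = np.shares_memory(y, y_cast)

# Scaling the cast array in place therefore also changes the caller's y.
scale_in_place(y_cast)
modified = not np.allclose(y, y_before)

print(same_buffer, modified)
```

Note that if y had an integer dtype, astype would have to allocate a new float64 array, so the caller's y would stay untouched even with copy=False.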

Steps/Code to Reproduce

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    n_samples_, n_feats = 30, 2
    X = np.random.randn(n_samples_, n_feats)
    y = np.random.randn(n_samples_)
    # Keep untouched copies to compare against after the call
    y_copy = y.copy()
    X_copy = X.copy()
    mutual_info_regression(X, y, copy=False)
    # Prints whether y and X, respectively, are unchanged
    print(np.allclose(y, y_copy), np.allclose(X, X_copy))

Expected Results

The result should be

False False

since both X and y should be modified in place by the function mutual_info_regression.

Actual Results

True False
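The same kind of check can be wrapped in a small helper that reports which input arrays a call modified in place. This is only a sketch: mutated_inputs and toy are hypothetical names introduced for the illustration, and toy merely imitates the reported pattern (X rescaled in place, y processed on a copy) without using scikit-learn.

```python
import numpy as np


def mutated_inputs(func, *arrays, **kwargs):
    """Call func(*arrays, **kwargs) and report which arrays changed in place."""
    snapshots = [a.copy() for a in arrays]
    func(*arrays, **kwargs)
    return [not np.array_equal(a, s) for a, s in zip(arrays, snapshots)]


def toy(X, y):
    """Hypothetical function: rescales X in place but works on a copy of y."""
    X /= 2.0      # in-place, visible to the caller
    y = y / 2.0   # rebinds the local name, the caller's y is untouched
    return X, y


X = np.ones((3, 2))
y = np.ones(3)
result = mutated_inputs(toy, X, y)
print(result)  # [True, False]
```

With scikit-learn installed, mutated_inputs(mutual_info_regression, X, y, copy=False) should reproduce the behavior reported above, namely X modified and y unchanged.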

Versions

System:
    python: 3.11.7 (main, Dec  5 2023, 19:13:35) [GCC 10.2.1 20210110]
executable: /usr/local/bin/python
   machine: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.3.0
          pip: 23.3.1
   setuptools: 69.0.2
        numpy: 1.23.2
        scipy: 1.12.0
       Cython: 3.0.9
       pandas: 2.1.4
   matplotlib: 3.8.2
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libopenblas
       filepath: /home/user/.local/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so
        version: 0.3.20
threading_layer: pthreads
   architecture: Prescott

       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libopenblas
       filepath: /home/user/.local/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-23e5df77.3.21.dev.so
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Prescott

       user_api: openmp
   internal_api: openmp
    num_threads: 12
         prefix: libgomp
       filepath: /home/user/.local/lib/python3.11/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
