[ENH] Hidden layer dropout uniformization in the deep learning models. · sktime/sktime#9103


Labels: enhancement, good first issue, module:classification, module:regression


Hidden layer dropout uniformization in the deep learning models (classifiers, regressors & networks).

Current behaviour:

In some of the TensorFlow-based deep learning models, hidden layer dropout is hardcoded and not exposed to users. One such example, for illustration: https://github.com/sktime/sktime/blob/071551b59669a946a386617fb7706e69349e5dff/sktime/networks/rnn/_rnn_tf.py#L123
In the deep learning models, hidden layer dropout can currently only be set globally; there is no support for setting it layer-wise. Where it is set layer-wise, it is hardcoded and not exposed to the end user. For example, in MLPNetwork the dropout differs from layer to layer because the implementation follows the model described in section 3.2.1, "Multi Layer Perceptron", on page 14 of the 2019 review paper "Deep learning for time series classification: a review": https://github.com/sktime/sktime/blob/071551b59669a946a386617fb7706e69349e5dff/sktime/networks/mlp.py#L68-L77

Expected/Suggested behaviour:

Wherever dropout is used in the hidden layers, it should be exposed to the end user via the dropout parameter.
Currently, the dropout parameter (wherever present) is expected to be a float. Change it to accept either a single float or a tuple of floats, and use the tuple to set layer-wise dropout whenever one is specified.
Take care when handling dropout specified as a tuple: enforce that the length of the tuple equals the number of hidden layers, apply each value to its corresponding hidden layer, and so on.
Take care to distinguish where layer-wise dropout is allowed and where it is not. Some network implementations, in either TensorFlow or PyTorch, may not allow setting dropout layer-wise; in such cases the dropout parameter should strictly be a float, and the docstring should say so.
The current behaviour section above mentions only two examples, but all deep learning models should be investigated and the fix implemented wherever it is needed.
To keep behaviour consistent, when setting the default value of the dropout parameter:
1. use the same value(s) that are currently hardcoded;
2. if it is not hardcoded, use the default value from the underlying library's implementation/documentation.
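
The validation described above could look roughly like the sketch below. It is only an illustration, not existing sktime code: the helper name normalize_dropout, its arguments n_layers and allow_layerwise, and the accepted range 0 <= rate < 1 are all assumptions made for this example.

```python
from numbers import Real


def _is_rate(value):
    """True for real numbers, excluding bool (which is a subclass of int)."""
    return isinstance(value, Real) and not isinstance(value, bool)


def normalize_dropout(dropout, n_layers, allow_layerwise=True):
    """Hypothetical helper: expand `dropout` to one rate per hidden layer.

    A single float is broadcast to all layers; a tuple must have exactly
    `n_layers` entries. Networks that cannot support layer-wise dropout
    would pass allow_layerwise=False so that only a float is accepted.
    """
    if _is_rate(dropout):
        rates = (float(dropout),) * n_layers
    elif isinstance(dropout, tuple):
        if not allow_layerwise:
            raise TypeError("dropout must be a single float for this network")
        if len(dropout) != n_layers:
            raise ValueError(
                f"dropout has {len(dropout)} entries, "
                f"but the network has {n_layers} hidden layers"
            )
        if not all(_is_rate(d) for d in dropout):
            raise TypeError("every entry of dropout must be a float")
        rates = tuple(float(d) for d in dropout)
    else:
        raise TypeError("dropout must be a float or a tuple of floats")

    # Assumed valid range for this sketch: 0 <= rate < 1.
    for rate in rates:
        if not 0.0 <= rate < 1.0:
            raise ValueError(f"dropout rate {rate} is outside [0, 1)")
    return rates


# Usage: the network builder would loop over the returned tuple,
# applying rates[i] before or after hidden layer i.
uniform = normalize_dropout(0.2, n_layers=3)
layerwise = normalize_dropout((0.1, 0.2, 0.3), n_layers=3)
```

Returning a tuple in both cases means the network-building code needs only one code path, and a hardcoded per-layer default can be passed in as a tuple so that existing behaviour stays unchanged.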

This issue originated from the discussion in the comments of #9042

Research direction: Identify all deep learning models in sktime where dropout is hardcoded or not exposed, then modify the constructor to accept a float or a tuple for layer-wise dropout, ensuring validation and consistent defaults.
Tech stack: python, tensorflow, pytorch
Domain: backend, machine learning
Issue type: Feature
Difficulty: 3
Estimated time: 3-5 days
Activity status: Active
Clarity: Clear
Prerequisites: Python, Deep learning basics
Newbie friendliness: 65