automl/auto-sklearn

Avoid costly re-building of pipelines

Open

#443 opened on Mar 21, 2018

Good first issue, enhancement

Description

Currently, SMAC suggests hyperparameter configurations that are independent of the dataset size. For example, the hyperparameter classifier:max_features, which is specified between zero and one, is transformed according to max_features = int(n_features ** classifier:max_features). If the dataset in question has only 10 features, the transformed value can only be an integer from 1 to 10, but SMAC does not know that most values of the tuned hyperparameter map to the same hyperparameter applied to the actual model. Therefore, one needs to track the 'actual' hyperparameters after transformation, check whether they have been used before, and, if so, return the cached function value to SMAC instead of re-building the pipeline.
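A minimal sketch of the idea, not auto-sklearn's or SMAC's actual code: the class `EffectiveConfigCache`, the `evaluate` callable, and the plain-dict configuration format are hypothetical names chosen for illustration. Only the transformation formula is taken from the description above. The sketch also assumes that evaluating the same effective configuration twice gives the same cost (fixed seed, same data splits).

```python
from typing import Any, Callable, Dict, Tuple

Config = Dict[str, Any]
Key = Tuple[Tuple[str, Any], ...]


def effective_max_features(n_features: int, max_features: float) -> int:
    """Transformation quoted in the issue: int(n_features ** max_features)."""
    return int(n_features ** max_features)


class EffectiveConfigCache:
    """Hypothetical wrapper that caches costs by post-transformation values."""

    def __init__(self, evaluate: Callable[[Config], float], n_features: int) -> None:
        self._evaluate = evaluate      # expensive function: builds and scores a pipeline
        self._n_features = n_features
        self._costs: Dict[Key, float] = {}
        self.hits = 0                  # number of evaluations answered from the cache

    def _effective(self, config: Config) -> Config:
        """Replace tuned values by the values the model would actually receive."""
        effective = dict(config)
        name = "classifier:max_features"
        if name in effective:
            effective[name] = effective_max_features(
                self._n_features, float(effective[name])
            )
        return effective

    def __call__(self, config: Config) -> float:
        effective = self._effective(config)
        key = tuple(sorted(effective.items()))  # hashable, order-independent key
        if key in self._costs:
            self.hits += 1
            return self._costs[key]
        cost = self._evaluate(effective)
        self._costs[key] = cost
        return cost
```

With 10 features, the tuned values 0.31 and 0.40 both map to max_features = 2, so wrapping the evaluation function in such a cache would answer the second request without re-building the pipeline. In a real integration, every hyperparameter that is transformed before it reaches the model would need the same treatment as classifier:max_features here.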

Initial experiments suggest that 1-2% of the overall runs are actually re-optimizations.

