scikit-learn/scikit-learn

Missing Data Imputation with a Gaussian Mixture Model using EM

Open

#9268, created on July 3, 2017

(17 comments) (1 reaction) (0 assignees) Python (66,084 stars) (27,020 forks)
Labels: help wanted, module:mixture

Description

I was wondering if there was interest in adding a new imputation strategy (or a new Imputer class) based on a Gaussian Mixture Model (GMM) using the EM or CEM algorithm. The implementation could be along the lines of:

  1. Ghahramani, Zoubin, and Michael I. Jordan. "Supervised learning from incomplete data via an EM approach." Advances in neural information processing systems. 1994.

  2. Ming Ouyang, William J. Welsh, Panos Georgopoulos; Gaussian mixture clustering and imputation of microarray data. Bioinformatics 2004; 20 (6): 917-923. doi: 10.1093/bioinformatics/bth007

  3. Machine Learning: A Probabilistic Perspective, Kevin Murphy (2012) pp. 372-375 (although that example uses a single MVN, whereas this proposal would use a mixture).

Further, Murphy's implementation of a Gaussian Mixture Model based imputation algorithm is also available here (which I have not tested, by the way).

EDIT: I am working on a completely revised approach, which is now based on reference 1 above. Will post a working version soon.
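
To make the idea concrete, here is a minimal sketch of the imputation step only, assuming the mixture parameters have already been estimated (for example by an EM procedure along the lines of reference 1). For each incomplete row it computes component responsibilities from the observed features, then fills the missing features with the responsibility-weighted conditional means. The function names `gaussian_logpdf` and `impute_with_gmm` are hypothetical and not part of scikit-learn, and this is not the revised implementation mentioned above.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at one point x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def impute_with_gmm(X, weights, means, covs):
    """Fill NaNs in X with their expected value under a given GMM.

    Hypothetical helper for illustration; assumes full covariance
    matrices and that NaN marks a missing entry.
    """
    X = np.asarray(X, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n_components = means.shape[0]
    X_filled = X.copy()

    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue  # complete rows are left unchanged
        obs = ~miss

        # Start from the log mixing weights (np.log returns a new array).
        log_resp = np.log(np.asarray(weights, dtype=float))
        cond_means = np.empty((n_components, miss.sum()))

        for k in range(n_components):
            mu, cov = means[k], covs[k]
            if obs.any():
                cov_oo = cov[np.ix_(obs, obs)]
                cov_mo = cov[np.ix_(miss, obs)]
                # Responsibility uses only the observed marginal.
                log_resp[k] += gaussian_logpdf(row[obs], mu[obs], cov_oo)
                # Conditional mean of the missing block given the observed block.
                cond_means[k] = mu[miss] + cov_mo @ np.linalg.solve(
                    cov_oo, row[obs] - mu[obs]
                )
            else:
                # Nothing observed: fall back to the component mean.
                cond_means[k] = mu[miss]

        # Normalise responsibilities in a numerically stable way.
        resp = np.exp(log_resp - log_resp.max())
        resp /= resp.sum()
        X_filled[i, miss] = resp @ cond_means

    return X_filled


# Example with made-up parameters for a two-component mixture.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
covs = [[[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.5], [0.5, 1.0]]]
X = np.array([[1.0, np.nan], [10.0, np.nan], [2.0, 3.0]])
X_imputed = impute_with_gmm(X, weights, means, covs)
```

In a full EM treatment the same conditional quantities would be recomputed in every E-step, and the M-step would also need the conditional covariances of the missing block; this sketch leaves both out.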
