GMM covariance types examples overly complex / confusing · scikit-learn/scikit-learn#10863

9 comments · 0 reactions · 0 assignees · Python · 27,020 forks

Labels: Documentation, help wanted, module:mixture

Repository metrics

Stars: 66,084
PR merge metrics: average merge time 10 days, 90 PRs merged in the last 30 days

Description

http://scikit-learn.org/dev/auto_examples/mixture/plot_gmm_covariances.html#sphx-glr-auto-examples-mixture-plot-gmm-covariances-py

I don't particularly like the example because it's supervised: it uses a train/test split and matches the fitted components to the original classes. I think it would be better to use a synthetic dataset and just show off the different covariance types.
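A minimal sketch of what an unsupervised, synthetic version might look like. The dataset parameters and the stretching matrix are assumptions chosen for illustration, not taken from the existing example. It fits one `GaussianMixture` per covariance type and prints the shape in which each type stores `covariances_`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic 2D data: three blobs, labels discarded (no train/test split needed).
X, _ = make_blobs(n_samples=600, centers=3, n_features=2,
                  cluster_std=1.0, random_state=0)

# Assumed linear map that makes the blobs anisotropic, so the
# covariance types actually produce different fits.
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])

shapes = {}
for cov_type in ("spherical", "diag", "tied", "full"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type,
                          random_state=0).fit(X)
    # The storage format of covariances_ depends on covariance_type.
    shapes[cov_type] = gmm.covariances_.shape
    print(cov_type, shapes[cov_type])
```

Because the data is 2D, each fitted component could be drawn directly as an ellipse without projecting away any dimensions.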

It also fits the model on 4D data but only shows a 2D projection, which isn't very intuitive IMHO.

Also, the example could be much simplified if we added a "get_covariance" function back to the model. I think we had that in the old GMM. Was there a reason not to add it to the new GMM? In many cases the user wants to be agnostic to the storage format of the covariance matrix, I think.
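To illustrate what being agnostic to the storage format could mean, here is a rough sketch of a standalone helper. The name `full_covariances` is hypothetical and is not part of scikit-learn; it only relies on the documented `covariance_type`, `covariances_`, and `means_` attributes of a fitted `GaussianMixture`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


def full_covariances(gmm):
    """Hypothetical helper (not a scikit-learn API): return the covariances of
    a fitted GaussianMixture as an array of shape
    (n_components, n_features, n_features), whatever the covariance_type."""
    n_components, n_features = gmm.means_.shape
    cov = gmm.covariances_
    if gmm.covariance_type == "full":
        return cov.copy()
    if gmm.covariance_type == "tied":
        # One shared matrix, repeated for every component.
        return np.tile(cov, (n_components, 1, 1))
    if gmm.covariance_type == "diag":
        # One vector of per-feature variances per component.
        return np.stack([np.diag(variances) for variances in cov])
    if gmm.covariance_type == "spherical":
        # One scalar variance per component.
        return np.stack([variance * np.eye(n_features) for variance in cov])
    raise ValueError(f"Unknown covariance_type: {gmm.covariance_type!r}")


X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

dense = {}
for cov_type in ("spherical", "diag", "tied", "full"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type,
                          random_state=0).fit(X)
    dense[cov_type] = full_covariances(gmm)
    print(cov_type, dense[cov_type].shape)
```

With a helper like this, plotting code would only need to handle one array layout instead of branching on `covariance_type`.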

Contributor guide

Research direction: Start by exploring the current example at the given URL. Understand how it uses supervised data and a train/test split. Then design a synthetic dataset using functions like `make_blobs` to demonstrate the different covariance types. Consider adding a `get_covariance` method to the GMM class as suggested, looking at the old GMM implementation for reference. Finally, simplify the example to focus on covariance types without classification.
Tech stack: python
Domain: machine learning
Issue type: Documentation
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Python, scikit-learn, basics of Gaussian Mixture Models
Newbie friendliness: 65
