lmcinnes/umap

UMAP Roadmap

Open

#15 created on November 13, 2017

(36 comments) (22 reactions) (0 assignees) Python (6,478 stars) (751 forks) batch import
help wanted, new feature

Description

A rough roadmap of things to be done for UMAP. Some of these tasks are easy, some are hard, and some require deeper knowledge of UMAP. Short and medium term tasks should be approachable for many people. Reply to this issue if you are interested in taking up any of them.

Short term items

  • Support for sparse matrix input
  • Add random seed as a user option
  • Support for cosine distance RP-trees
  • Allow non-RP-tree initialisation of NN-descent
  • Better document (via docstrings) all the support functions
  • "Custom" initialisation with a predefined positioning.
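
For anyone considering the cosine distance RP-tree item, here is a minimal sketch of one way an angular random projection split could work. It is not UMAP's implementation: the names `angular_split` and `_normalize` are hypothetical, the data is assumed to be plain Python lists, and the seeded `random.Random` only illustrates how a user-supplied random seed would make the split reproducible.

```python
import math
import random


def _normalize(v):
    """Return v scaled to unit length (unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def angular_split(points, rng):
    """Split point indices into two sides of a random angular hyperplane.

    Two distinct points are drawn at random and normalised. The hyperplane
    normal is the difference of the two unit vectors, so the plane passes
    through the origin and separates directions rather than positions.
    """
    i, j = rng.sample(range(len(points)), 2)
    a, b = _normalize(points[i]), _normalize(points[j])
    normal = [x - y for x, y in zip(a, b)]

    left, right = [], []
    for idx, p in enumerate(points):
        margin = sum(n * x for n, x in zip(normal, p))
        if margin > 0:
            left.append(idx)
        elif margin < 0:
            right.append(idx)
        else:
            # Ties (including a degenerate all-zero normal) go to a random side.
            (left if rng.random() < 0.5 else right).append(idx)
    return left, right


rng = random.Random(42)  # fixed seed so the split is reproducible
data = [[1.0, 0.1], [2.0, 0.2], [0.1, 1.0], [0.3, 3.0]]
left, right = angular_split(data, rng)
```

Applying a split like this recursively until each leaf is small would give candidate neighbour sets for initialising NN-descent. Because the plane passes through the origin, rescaling the data by a positive constant does not change which side a point falls on, which is the property cosine distance needs.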

Medium term items

  • Generate notebook for basic usage demonstration
  • Generate notebook explaining parameter options and their effects
  • Set up CI and build a basic test suite
  • Start building basic documentation and integrate with readthedocs

Longer term items

  • Generate notebook for "How UMAP works"
  • Add code (and devise API(?)) for UMAP on general pandas dataframes
  • Add support for semi-supervised dimension reduction via UMAP
  • UMAP as a generative model (code + demo)
  • UMAP for text data (similar to word2vec)
  • A transform function for new, previously unseen data (see issue #40)
  • Model persistence for UMAP models

No priority

  • GPU support for UMAP
  • Conda-forge UMAP package
  • Improve numba usage (better numba expertise required)
  • Concurrency via Dask for multicore and distributed support
