online-ml/river

Online GapEncoder

Open

#1439 opened on November 3, 2023

1 comment · 0 reactions · 0 assignees · Python · 4,574 stars · 553 forks
Labels: Good first issue, New feature

Description

skrub is a wonderful new project related to scikit-learn. You can see Gaël Varoquaux present it here. They have a transformer called GapEncoder: it's a way to embed fuzzy strings. This could be really powerful online, say for classifying Tweets or Twitch messages, where typos are aplenty.
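To illustrate why character n-grams cope well with typos, here is a minimal sketch in plain Python. It is not skrub's implementation: the helper names `char_ngrams` and `jaccard`, the trigram size, and the space padding are all assumptions made for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a string (assumed: lowercase, space-padded)."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# A misspelled message still shares most of its trigrams with the original,
# while an unrelated message shares few or none.
typo = jaccard("awesome stream", "awsome stream")
unrelated = jaccard("awesome stream", "terrible lag")
```

Here `typo` comes out well above `unrelated`, which is the property a fuzzy string encoder builds on.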

We already have a way to do online TF-IDF/count vectorization, but we don't have Gamma-Poisson matrix factorization. It is doable online, though. Once we have it, we could assemble the two into a nice GapEncoder class. See paper here.
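As a rough sketch of the first stage, the count vectors that a Gamma-Poisson factorization would consume can be built incrementally. The class below is hypothetical (`OnlineNgramCounter` is not part of river); it only borrows river's `learn_one` / `transform_one` naming convention, and it leaves the factorization step out entirely.

```python
from collections import Counter


class OnlineNgramCounter:
    """Hypothetical helper: incremental character n-gram counts, one message at a time."""

    def __init__(self, n=3):
        self.n = n
        self.doc_freq = Counter()  # number of messages each n-gram has appeared in
        self.n_docs = 0            # number of messages seen so far

    def _ngrams(self, text):
        # Assumed preprocessing: lowercase and pad with one space on each side.
        padded = f" {text.lower()} "
        return Counter(padded[i:i + self.n] for i in range(len(padded) - self.n + 1))

    def learn_one(self, text):
        """Update document frequencies with a single message."""
        self.n_docs += 1
        self.doc_freq.update(self._ngrams(text).keys())
        return self

    def transform_one(self, text):
        """Return the sparse n-gram count vector of a single message."""
        return dict(self._ngrams(text))


counter = OnlineNgramCounter(n=3)
for message in ["hello chat", "helo chat"]:
    counter.learn_one(message)

counts = counter.transform_one("aaaa")
```

The sparse dictionaries returned by `transform_one` play the role of the non-negative count matrix rows. An online factorization would then update its topic activations and dictionary from one such row at a time.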

This is related to #1412. Indeed, maybe this works well without Gamma-Poisson matrix factorization. For instance, we could use preprocessing.LDA, which we already have.
