WordEmbeddingsKeyedVectors.add() doesn't clear `vectors_norm`, causing `IndexError` on later `most_similar()` · piskvorky/gensim#2532

17 comments · 0 reactions · 0 assignees · Python · 4,349 forks

Labels: Hacktoberfest, bug, difficulty easy, good first issue, impact MEDIUM, reach LOW

Repository metrics

Stars: 15,144
PR merge metrics: no merged PRs in the last 30 days

Description

As reported in a StackOverflow question/answer: https://stackoverflow.com/a/56641265/130288

An adapted version of the asker's minimal test case (which could become a unit test):

import numpy as np
from gensim.models.keyedvectors import WordEmbeddingsKeyedVectors

kv = WordEmbeddingsKeyedVectors(vector_size=3)
kv.add(entities=['a', 'b'],
       weights=[np.random.rand(3), np.random.rand(3)])
kv.most_similar('a')  # works

kv.add(entities=['c'], weights=[np.random.rand(3)])
kv.most_similar('c')  # fails with `IndexError`

Clearing the `vectors_norm` property (with either `del` or assignment to `None`) in `add()` should be sufficient to trigger recalculation on the next `most_similar()` call.
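The failure mode and the proposed fix can be illustrated without gensim. The sketch below is a toy stand-in, not gensim's implementation: `ToyKeyedVectors`, its `init_sims()` helper, and the `clear_cache` flag are hypothetical names introduced only for illustration, and it assumes the normalized vectors are cached lazily and rebuilt only when the cache is `None`.

```python
import math


class ToyKeyedVectors:
    """Hypothetical stand-in for a keyed-vector store (not gensim's class)."""

    def __init__(self):
        self.index2entity = []
        self.vectors = []          # raw vectors, one list of floats per entity
        self.vectors_norm = None   # lazily built cache of unit-length vectors

    def add(self, entities, weights, clear_cache=True):
        self.index2entity.extend(entities)
        self.vectors.extend(list(w) for w in weights)
        if clear_cache:
            # Proposed fix: drop the stale cache so it is rebuilt on demand.
            self.vectors_norm = None

    def init_sims(self):
        # Rebuild the cache only when it is missing (assumed lazy behaviour).
        if self.vectors_norm is None:
            self.vectors_norm = [
                [x / math.sqrt(sum(y * y for y in v)) for x in v]
                for v in self.vectors
            ]

    def most_similar(self, entity):
        self.init_sims()
        idx = self.index2entity.index(entity)
        query = self.vectors_norm[idx]  # IndexError when the cache is stale
        scores = [
            (name, sum(a * b for a, b in zip(query, vec)))
            for name, vec in zip(self.index2entity, self.vectors_norm)
            if name != entity
        ]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Without clearing: the cache still has 2 rows when 'c' (index 2) is queried.
stale = ToyKeyedVectors()
stale.add(['a', 'b'], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
stale.most_similar('a')
stale.add(['c'], [[1.0, 1.0, 0.0]], clear_cache=False)
try:
    stale.most_similar('c')
    stale_failed = False
except IndexError:
    stale_failed = True

# With clearing: the cache is rebuilt with 3 rows and the query succeeds.
fixed = ToyKeyedVectors()
fixed.add(['a', 'b'], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
fixed.most_similar('a')
fixed.add(['c'], [[1.0, 1.0, 0.0]])
top = fixed.most_similar('c')
```

Until a fix lands, the same idea suggests a workaround for the test case above: setting `kv.vectors_norm = None` after the second `add()` should force the recalculation, as the description notes.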

Contributor guide

Research direction: Investigate the `vectors_norm` property in `WordEmbeddingsKeyedVectors` and ensure it is cleared when `add()` is called.
Tech stack: Python
Domain: machine learning
Issue type: Bug
Difficulty: 2
Estimated time: 1-3 hours
Activity status: Active
Clarity: Clear
Prerequisites: Python, NumPy
Newbie friendliness: 75
