random.RandomState with different versions of numpy has vastly different performance
#2,782 opened on April 3, 2020
Description
the performance of random.RandomState in word2vec.py (version 3.8.0)
def seeded_vector(self, seed_string, vector_size):
    # A new RandomState is constructed and seeded on every call
    once = random.RandomState(self.hashfxn(seed_string) & 0xffffffff)
    return (once.rand(vector_size) - 0.5) / vector_size
seems to depend heavily on the installed version of numpy. With numpy 1.14.3, the following code
from numpy.random import RandomState as Ran
from time import time

t1 = time()
for i in range(100000):
    temp = Ran(hash(i) & 0xffffffff)  # construct and seed a new generator each iteration
t2 = time()
print(t2 - t1)  # elapsed time in seconds
produced
0.28105926513671875
Exactly the same code with numpy 1.18.1 produced
18.590345859527588
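To check whether a given numpy installation is affected, something like the following may help. It is only a rough sketch: `time_construction` and the iteration count are made up for illustration, and `default_rng` assumes numpy 1.17 or later. It times construction of the legacy `RandomState` and of the newer generator side by side, and makes no claim about which one is faster on a particular version.

```python
from time import perf_counter

import numpy as np


def time_construction(make_rng, n=2000):
    """Return the seconds taken to build n freshly seeded generators.

    Illustrative helper, not part of numpy or gensim.
    """
    start = perf_counter()
    for i in range(n):
        make_rng(hash(i) & 0xffffffff)  # same seeding pattern as in the report
    return perf_counter() - start


legacy_s = time_construction(np.random.RandomState)
generator_s = time_construction(np.random.default_rng)  # assumes numpy >= 1.17

print(f"numpy {np.__version__}")
print(f"RandomState: {legacy_s:.4f} s, default_rng: {generator_s:.4f} s")
```

Running this under each numpy version of interest gives directly comparable numbers.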
I noticed this because I was training a model with a vocabulary of millions of words. After unwittingly updating numpy (via an Anaconda update), build_vocab took significantly longer, and after some debugging I narrowed the cause down to random.RandomState in the seeded_vector function.
I know this is really a numpy issue, but the numpy documentation itself notes that RandomState is legacy (https://docs.scipy.org/doc/numpy/reference/random/performance.html). So I wonder whether you have any plans to move away from RandomState? Thanks!
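For discussion, here is a minimal sketch of what a version based on the newer Generator API might look like. It is not a proposed patch: it is written as a standalone function with a hypothetical `hashfxn` parameter instead of the method above, it assumes numpy 1.17 or later for `default_rng`, and it would produce different vectors from the current code for the same seed string. Whether it actually avoids the slowdown would need to be measured.

```python
import numpy as np


def seeded_vector(seed_string, vector_size, hashfxn=hash):
    """Sketch of seeded_vector using numpy's Generator API (numpy >= 1.17).

    Standalone illustration; hashfxn stands in for self.hashfxn.
    """
    seed = hashfxn(seed_string) & 0xffffffff  # keep the seed in the 32-bit range, as before
    rng = np.random.default_rng(seed)
    # Uniform values in [0, 1), shifted and scaled as in the original code
    return (rng.random(vector_size) - 0.5) / vector_size


vec = seeded_vector("example", 100)
print(vec[:3])
```

The result is still deterministic for a given seed string within one process, but models initialised this way would not reproduce vectors generated with RandomState.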