facebookresearch/metaseq

Track all of our RNG offsets to avoid collisions

Open

#65 opened on May 8, 2022

Labels: enhancement, good first issue

Description

We have RNG seed offsets sprinkled through the codebase:

(base) √ metaseq % ag seed --py | grep +
cpu_tests/test_streaming_token_block_dataset.py:78:        shadow_rng = np.random.default_rng(2273 + seed)
cpu_tests/test_streaming_token_block_dataset.py:124:        shadow_rng = np.random.default_rng(2273 + seed)
metaseq/tasks/language_modeling.py:217:            with data_utils.numpy_seed(self.args.seed + epoch):
metaseq/tasks/streaming_language_modeling.py:316:            seed=1284 + self.args.seed,
metaseq/trainer.py:1052:        seed = self.cfg.common.seed + self.get_num_updates()
metaseq/data/streaming_token_block_dataset.py:96:            rng = np.random.default_rng(2273 + self.seed)
metaseq/data/iterators.py:524:                batches = shuffle_batches(list(batches), self.seed + epoch)
metaseq/data/iterators.py:532:                batches = shuffle_batches(batches, self.seed + epoch + self.shard_id)
metaseq/data/iterators.py:535:                batches = shuffle_batches(list(self.frozen_batches), self.seed + epoch)

It would be good to track these offsets in one place to avoid collisions, since some of this code may be assuming that the offsets never collide or couple with one another.
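One possible shape for this, as a minimal sketch rather than an existing metaseq API: collect the constant offsets in a single enum so that a duplicate value fails at import time. The names `SeedOffset`, `STREAMING_LM_TASK`, `STREAMING_TOKEN_BLOCK`, and `make_rng` are hypothetical; only the values 1284 and 2273 come from the grep output above. The dynamic offsets (`epoch`, `shard_id`, `get_num_updates()`) are not covered here and would still need separate review.

```python
from enum import IntEnum, unique

import numpy as np


@unique  # raises ValueError at class creation if two members share a value
class SeedOffset(IntEnum):
    # Hypothetical member names; the values are the literal offsets
    # that appear in the grep output.
    STREAMING_LM_TASK = 1284      # streaming_language_modeling.py
    STREAMING_TOKEN_BLOCK = 2273  # streaming_token_block_dataset.py


def make_rng(base_seed: int, offset: SeedOffset) -> np.random.Generator:
    """Build a generator from base_seed plus a registered offset.

    Keeps the existing additive scheme, so the resulting streams match
    np.random.default_rng(offset + base_seed) as used today.
    """
    return np.random.default_rng(base_seed + int(offset))


# Usage: equivalent to np.random.default_rng(2273 + seed) with seed = 1.
rng = make_rng(1, SeedOffset.STREAMING_TOKEN_BLOCK)
```

Note that additive offsets can still coincide across different base seeds (for example, 989 + 1284 equals 0 + 2273), so a registry like this only rules out collisions for a fixed base seed.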
