good first issue
Description
In /reinforcement_learning/reinforce.py, line 91:
running_reward = running_reward * 0.99 + t * 0.01
The variable running_reward seems to be used to record the average episodic reward. It is not actually an average but an exponential moving average, though I think the concept is similar. 0.01 is the smoothing factor used to update it, and t is the step at which the episode finished. Adding a comment or refactoring the names may help beginners understand this example.
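As a rough illustration of what such a refactor might look like, here is a minimal sketch. The names update_running_reward, episode_length and smoothing are hypothetical and do not appear in the example, and the sketch assumes that t is the number of steps the episode lasted.

```python
def update_running_reward(running_reward, episode_length, smoothing=0.01):
    """Return the exponential moving average after one more episode.

    Hypothetical helper, not part of the original example.
    Equivalent to: running_reward * 0.99 + t * 0.01 when smoothing is 0.01.
    """
    # Keep most of the old estimate and blend in a small share of the new value.
    return (1.0 - smoothing) * running_reward + smoothing * episode_length


# Illustrative usage with made-up episode lengths and an arbitrary start value.
running_reward = 10.0
for episode_length in [20, 30, 40]:
    running_reward = update_running_reward(running_reward, episode_length)
```

A small smoothing value makes the estimate change slowly, so a single unusually long or short episode has little effect on it.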