abacus0214/NHRL

An adaptive algorithm designed to abstract temporally extended actions online, without any background information beyond a Markovian description of the environment. Several reinforcement learning algorithms were embedded in a hierarchy of policies, including n-step Q-learning (QL), Expected Sarsa, LSTM neural networks (for Q-value learning), DeepMind's Deep Q-learning architecture, and simultaneous off-policy training of all abstract actions.
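As a rough illustration of one of the listed learners, the sketch below shows a single tabular Expected Sarsa update under an epsilon-greedy policy. It is not taken from this repository: the function names (`epsilon_greedy_probs`, `expected_sarsa_update`), the list-of-lists Q-table, and the default hyperparameters are all assumptions chosen for the example.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state.

    Hypothetical helper: ties are broken in favour of the lowest action index.
    """
    n_actions = len(q_row)
    greedy = max(range(n_actions), key=lambda i: q_row[i])
    probs = [epsilon / n_actions] * n_actions
    probs[greedy] += 1.0 - epsilon
    return probs


def expected_sarsa_update(Q, s, a, r, s_next, done,
                          alpha=0.1, gamma=0.99, epsilon=0.1):
    """Apply one tabular Expected Sarsa update to Q[s][a] in place.

    Q is assumed to be a list of per-state lists of action values.
    """
    if done:
        # No bootstrapping from a terminal state.
        target = r
    else:
        # Expectation of the next state's action values under the policy.
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        expected_q = sum(p * q for p, q in zip(probs, Q[s_next]))
        target = r + gamma * expected_q
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


if __name__ == "__main__":
    Q = [[0.0, 0.0], [1.0, 3.0]]
    expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, done=False,
                          alpha=0.5, gamma=1.0, epsilon=0.2)
    print(Q[0][0])
```

In a hierarchy of policies, an update of this kind would presumably be applied at each level, with abstract actions taking the place of primitive ones at the higher levels; how the repository actually wires this together is not stated in the description.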

Language: Python, Stars: 6, Forks: 0, Watchers: 6, Open issues: 5
Details
Repository information
Owner: abacus0214
Homepage
Last pushed: 2019-04-02
Last updated: 2025-12-15
Issues fetched at
