Description
This library achieves very high success rates, but it takes a very long time to optimize and train. This could be improved if we could find a way to use the GPU more during optimization/training, so that the CPU is less of a bottleneck. Currently, the CPU handles most of the intermediate environment calculations, while the GPU is only used within the PPO2 algorithm during policy optimization.
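To put numbers on that CPU/GPU split, something like the following rough sketch could be wrapped around the two phases of a training loop. It only measures wall-clock time per phase. `PhaseTimer`, `collect_rollout`, and `optimize_policy` are hypothetical names I made up for illustration, not part of this library or of PPO2, and the two functions are dummy workloads standing in for the real environment stepping and policy update.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase of a training loop."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        # Time everything executed inside the `with` block.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def fractions(self):
        # Share of the total measured time spent in each phase.
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}


def collect_rollout(n_steps):
    # Hypothetical stand-in for the CPU-bound environment calculations.
    return [sum(i * i for i in range(200)) for _ in range(n_steps)]


def optimize_policy(batch):
    # Hypothetical stand-in for the GPU-side policy optimization.
    return sum(batch) / len(batch)


timer = PhaseTimer()
for _ in range(5):
    with timer.phase("rollout"):
        batch = collect_rollout(n_steps=128)
    with timer.phase("optimize"):
        optimize_policy(batch)

for name, frac in timer.fractions().items():
    print(f"{name}: {frac:.1%} of measured time")
```

If the rollout phase dominates on real workloads too, that would support the idea that the environment code, not the policy update, is what needs speeding up.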
I am currently optimizing/training on the following hardware:
- AMD Threadripper 1920X 12 Core (24 Thread) CPU
- Nvidia RTX 2080 8GB GPU
- 16 GB 3000 MHz RAM
The bottleneck on my system is definitely the CPU, while my GPU stays at around 1-10% utilization. This is surprising, since the library already takes advantage of the Threadripper's multi-threading. I have some ideas on how this could be improved, but would like to start a conversation.
- Increase the size of the policy network (i.e. increase the number of hidden layers or increase the number of nodes in each layer)
- Do less work in each training loop, so the GPU loop is called more often.
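To make both ideas a bit more concrete, here is a back-of-the-envelope sketch. The first function counts the trainable parameters of a fully connected network, as a rough proxy for how much work each policy update gives the GPU. The second counts how many optimization phases run, assuming one phase per `n_envs * n_steps` collected steps, which is my understanding of how PPO2-style rollouts work. All layer sizes, step counts, and function names here are illustrative assumptions, not values taken from this library.

```python
def mlp_param_count(input_dim, hidden_layers, output_dim):
    """Weights plus biases of a fully connected network."""
    sizes = [input_dim, *hidden_layers, output_dim]
    # Each layer has (inputs * outputs) weights and one bias per output.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def num_updates(total_timesteps, n_envs, n_steps):
    """Optimization phases, assuming one per n_envs * n_steps steps."""
    return total_timesteps // (n_envs * n_steps)


# Idea 1: a larger policy network (sizes are assumed for illustration).
small = mlp_param_count(input_dim=64, hidden_layers=[64, 64], output_dim=3)
large = mlp_param_count(input_dim=64, hidden_layers=[256, 256, 256], output_dim=3)
print(f"small network: {small} parameters")
print(f"large network: {large} parameters")

# Idea 2: shorter rollouts, so the optimizer is called more often.
long_rollouts = num_updates(total_timesteps=1_000_000, n_envs=8, n_steps=128)
short_rollouts = num_updates(total_timesteps=1_000_000, n_envs=8, n_steps=32)
print(f"n_steps=128: {long_rollouts} updates")
print(f"n_steps=32:  {short_rollouts} updates")
```

Note that neither change reduces the CPU work done per environment step. They would raise GPU utilization, but whether they shorten total training time is exactly the open question.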
I would love to hear what you all think. Any ideas or knowledge are welcome here.