Hardware: Google Colab (NVIDIA T4 GPU)
| Model Type | Average Reward | Training Time (h:mm:ss) | Total Training Steps |
|---|---|---|---|
| PPO | 21.0 | 5:32:21 | 10,000,000 |
| DQN | 20.6 | 11:56:00 | 10,000,000 |
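For reference, a minimal sketch of how a run like the PPO row above could be launched with Stable Baselines3. The environment ID, `n_envs`, and hyperparameters are assumptions, not the recorded configuration of these runs:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Standard Atari preprocessing: parallel envs plus 4-frame stacking
env = make_atari_env("PongNoFrameskip-v4", n_envs=8, seed=0)
env = VecFrameStack(env, n_stack=4)

model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=10_000_000)  # matches the step budget in the table
model.save("ppo_pong")
```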
- When training in Google Colab notebooks with the high-memory option enabled, keep the replay buffer size at or below 850,000, as larger buffers can run into out-of-memory issues (see the DQN sketch below)
- When training in more complex environments or with multiple simulated environments (`n_envs` > 1), DQN is very sensitive to the hyperparameter settings
- The Stable Baselines3 implementation of Soft Actor-Critic (SAC) only supports continuous action spaces and cannot be used with Atari's Pong, which has a discrete action space (see the action-space check below)
- When using RLlib, be mindful of your resources, as training jobs might never start (they stay in pending status) if not enough CPUs or GPUs are allocated (see the resource sketch below)
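A minimal sketch of capping the DQN replay buffer for Colab's high-memory runtime; the environment ID and remaining hyperparameters are assumptions:

```python
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

# SB3's default buffer_size is 1,000,000; keeping it at or below 850,000
# avoids out-of-memory crashes on Colab high-memory instances.
model = DQN("CnnPolicy", env, buffer_size=850_000, verbose=1)
model.learn(total_timesteps=10_000_000)
```

Buffer memory grows linearly with `buffer_size`, and SB3 warns at startup if the buffer may not fit in available RAM, so that warning is worth watching on Colab.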
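To confirm why SAC is ruled out, you can inspect Pong's action space directly. This is shown with Gymnasium; the exact env ID and registration steps may differ depending on your ALE/atari install:

```python
import gymnasium as gym

env = gym.make("ALE/Pong-v5")
print(env.action_space)  # Discrete(6): SAC requires a continuous (Box) space
```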
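For the RLlib pending-job issue, it helps to check what Ray actually sees before building the algorithm. The config API varies between Ray releases, so this is a sketch against the Ray 2.x `AlgorithmConfig` style, with assumed worker and GPU counts:

```python
import ray
from ray.rllib.algorithms.ppo import PPOConfig

ray.init()
print(ray.available_resources())  # verify CPUs/GPUs before launching

config = (
    PPOConfig()
    .environment("ALE/Pong-v5")
    .resources(num_gpus=1)            # must fit within available_resources()
    .rollouts(num_rollout_workers=2)  # each rollout worker claims one CPU
)
algo = config.build()
algo.train()
```

If the requested GPUs or worker CPUs exceed what `ray.available_resources()` reports, the job sits in pending status instead of failing loudly, which is the symptom described above.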