Example VPG implementation with ReLAx

This repository contains an implementation of vanilla policy gradient (VPG) with ReLAx.
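For readers new to VPG, the sketch below shows the core update in plain NumPy. It first computes the reward-to-go returns, then takes a gradient-ascent step along the average of ∇ log π(a|s) weighted by those returns. It does not use ReLAx, whose API is not shown in this README. The linear softmax policy, the learning rate, and the function names (`reward_to_go`, `vpg_gradient`, `vpg_step`) are hypothetical illustrations, not the code in vpg_example.ipynb.

```python
import numpy as np

def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go G_t = sum_{k>=t} gamma^(k-t) * r_k for one episode."""
    out = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def vpg_gradient(theta, obs, actions, returns):
    """Policy-gradient estimate for a hypothetical linear softmax policy
    pi(a|s) = softmax(s @ theta): the batch mean of grad log pi(a_t|s_t) * G_t
    (no baseline)."""
    probs = softmax(obs @ theta)                # (N, n_actions)
    one_hot = np.eye(theta.shape[1])[actions]   # (N, n_actions)
    # d/dtheta log softmax(s @ theta)[a] = outer(s, one_hot(a) - probs)
    weighted = (one_hot - probs) * returns[:, None]
    return obs.T @ weighted / len(actions)

def vpg_step(theta, obs, actions, returns, lr=0.01):
    """One gradient-ascent step on the expected return."""
    return theta + lr * vpg_gradient(theta, obs, actions, returns)

# Example: one update on a synthetic batch shaped like LunarLander-v2
# (8-dim observations, 4 discrete actions). The data is random, not real rollouts.
rng = np.random.default_rng(0)
obs = rng.normal(size=(16, 8))
actions = rng.integers(0, 4, size=16)
returns = reward_to_go(rng.normal(size=16))
theta = vpg_step(np.zeros((8, 4)), obs, actions, returns)
```

Because no baseline is subtracted, this is the plainest form of the estimator. Practical implementations often subtract a learned value baseline to reduce variance.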

The VPG actor was trained on the LunarLander-v2 Gym environment for 4M environment steps.

The graph of average return vs. training step (batch_size=40000) is shown below:

[Figure: average return vs. training step, vpg_training.png]
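The "average return" on such a plot is commonly the mean undiscounted episode return over each training batch. The sketch below computes that quantity from a flat batch of per-step rewards and episode-termination flags. The function name, the handling of partial episodes, and the assumption that batch_size counts environment steps per policy update are illustrative and not taken from ReLAx. Under that assumption, 4M environment steps at batch_size=40000 correspond to 100 policy updates.

```python
import numpy as np

def average_episode_return(rewards, dones):
    """Mean undiscounted return over episodes that finish within the batch.

    rewards, dones: 1-D sequences of equal length, one entry per env step;
    dones[t] is True when step t ended an episode. A trailing partial
    episode (no final done flag) is ignored. Returns NaN if no episode ends.
    """
    returns, running = [], 0.0
    for r, d in zip(rewards, dones):
        running += r
        if d:
            returns.append(running)
            running = 0.0
    return float(np.mean(returns)) if returns else float("nan")

# Hypothetical bookkeeping: if batch_size counts env steps per policy update,
# the number of updates over the whole run is total_steps // batch_size.
total_env_steps, batch_size = 4_000_000, 40_000
num_updates = total_env_steps // batch_size
```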

Resulting policy:

[Video: vpg_run.mp4]

Repository contents:
.ipynb_checkpoints
content/video
tensorboard_logs/vpg_lander
trained_models
README.md
vpg_example.ipynb
vpg_training.png
