Adding support to PPO for continuous action spaces #103
base: main
Conversation
Thank you! I am currently reviewing the PR.
Thanks, I've reviewed it and have two requests:
…
Sure, it makes sense to consolidate the core PPO logic into a base class and have the discrete and continuous versions override where needed (e.g., the …)
Sorry for the delay on this. I have created the base class for PPO, with the discrete and continuous versions extending it. I will get my changes checked in by the end of this week.
No problem, thanks for the update! Looking forward to it.
I appreciate your patience on this! I have finished creating the PPO base class, with the discrete and continuous versions inheriting from it and overriding where needed. Let me know your thoughts and whether I can help with any other issues or development priorities!
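For readers following along, here is a minimal sketch of the inheritance pattern being described. The class and method names (`PPOBase`, `action_distribution`, etc.) are hypothetical illustrations, not Pearl's actual API:

```python
from abc import ABC, abstractmethod

import torch
from torch.distributions import Categorical, Normal


class PPOBase(ABC):
    """Shared PPO logic; subclasses only supply the action distribution."""

    @abstractmethod
    def action_distribution(self, actor_output: torch.Tensor):
        """Map raw actor output to a torch distribution."""

    def log_prob(self, actor_output: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Shared logic: both variants score actions the same way once
        # a distribution is available.
        return self.action_distribution(actor_output).log_prob(action)


class DiscretePPO(PPOBase):
    def action_distribution(self, actor_output):
        # Actor output is interpreted as per-action logits.
        return Categorical(logits=actor_output)


class ContinuousPPO(PPOBase):
    def __init__(self, action_dim: int):
        # One common parameterization: a state-independent learnable log-std.
        self.log_std = torch.zeros(action_dim, requires_grad=True)

    def action_distribution(self, actor_output):
        # Actor output is interpreted as the mean of a Gaussian.
        return Normal(loc=actor_output, scale=self.log_std.exp())
```

The point of the split is that the clipped objective, advantage handling, and update loop live once in the base class, while only the distribution over actions differs between the discrete and continuous agents.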
It's looking good! I was running the Lunar Lander tutorial (nice!) and encountered an error. Are you seeing the same thing, by any chance? Here's the notebook with the error.
The issue you are running into is a known one (Issue #1142) with Gymnasium 0.29.1 and NumPy version 2+. You have to downgrade NumPy to a 1.x version; there is a comment in the Lunar Lander tutorial about this to help during the transition. The Farama Foundation released version 1.0.0 of Gymnasium about a week ago, which should resolve this issue with NumPy 2+, but I have not tried it with Pearl. The library upgrade for Gymnasium and NumPy should probably be its own effort/issue/pull request, if you are OK with that. Let me know if you have any other issues or additional questions!
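For anyone hitting the same error, a minimal sketch of the workaround, assuming the pins implied by the discussion above (the exact version constraints are an assumption, not tested here):

```python
# Workaround sketch: pin NumPy below 2.0 while staying on Gymnasium 0.29.1,
# e.g.:  pip install "numpy<2" "gymnasium==0.29.1"
# (Version pins are assumptions based on Gymnasium Issue #1142.)
import numpy as np

assert np.__version__.startswith("1."), (
    "Gymnasium 0.29.1 is known to break with NumPy 2+; downgrade NumPy to 1.x."
)
```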
It looks like the Pearl repo has undergone some significant changes in the last couple of days, particularly around the replay buffer. Would you like me to rework my pull request to handle these latest updates?
@kuds, that would be greatly appreciated, thank you!
…fers and data classes
- Rollback PPO changes and sync up with the latest round of changes
- Updated PPO to use the PPOReplayBuffer, PPOTransition, and PPOTransitionBatch
- Updated the Lunar Lander tutorial with BasicReplayBuffer and PPOReplayBuffer
- Updated the tutorials to use the new replay buffer
I have merged in the latest round of changes from the main Pearl branch and assimilated the renamed replay buffers into the PPO code base. I am still running some tests, but I wanted to get this up for an initial review.
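As a rough illustration of what a PPO-specific transition typically adds on top of a basic one, here is a sketch. These dataclasses and field names are guesses for illustration only, not Pearl's actual `PPOTransition` or `BasicReplayBuffer` types:

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class TransitionSketch:
    # Fields any replay buffer needs to store.
    state: torch.Tensor
    action: torch.Tensor
    reward: torch.Tensor
    next_state: torch.Tensor
    terminated: torch.Tensor


@dataclass
class PPOTransitionSketch(TransitionSketch):
    # PPO additionally caches the rollout policy's log-prob (needed for
    # the importance ratio) and on-policy targets computed from the rollout.
    log_prob: Optional[torch.Tensor] = None
    advantage: Optional[torch.Tensor] = None
    value_target: Optional[torch.Tensor] = None
```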
Updated the unit tests to reference the new location of the imported class and split PPO into two separate files with a new base class
Sounds good, thanks. I am looking things over, but I see you are still making changes, so let me know when you're ready.
Hi @kuds, the errors were:
…
#77
Design
Created a new class called ContinuousProximalPolicyOptimization to handle continuous action spaces for PPO. Updated the README with the new tutorial, added new unit tests, and did a general comment clean-up.
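To make the continuous-action case concrete, here is a minimal sketch of the clipped PPO objective with a Gaussian policy. This is not Pearl's implementation; all names and the `clip_eps` default are illustrative:

```python
import torch
from torch.distributions import Normal


def clipped_ppo_loss(
    mean: torch.Tensor,           # actor output for the sampled states
    log_std: torch.Tensor,        # learned log standard deviation
    actions: torch.Tensor,        # continuous actions taken at rollout time
    old_log_probs: torch.Tensor,  # log-probs cached from the rollout policy
    advantages: torch.Tensor,
    clip_eps: float = 0.2,
) -> torch.Tensor:
    # Sum per-dimension log-probs for multi-dimensional actions.
    dist = Normal(mean, log_std.exp())
    new_log_probs = dist.log_prob(actions).sum(dim=-1)

    # Importance ratio between the current and rollout-time policies.
    ratio = (new_log_probs - old_log_probs).exp()

    # PPO's pessimistic clipped surrogate (maximized, so return its negative).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()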