
Adding support to PPO for continuous action spaces #103

Open
wants to merge 6 commits into main

Conversation

@kuds commented Sep 10, 2024

#77

Design

Created a new class called ContinuousProximalPolicyOptimization to handle continuous action spaces for PPO. Updated the README with a new tutorial, added new unit tests, and did general comment clean-up.
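For context, the core algorithmic difference for continuous action spaces is that the actor parameterizes a Gaussian policy and the PPO clipped surrogate loss is computed from its log-probabilities rather than from a categorical distribution. A minimal, self-contained sketch of that idea (illustrative names only, not Pearl's actual classes):

```python
# Illustrative sketch only; not Pearl's actual implementation.
import torch
from torch import nn
from torch.distributions import Normal


class GaussianActor(nn.Module):
    """Maps states to a diagonal Gaussian over continuous actions."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64) -> None:
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden_dim, action_dim)
        # State-independent log standard deviation, a common PPO parameterization.
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state: torch.Tensor) -> Normal:
        mean = self.mean_head(self.body(state))
        return Normal(mean, self.log_std.exp())


def clipped_actor_loss(
    dist: Normal,
    action: torch.Tensor,        # actions taken during rollout
    old_log_prob: torch.Tensor,  # log-probs under the behavior policy
    advantage: torch.Tensor,     # estimated advantages (e.g., from GAE)
    epsilon: float = 0.2,
) -> torch.Tensor:
    """PPO clipped surrogate objective for a continuous (Gaussian) policy."""
    log_prob = dist.log_prob(action).sum(dim=-1)  # sum over action dimensions
    ratio = torch.exp(log_prob - old_log_prob)
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```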

@facebook-github-bot added the CLA Signed label on Sep 10, 2024
@kuds changed the title from "Add support to PPO for continuous action spaces" to "Adding support to PPO for continuous action spaces" on Sep 11, 2024
@rodrigodesalvobraz (Contributor)

Thank you! I am currently reviewing the PR.

@rodrigodesalvobraz (Contributor) commented Sep 13, 2024

Thanks, I've reviewed it and have two requests:

  • You duplicated quite a bit of code from ppo.py to continuous_ppo.py. It seems that almost all changes are in _actor_loss. Would you be able to define a base class BasePPO containing most of the common code now in PPO, and subclass it into two subclasses PPO and ContinuousPPO, each of them defining the appropriate _actor_loss? (A rough sketch of this layout follows the list.)
  • Can you replace tuple[Tensor, Normal] with Tuple[Tensor, Normal]? The former does not work with Python 3.8, which we still support in pyproject.toml.
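A minimal sketch of the suggested split, assuming the names from the request above (BasePPO, PPO, ContinuousPPO, _actor_loss); everything else is illustrative rather than Pearl's actual code, and the Tuple annotation shows the Python 3.8-compatible form:

```python
# Illustrative layout only; not Pearl's actual code.
from abc import ABC, abstractmethod
from typing import Tuple

from torch import Tensor
from torch.distributions import Normal


class BasePPO(ABC):
    """Shared PPO machinery (advantage estimation, critic loss, training loop)."""

    @abstractmethod
    def _actor_loss(self, batch) -> Tensor:
        """Each subclass supplies its own clipped surrogate actor loss."""


class PPO(BasePPO):
    """Discrete-action variant: actor loss from a Categorical policy."""

    def _actor_loss(self, batch) -> Tensor:
        raise NotImplementedError  # discrete clipped-surrogate loss goes here


class ContinuousPPO(BasePPO):
    """Continuous-action variant: actor loss from a Normal (Gaussian) policy."""

    def _actor_loss(self, batch) -> Tensor:
        raise NotImplementedError  # continuous clipped-surrogate loss goes here

    def _sample_action(self, state: Tensor) -> Tuple[Tensor, Normal]:
        # typing.Tuple rather than the built-in tuple[...] keeps Python 3.8 support.
        raise NotImplementedError
```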

@kuds (Author) commented Sep 14, 2024

Sure, it makes sense to consolidate the core PPO logic into a base class and have the discrete and continuous versions override where needed (e.g., the _actor_loss method). I will work on incorporating both requests into my pull request and should have those done sometime next week. Thanks for the feedback!

@kuds (Author) commented Sep 30, 2024

Sorry for the delay on this. I have the base class created for PPO with the discrete and continuous versions extending it. I will get my changes checked in by end of this week.

@rodrigodesalvobraz (Contributor)

> Sorry for the delay on this. I have the base class created for PPO with the discrete and continuous versions extending it. I will get my changes checked in by end of this week.

No problem, thanks for the update! Looking forward to it.

@kuds (Author) commented Oct 8, 2024

I appreciate your patience on this! I have finished creating the PPO base class, with the discrete and continuous versions inheriting from it and overriding where needed. Let me know your thoughts and if I can help with any other issues or development priorities!

@rodrigodesalvobraz (Contributor) commented Oct 18, 2024

It's looking good! I was running the Lunar Lander tutorial (nice!) and encountered an error. I am curious if you are seeing the same thing by any chance? Here's the notebook with the error.

@kuds (Author) commented Oct 18, 2024

So the issue you are running into is a known one (Issue #1142) with Gymnasium 0.29.1 and NumPy version 2+. You have to downgrade NumPy to a 1.x version. There is a comment in the Lunar Lander tutorial about this to help during the transition.

[screenshot: the Lunar Lander tutorial's note about downgrading NumPy]

The Farama Foundation just released version 1.0.0 of Gymnasium about a week ago, which should resolve this issue with NumPy 2+, but I have not tried it with Pearl. The library upgrade for Gymnasium and NumPy should probably be its own effort/issue/pull request, if you are ok with that.
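For reference, a small illustrative guard (not part of the tutorial or Pearl) that captures the constraint described above until the Gymnasium/NumPy upgrade is handled separately:

```python
# Illustrative check only, assuming Gymnasium 0.29.1 is installed; that release
# is known to break with NumPy 2.x, so require a 1.x NumPy
# (installed, for example, with: pip install "numpy<2").
import numpy as np

if not np.__version__.startswith("1."):
    raise RuntimeError(
        f"Gymnasium 0.29.1 does not work with NumPy {np.__version__}; "
        "downgrade to a 1.x release."
    )
```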

Let me know if you have any other issues or additional questions!

@kuds (Author) commented Oct 25, 2024

@rodrigodesalvobraz

It looks like the Pearl repo has undergone some significant changes in the last couple of days, particularly around the replay buffer. Would you like me to rework my pull request to handle these latest updates?

@rodrigodesalvobraz (Contributor)

@kuds, that would be greatly appreciated, thank you!
Thank you also for diagnosing that last issue!

…fers and data classes

- Rollback PPO changes and sync up with latest round of changes
- Updated PPO to use the PPOReplayBuffer, PPOTransition, and PPOTransitionBatch
- Update Lunar Lander Tutorial with BasicReplayBuffer and PPOReplayBuffer
- Update Tutorials to use new replay buffer
@kuds reopened this on Oct 29, 2024
@kuds (Author) commented Oct 29, 2024

I have merged in the latest round of changes from the main pearl branch and assimilated the renamed replay buffers into the PPO code base. I am still running some tests, but I wanted to get this up for an initial review.

@rodrigodesalvobraz (Contributor)

> I have merged in the latest round of changes from the main pearl branch and assimilated the renamed replay buffers into the PPO code base. I am still running some tests, but I wanted to get this up for an initial review.

Sounds good, thanks. I am looking things over, but I see you are still making changes, so let me know when you're ready.

@rodrigodesalvobraz (Contributor)

Hi @kuds,
I checked out your PR today and a few unit tests failed:
[screenshot of the failing unit tests]

Errors were:

pearl\policy_learners\sequential_decision_making\ppo.py", line 68, in __init__:
TypeError: ProximalPolicyOptimizationBase.__init__() got an unexpected keyword argument 'history_summarization_learning_rate'
pearl\policy_learners\sequential_decision_making\ppo_continuous.py", line 101, in __init__:
AttributeError: 'ContinuousProximalPolicyOptimization' object has no attribute 'is_action_continuous'
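
The two failures read like mismatches introduced by the refactor: the base-class __init__ rejects a keyword argument the subclass forwards, and the continuous subclass never gets an is_action_continuous attribute. A hypothetical sketch of the shape of a fix (the real signatures depend on Pearl's policy learner classes):

```python
# Hypothetical sketch of the kind of change the two failures point at;
# the real fix depends on Pearl's actual class signatures.
from typing import Optional


class ProximalPolicyOptimizationBase:
    def __init__(
        self,
        *,
        # Accept the keyword the subclasses forward instead of rejecting it.
        history_summarization_learning_rate: Optional[float] = None,
        is_action_continuous: bool = False,
        **kwargs,
    ) -> None:
        self.history_summarization_learning_rate = history_summarization_learning_rate
        # Define the attribute on the base class so both subclasses expose it.
        self.is_action_continuous = is_action_continuous


class ContinuousProximalPolicyOptimization(ProximalPolicyOptimizationBase):
    def __init__(self, **kwargs) -> None:
        super().__init__(is_action_continuous=True, **kwargs)
```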
