About Replicating SampledZero Performance in the Hopper-V3 Environment #210

Open
hyLiu1994 opened this issue Apr 9, 2024 · 3 comments
Labels: config, discussion, enhancement


@hyLiu1994

I tried to reproduce the SampledEfficientZero results shown for the Hopper-V3 environment in the README's benchmark section, using the default configuration file (zoo/mujoco/config/mujoco_sampled_efficientzero_config.py). However, I ran into two main issues:

  1. I was unable to achieve the results illustrated by the blue line in the following graph.

image

  2. Additionally, I observed significant discrepancies between two runs that used the identical configuration file, as shown in the graph below. Both the blue and gray lines were obtained from the same configuration file.

image

Could you suggest possible reasons for these discrepancies and any solutions to achieve consistent results similar to those presented in the benchmark?

@hyLiu1994 changed the title from "About 复现 sampledzero 于 Hopper-V3" ("About reproducing sampledzero on Hopper-V3") to "About Replicating SampledZero Performance in the Hopper-V3 Environment" on Apr 9, 2024
@puyuan1996
Collaborator

Hello, thank you for your feedback. Our repository currently includes an open-source implementation similar to SampledMuZero; it is the only example available because the original authors did not release their source code. Consequently, our implementation may differ from the original in network architecture, loss functions, hyperparameters, and training procedure. These differences could be one reason for the suboptimal performance and training instability of our SampledEfficientZero in continuous action spaces such as MuJoCo. A robust and stable open-source implementation of SampledMuZero would be highly valuable to the community and warrants further investigation. We plan to look into this more deeply and will post updates here. Thank you again for your valuable input and patience.
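When two runs of the same configuration diverge, it can help to measure how large the seed-to-seed spread is before comparing against a benchmark curve. The following is a minimal sketch of that comparison, not code from this repository: the function name `aggregate_runs` and the return values in `example_runs` are hypothetical placeholders, and it assumes each run has already been exported as a list of evaluation returns.

```python
import statistics
from typing import Dict, List, Tuple


def aggregate_runs(runs: Dict[int, List[float]]) -> List[Tuple[float, float]]:
    """Return (mean, sample std) of the return at each evaluation index across seeds."""
    # Truncate to the shortest run so every index has a value from every seed.
    n_points = min(len(curve) for curve in runs.values())
    summary = []
    for i in range(n_points):
        values = [curve[i] for curve in runs.values()]
        mean = statistics.mean(values)
        # Sample standard deviation needs at least two runs.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary.append((mean, std))
    return summary


# Hypothetical evaluation returns keyed by seed (placeholder numbers, not real results).
example_runs = {
    0: [100.0, 400.0, 900.0],
    1: [120.0, 300.0, 500.0],
    2: [80.0, 500.0, 1000.0],
}

summary = aggregate_runs(example_runs)
for i, (mean, std) in enumerate(summary):
    print(f"eval {i}: mean={mean:.1f}, std={std:.1f}")
```

If the gap between two runs is comparable to the standard deviation across several seeds, the discrepancy is more likely ordinary seed variance than a configuration problem.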

@puyuan1996 added the config, discussion, and enhancement labels on Apr 10, 2024
@hyLiu1994
Author

Thank you for the detailed response.

I will try to optimize this on my side.

If I reach any conclusions, I will share them with you.

@puyuan1996
Collaborator

Hello, we have successfully implemented SampledMuZero and SampledUniZero in this pull request, and have also optimized the previous SampledEfficientZero. Currently, all three algorithms can reliably achieve near-optimal returns within 200k environment steps in the LunarLander and BipedalWalker environments. We encourage you to test them locally.

On DMC (the DeepMind Control Suite), we have also reached near-optimal returns within approximately 500k environment steps in the Cartpole-Swingup and Walker-Walk environments (with state input). Performance in other DMC environments is still being tuned. We will keep you updated as the work progresses. Thank you for your patience.
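For anyone testing locally, one way to check a claim of the form "reaches a target return within N environment steps" is to find the first step at which a smoothed return curve crosses a chosen threshold. This is only an illustrative sketch: `first_step_reaching`, the window size, the threshold, and the numbers in `example_curve` are all assumptions made for the example, not values or utilities from the repository.

```python
from typing import List, Optional, Tuple


def first_step_reaching(
    curve: List[Tuple[int, float]], threshold: float, window: int = 3
) -> Optional[int]:
    """Return the first env step where the moving average of the last
    `window` evaluation returns is at least `threshold`, or None if never."""
    returns: List[float] = []
    for step, ret in curve:
        returns.append(ret)
        if len(returns) >= window:
            avg = sum(returns[-window:]) / window
            if avg >= threshold:
                return step
    return None


# Hypothetical (env_step, evaluation_return) pairs; placeholder numbers only.
example_curve = [
    (50_000, 20.0),
    (100_000, 150.0),
    (150_000, 210.0),
    (200_000, 240.0),
    (250_000, 250.0),
]

reached = first_step_reaching(example_curve, threshold=190.0)
print("threshold reached at step:", reached)
```

Smoothing over a few evaluations avoids counting a single lucky episode as having reached the target.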
