
Regarding the parameter settings of the metadrive environment #291

Open
pixeli99 opened this issue Oct 29, 2024 · 4 comments
Labels: config (New or improved configuration), discussion (Discussion of a typical issue or concept)

Comments

@pixeli99

Hi,

Thank you very much for creating such a convenient codebase.

I have a question about MetaDrive, as I am new to it. I am trying to use MetaDrive to replay some scenarios from nuScenes and train an agent on them, and I would appreciate suggestions on hyperparameter settings such as collector_env_num, batch_size, and so on. Do you have any recommendations for parameter settings that would speed up convergence?

@puyuan1996 added the config (New or improved configuration) and discussion (Discussion of a typical issue or concept) labels on Oct 30, 2024
@puyuan1996
Collaborator

Hello,

Thank you very much for your support and recognition!

Regarding the hyperparameters for training with MetaDrive, the current configuration metadrive_sampled_efficientzero_config.py can converge to a reasonable return (~250) at around 500K environment steps. However, please note that we have not yet conducted comprehensive tests across a wide range of MetaDrive environments. Therefore, it is recommended that you first conduct preliminary tests using the default configuration and observe the training performance.
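For reference, here is a minimal launch sketch under the assumption that the config follows the usual LightZero layout (exposing main_config / create_config and using the train_muzero entry). The import path below is illustrative; in practice you can usually just run the config file itself with python, since it ends with a similar call:

```python
# Minimal launch sketch; the zoo import path is an assumption based on the usual
# LightZero layout, so adjust it to match your checkout. In practice you can
# usually just run the config file directly, since it ends with a similar call.
from lzero.entry import train_muzero

# Hypothetical import path for the config mentioned above.
from zoo.metadrive.config.metadrive_sampled_efficientzero_config import (
    main_config,
    create_config,
)

if __name__ == "__main__":
    # ~500K env steps reached a return of roughly 250 in the runs mentioned above.
    train_muzero([main_config, create_config], seed=0, max_env_step=int(5e5))
```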

During testing, you can make targeted adjustments to the hyperparameters based on the following aspects to accelerate convergence and improve training performance:

  1. Number of environments (collector_env_num) and batch size (batch_size): in the current configuration, collector_env_num is set to 8 and batch_size is 64. Depending on your computing resources, you can increase both to improve data-collection and training efficiency (a config sketch covering these fields follows this list).
  2. Exploration vs. exploitation: adjusting parameters such as num_simulations, manual_temperature_decay, and policy_entropy_loss_weight can strengthen the model's exploration and thus accelerate convergence.
  3. Replay ratio: based on experimental results and related research (see reference link), appropriately adjusting the replay ratio (replay_ratio) can help the model learn better from past experiences.
  4. Network structure: you can try different network structures (e.g., larger hidden layers or layer normalization) to improve the model's learning performance.
  5. Learning rate and optimizer: the current configuration uses the Adam optimizer with a learning rate of 0.003. Based on the observed training performance, you can try AdamW or other optimizers to find the most suitable settings for your task.
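
As a concrete illustration, the five points above map onto config fields roughly as in the sketch below. The key names follow the usual LightZero config style, but treat the exact keys and values as assumptions to verify against your local metadrive_sampled_efficientzero_config.py:

```python
# Illustrative overrides only; key names and values are assumptions to check
# against your local metadrive_sampled_efficientzero_config.py.
from easydict import EasyDict

overrides = EasyDict(dict(
    env=dict(
        collector_env_num=16,            # (1) more parallel envs if CPU cores allow
    ),
    policy=dict(
        batch_size=256,                  # (1) larger batch if GPU memory allows
        num_simulations=50,              # (2) more MCTS simulations per decision
        manual_temperature_decay=True,   # (2) anneal the sampling temperature
        policy_entropy_loss_weight=5e-3, # (2) encourage exploration
        replay_ratio=0.25,               # (3) gradient updates per collected env step
        learning_rate=3e-3,              # (5) current default, used with Adam
        optim_type='AdamW',              # (5) alternative optimizer to try
        # (4) network-structure changes (hidden sizes, normalization) would go
        #     under policy.model in the actual config.
    ),
))

# These would then be merged into the config before training, e.g.:
# main_config.env.update(overrides.env); main_config.policy.update(overrides.policy)
```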

It is recommended that during training, you analyze TensorBoard logs (tb) from the collect, eval, and learn processes to understand the learning curves and bottlenecks of the model. You can then make targeted adjustments to the hyperparameters accordingly. Moreover, feel free to share your experimental results with us so we can discuss further optimizations together.

Wishing you success with your training!

If you have any further questions, feel free to reach out anytime.

@pixeli99
Author

Thank you very much for your response; it’s been very helpful.

I have another question, about multi-GPU training. As I understand it, multi-GPU mainly means data parallelism. If my single GPU's memory is large enough, would the benefit of using multiple GPUs be limited? Mainly, I want to confirm whether using multiple GPUs would speed up the data collection process in the environment (as far as I understand, it shouldn't?).

Thanks again!

@puyuan1996
Collaborator

puyuan1996 commented Oct 31, 2024

Hello! Thank you for your question.

Regarding the use of multiple GPUs, your understanding is generally correct. Multi-GPU setups are mainly used for data parallelism, which involves distributing the same model across different GPUs to process different data batches in parallel. This can speed up model training, especially when dealing with large datasets, or when the memory of a single GPU is insufficient to accommodate the entire dataset or a large model. In such cases, using multiple GPUs can effectively distribute the load.

However, if the memory of a single GPU is already sufficient to handle the entire dataset, using multiple GPUs may not result in significant speedup in certain situations. This is due to the following reasons:

  1. Data transfer overhead: Although multiple GPUs can process data in parallel, synchronizing gradients and updating parameters between the GPUs requires data transfer, which introduces additional communication overhead, especially when the GPUs are connected via PCIe, where bandwidth may be limited.

  2. Model size and computational bottlenecks: If the model is small, or if the bottleneck is not in data processing (e.g., I/O operations or the speed of interaction with the environment), then the speedup from using multiple GPUs may not be significant.

Additionally, regarding the data collection process in the environment, using multiple GPUs generally will not significantly accelerate data collection. Data collection typically relies more on the CPU, I/O devices, and the response speed of the environment itself (e.g., in reinforcement learning scenarios, data collection depends on the step speed of the environment and the agent’s interaction rate). GPUs are mainly used to accelerate computationally intensive tasks, such as forward inference and backpropagation, rather than directly participating in data collection within the environment.
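
If you want to check where the bottleneck lies in your own setup, a rough timing comparison between env.step() and a model forward pass can help. The snippet below is only a sketch with placeholder components (a CartPole env and a tiny MLP), not the MetaDrive wrapper or the LightZero model:

```python
# Rough bottleneck check (placeholder env/model, not the actual MetaDrive wrapper
# or LightZero network). Compares wall-clock time of env.step() vs. a forward pass.
import time

import torch
import torch.nn as nn
import gymnasium as gym

env = gym.make("CartPole-v1")          # placeholder for your MetaDrive env
model = nn.Sequential(nn.Linear(4, 256), nn.ReLU(), nn.Linear(256, 2))

obs, _ = env.reset(seed=0)
n = 1000

t0 = time.perf_counter()
for _ in range(n):
    obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, _ = env.reset()
step_time = time.perf_counter() - t0

x = torch.randn(n, 4)
t0 = time.perf_counter()
with torch.no_grad():
    for i in range(n):
        model(x[i : i + 1])
forward_time = time.perf_counter() - t0

print(f"env.step: {step_time:.3f}s, model forward: {forward_time:.3f}s for {n} calls")
```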

However, in our DDP implementation, we also use <gpu_num> collectors (each collector has collector_env_num subprocesses, and each collector’s model is placed on one GPU) to gather data. In MCTS+RL-based algorithms, a large number of unroll model operations are required during data collection, and these operations take up significantly more time than env.step() operations. Therefore, in such cases, data collection can approach linear speedup.

You can refer to our example to test how distributed data parallelism (DDP) can speed up the training process. In our tests, the speedup ratio is almost linearly proportional to the number of GPUs. You can refer to the following link for specific usage instructions: #223. I hope this answer is helpful to you! If you have further questions, feel free to continue the discussion. (Partially Modified from GPT4o-latest's response :)
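
For intuition, the skeleton below shows the generic PyTorch DistributedDataParallel pattern described above (one process per GPU, each with its own model replica, gradients all-reduced at every update). It is not LightZero's actual DDP entry point; #223 is the reference for that.

```python
# Generic DDP skeleton illustrating the data-parallel pattern; not LightZero's
# actual entry point (see #223 for that). Launch with, e.g.:
#   torchrun --nproc_per_node=<gpu_num> ddp_sketch.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # Each process holds a full model replica on its own GPU.
    model = DDP(nn.Linear(128, 4).cuda(rank), device_ids=[rank])
    opt = torch.optim.Adam(model.parameters(), lr=3e-3)

    for _ in range(100):
        # In the DDP setup described above, each rank would also run its own
        # collector (with collector_env_num subprocess envs) to gather data here.
        x = torch.randn(64, 128, device=f"cuda:{rank}")  # placeholder batch
        loss = model(x).pow(2).mean()                    # placeholder loss
        opt.zero_grad()
        loss.backward()   # gradients are all-reduced across GPUs automatically
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```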

@pixeli99
Author

pixeli99 commented Nov 1, 2024

Oh, I understand now, thank you very much!

> However, in our DDP implementation, we also use <gpu_num> collectors (each collector has collector_env_num subprocesses, and each collector’s model is placed on one GPU) to gather data. In MCTS+RL-based algorithms, a large number of unroll model operations are required during data collection, and these operations take up significantly more time than env.step() operations. Therefore, in such cases, data collection can approach linear speedup.
