How to solve the "ModuleNotFoundError: No module named 'experiments'"? #19
Comments
I have solved the problem, but I encountered another one :(

python -m lamorel_launcher.launch --config-path Absolute/Path/To/Grounding_LLMs_with_online_RL/experiments/configs --config-name local_gpu_config rl_script_args.path=Absolute/Path/To/Grounding_LLMs_with_online_RL/experiments/train_language_agent.py lamorel_args.accelerate_args.machine_rank=0

As you mentioned in flowersteam/lamorel#23 (comment), my config is:

lamorel_args:
  log_level: info
  allow_subgraph_use_whith_gradient: true
  distributed_setup_args:
    n_rl_processes: 1
    n_llm_processes: 1
  accelerate_args:
    config_file: accelerate/default_config.yaml
    machine_rank: 0
    num_machines: 1
  llm_args:
    model_type: seq2seq
    model_path: t5-small
    pretrained: true
    minibatch_size: 4
    pre_encode_inputs: true
    parallelism:
      use_gpu: true
      model_parallelism_size: 1
      synchronize_gpus_after_scoring: false
      empty_cuda_cache_after_scoring: false
rl_script_args:
  path: ???
  seed: 1
  number_envs: 2
  num_steps: 1000
  max_episode_steps: 3
  frames_per_proc: 40
  reward_shaping_beta: 0
  discount: 0.99
  lr: 1e-6
  beta1: 0.9
  beta2: 0.999
  gae_lambda: 0.99
  entropy_coef: 0.01
  value_loss_coef: 0.5
  max_grad_norm: 0.5
  adam_eps: 1e-5
  clip_eps: 0.2
  epochs: 4
  batch_size: 16
  action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"]
  saving_path_logs: Desktop/workspace2/Grounding_LLMs_with_online_RL/logs
  name_experiment: 'llm_mtrl'
  name_model: 'T5small'
  saving_path_model: Desktop/workspace2/Grounding_LLMs_with_online_RL/model
  name_environment: 'BabyAI-KeyCorridorS3R3-v0'
  number_episodes: 10
  language: 'english'
  load_embedding: true
  use_action_heads: false
  template_test: 1
  zero_shot: true
  modified_action_space: false
  new_action_space: #["rotate_left","rotate_right","move_ahead","take","release","switch"]
  spm_path: "YOUR_PATH_TO_PROJECT/experiments/agents/drrn/spm_models/unigram_8k.model"
  random_agent: true
  get_example_trajectories: false
  nbr_obs: 3
  im_learning: false
  im_path: ""
  bot: false

It returns:

[2023-11-11 22:26:56,396][lamorel_logger][INFO] - Init rl-llm group for process 1
[2023-11-11 22:26:56,396][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes.
[2023-11-11 22:26:56,396][lamorel_logger][INFO] - Init rl-llm group for process 0
[2023-11-11 22:26:56,407][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:4 to store for rank: 1
[2023-11-11 22:26:56,407][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:4 to store for rank: 0
[2023-11-11 22:26:56,407][torch.distributed.distributed_c10d][INFO] - Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
[2023-11-11 22:26:56,407][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes.
[2023-11-11 22:26:56,408][lamorel_logger][INFO] - 6 gpus available for current LLM but using only model_parallelism_size = 1
[2023-11-11 22:26:56,409][lamorel_logger][INFO] - Devices on process 1 (index 0): [0]
Parallelising HF LLM on 1 devices
Loading model t5-small
Error executing job with overrides: ['rl_script_args.path=~/Desktop/workspace2/Grounding_LLMs_with_online_RL/experiments/train_language_agent.py', 'lamorel_args.accelerate_args.machine_rank=0']
Traceback (most recent call last):
File "~/Desktop/workspace2/Grounding_LLMs_with_online_RL/experiments/train_language_agent.py", line 393, in main
lm_server = Caller(config_args.lamorel_args, custom_updater=PPOUpdater(),
File "~/Desktop/workspace2/Grounding_LLMs_with_online_RL/lamorel/lamorel/src/lamorel/caller.py", line 53, in __init__
Server(
File "~/Desktop/workspace2/Grounding_LLMs_with_online_RL/lamorel/lamorel/src/lamorel/server/server.py", line 40, in __init__
self._model = HF_LLM(config.llm_args, devices, use_cpu)
File "~/Desktop/workspace2/Grounding_LLMs_with_online_RL/lamorel/lamorel/src/lamorel/server/llms/hf_llm.py", line 38, in __init__
device_map = infer_auto_device_map(
File "~/miniconda3/envs/dlp/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 923, in infer_auto_device_map
max_memory = get_max_memory(max_memory)
File "~/miniconda3/envs/dlp/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 674, in get_max_memory
raise ValueError(
ValueError: Device 0 is not recognized, available devices are integers(for GPU/XPU), 'mps', 'cpu' and 'disk'
Hi, what is your version of Accelerate? The passed device isn't recognized, which is weird.
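A quick way to check is to print the installed versions (a minimal sketch using only attributes both libraries expose):

import accelerate
import torch

# Print the installed versions so we can compare against known-good combinations,
# plus a sanity check that the GPUs are visible to torch at all.
print("torch:", torch.__version__)
print("accelerate:", accelerate.__version__)
print("CUDA available:", torch.cuda.is_available(), "| GPU count:", torch.cuda.device_count())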
Please see flowersteam/lamorel#24 as it seems to be due to PyTorch's version.
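For context, a minimal sketch of the call that fails, assuming Accelerate's documented API: infer_auto_device_map takes a max_memory dict whose keys must be plain Python ints (GPU indices) or the strings 'mps', 'cpu' and 'disk'. A key that merely prints as 0 but is not a plain int (e.g. a numpy integer or a torch.device) fails get_max_memory's validation with exactly this ValueError:

from accelerate import infer_auto_device_map
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Valid: an integer GPU index plus a "cpu" overflow budget.
device_map = infer_auto_device_map(model, max_memory={0: "10GiB", "cpu": "30GiB"})

# Invalid (illustration only): a non-int key such as the string "0"
# would raise "Device 0 is not recognized, available devices are ...".
# infer_auto_device_map(model, max_memory={"0": "10GiB"})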
Hello, where is the directory of the "lamorel_launcher.launch" file? I checked the "lamorel_launcher" folder in lamorel, yet I can't find it. Thanks!
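If lamorel was installed with pip, the module may live under site-packages rather than in your repo checkout. A quick way to see where Python actually loads it from (standard library only):

import importlib.util

try:
    # Resolve the module without importing it; spec.origin is the file path.
    spec = importlib.util.find_spec("lamorel_launcher.launch")
    print(spec.origin if spec else "found lamorel_launcher, but no launch submodule")
except ModuleNotFoundError:
    print("lamorel_launcher is not importable from this environment")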
I'm using 6 GPUs on a single machine. This is my command:
and
It returns:
ModuleNotFoundError: No module named 'experiments'
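For reference, this error usually means the training script does absolute imports from the experiments package while the launcher runs it from a directory where that package isn't importable. A minimal workaround, assuming train_language_agent.py lives in <repo>/experiments/, is to put the repo root on sys.path before those imports (or equivalently, export PYTHONPATH to the repo root before launching):

import os
import sys

# Hypothetical fix at the top of experiments/train_language_agent.py:
# assume the repo root is one directory above this file, and make it
# importable so "from experiments. ..." statements resolve.
REPO_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)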