Multiworld Module #3

Open
NasimShafiee opened this issue May 9, 2019 · 11 comments

@NasimShafiee

Hi,
I am using Anaconda to install your work, but when I run

softlearning run_example_local examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --num-samples 5 --n_epochs 300 --active_query_frequency 10

I get the following error:
File "/home/nasim/reward-learning-rl/examples/classifier_rl/utils.py", line 16, in
from multiworld.envs.mujoco import register_goal_example_envs
ModuleNotFoundError: No module named 'multiworld'

I git cloned https://github.com/vitchyr/multiworld.git, but it does not work. Which multiworld module have you used?

@avisingh599
Owner

Hmm, did you use the instructions in the README for creating the conda env for this repository? The requirements.txt should have installed the correct version of multiworld for you:

git+https://github.com/avisingh599/multiworld.git@19bf319422c0016260166bf64e194552bf2a9e68
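If it still isn't found, you can also install that pinned commit manually and check that it resolves; a quick sketch, assuming pip and python point at the softlearning conda env:

# install the exact multiworld commit pinned in requirements.txt
pip install git+https://github.com/avisingh599/multiworld.git@19bf319422c0016260166bf64e194552bf2a9e68
# confirm the package imports and show where it was installed from
python -c "import multiworld; print(multiworld.__file__)"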

@NasimShafiee
Author

I installed it again, and I get the following error when I run the example, which I believe comes from the multiworld module:

== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs
Memory usage on this node: 4.0/8.3 GB
Result logdir: /home/nasim/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-14T19-29-25-2019-05-14T19-29-25
Number of trials: 1 ({'ERROR': 1})
ERROR trials:

  • 51b91bef-algorithm=SAC-seed=2619: ERROR, 1 failures: /home/nasim/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-14T19-29-25-2019-05-14T19-29-25/51b91bef-algorithm=SAC-seed=2619_2019-05-14_19-29-26zic1act4/error_2019-05-14_19-29-32.txt

@avisingh599
Owner

Can you try cat /home/nasim/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-14T19-29-25-2019-05-14T19-29-25/51b91bef-algorithm=SAC-seed=2619_2019-05-14_19-29-26zic1act4/error_2019-05-14_19-29-32.txt and post the output here?

@NasimShafiee
Author

(softlearning) nasim@nasim-PC:~/reward-learning-rl$ cat /home/nasim/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-14T19-29-25-2019-05-14T19-29-25/51b91bef-algorithm=SAC-seed=2619_2019-05-14_19-29-26zic1act4/error_2019-05-14_19-29-32.txt

Traceback (most recent call last):
File "/home/nasim/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 443, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/nasim/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 315, in fetch_result
result = ray.get(trial_future[0])
File "/home/nasim/anaconda3/envs/softlearning/lib/python3.6/site-packages/ray/worker.py", line 2193, in get
raise value
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

@avisingh599
Owner

Interesting, I've never seen this error before.

@hartikainen Any idea what might be going on here?

@NasimShafiee In the meantime, can you try running softlearning run_example_debug examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --n_epochs 300 --active_query_frequency 10?

@hartikainen
Contributor

It's hard to say from these logs. @NasimShafiee were there any other logs before/after the ones you already posted here? If so, could you copy-paste the full log here?

@NasimShafiee
Author

Thanks for your help!
I believe the problem was my PC.
I have switched to another PC; here is what I've done:

  1. unset LD_PRELOAD
  2. softlearning run_example_debug examples.classifier_rl --n_goal_examples 10 --task=Image48SawyerDoorPullHookEnv-v0 --algorithm VICERAQ --n_epochs 300 --active_query_frequency 10

Full Log:

/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.1) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:

WARNING: Logging before flag parsing goes to stderr.
I0515 14:59:12.378803 140717083629312 __init__.py:34] MuJoCo library version is: 200
Warning: robosuite package not found. Run pip install robosuite to use robosuite environments.
I0515 14:59:12.413537 140717083629312 __init__.py:333] Registering multiworld mujoco gym environments
I0515 14:59:13.682559 140717083629312 __init__.py:14] Registering goal example multiworld mujoco gym environments
2019-05-15 14:59:13,752 INFO tune.py:64 -- Did not find checkpoint file in /home/nasimshafiee/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-15T14-59-13-2019-05-15T14-59-13.
2019-05-15 14:59:13,752 INFO tune.py:211 -- Starting a new experiment.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/1 GPUs
Memory usage on this node: 6.3/33.6 GB

Using seed 9941
2019-05-15 14:59:13.764332: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-15 14:59:13.806908: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2019-05-15 14:59:13.808591: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b74b9c01e0 executing computations on platform Host. Devices:
2019-05-15 14:59:13.808624: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
Found 1 GPUs for rendering. Using device 0.
2019-05-15 14:59:14,959 ERROR ray_trial_executor.py:203 -- Error starting runner for Trial ec70dadd-algorithm=VICERAQ-seed=9941
Traceback (most recent call last):
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 201, in start_trial
self._start_trial(trial, checkpoint)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 143, in _start_trial
self._train(trial)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 111, in _train
remote = trial.runner.train.remote()
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 124, in remote
return self._remote(args, kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 138, in _remote
num_return_vals=num_return_vals)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 479, in _actor_method_call
method_name)(*copy.deepcopy(args))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
result = self._train()
File "/home/nasimshafiee/reward-learning-rl/examples/development/main.py", line 77, in _train
self._build()
File "/home/nasimshafiee/reward-learning-rl/examples/classifier_rl/main.py", line 30, in _build
get_goal_example_environment_from_variant(variant))
File "/home/nasimshafiee/reward-learning-rl/softlearning/environments/utils.py", line 48, in get_goal_example_environment_from_variant
return GymAdapter(env=gym.make(variant['task']))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 183, in make
return registry.make(id, **kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 125, in make
env = spec.make(**kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 86, in make
env = self._entry_point(**_kwargs)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/__init__.py", line 324, in create_image_48_sawyer_door_pull_hook_v0
non_presampled_goal_img_is_garbage=True,
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/core/image_env.py", line 75, in __init__
sim = self._wrapped_env.initialize_camera(init_camera)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/mujoco_env.py", line 152, in initialize_camera
viewer = mujoco_py.MjRenderContextOffscreen(sim, device_id=self.device_id)
File "mujoco_py/mjrendercontext.pyx", line 43, in mujoco_py.cymj.MjRenderContext.__init__
File "mujoco_py/mjrendercontext.pyx", line 108, in mujoco_py.cymj.MjRenderContext._setup_opengl_context
File "mujoco_py/opengl_context.pyx", line 128, in mujoco_py.cymj.OffscreenOpenGLContext.__init__
RuntimeError: Failed to initialize OpenGL
2019-05-15 14:59:16,963 INFO ray_trial_executor.py:179 -- Destroying actor for trial ec70dadd-algorithm=VICERAQ-seed=9941. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-05-15 14:59:16,965 INFO ray_trial_executor.py:214 -- Trying to start runner for Trial ec70dadd-algorithm=VICERAQ-seed=9941 without checkpoint.
Using seed 9941
Found 1 GPUs for rendering. Using device 0.
2019-05-15 14:59:17,873 ERROR ray_trial_executor.py:219 -- Error starting runner for Trial ec70dadd-algorithm=VICERAQ-seed=9941, aborting!
Traceback (most recent call last):
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 201, in start_trial
self._start_trial(trial, checkpoint)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 143, in _start_trial
self._train(trial)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 111, in _train
remote = trial.runner.train.remote()
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 124, in remote
return self._remote(args, kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 138, in _remote
num_return_vals=num_return_vals)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 479, in _actor_method_call
method_name)(*copy.deepcopy(args))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
result = self._train()
File "/home/nasimshafiee/reward-learning-rl/examples/development/main.py", line 77, in _train
self._build()
File "/home/nasimshafiee/reward-learning-rl/examples/classifier_rl/main.py", line 30, in _build
get_goal_example_environment_from_variant(variant))
File "/home/nasimshafiee/reward-learning-rl/softlearning/environments/utils.py", line 48, in get_goal_example_environment_from_variant
return GymAdapter(env=gym.make(variant['task']))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 183, in make
return registry.make(id, **kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 125, in make
env = spec.make(**kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 86, in make
env = self._entry_point(**_kwargs)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/__init__.py", line 324, in create_image_48_sawyer_door_pull_hook_v0
non_presampled_goal_img_is_garbage=True,
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/core/image_env.py", line 75, in __init__
sim = self._wrapped_env.initialize_camera(init_camera)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/mujoco_env.py", line 152, in initialize_camera
viewer = mujoco_py.MjRenderContextOffscreen(sim, device_id=self.device_id)
File "mujoco_py/mjrendercontext.pyx", line 43, in mujoco_py.cymj.MjRenderContext.__init__
File "mujoco_py/mjrendercontext.pyx", line 108, in mujoco_py.cymj.MjRenderContext._setup_opengl_context
File "mujoco_py/opengl_context.pyx", line 128, in mujoco_py.cymj.OffscreenOpenGLContext.__init__
RuntimeError: Failed to initialize OpenGL

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 215, in start_trial
self._start_trial(trial)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 143, in _start_trial
self._train(trial)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 111, in _train
remote = trial.runner.train.remote()
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 124, in remote
return self._remote(args, kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 138, in _remote
num_return_vals=num_return_vals)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/actor.py", line 479, in _actor_method_call
method_name)(*copy.deepcopy(args))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 151, in train
result = self._train()
File "/home/nasimshafiee/reward-learning-rl/examples/development/main.py", line 77, in _train
self._build()
File "/home/nasimshafiee/reward-learning-rl/examples/classifier_rl/main.py", line 30, in _build
get_goal_example_environment_from_variant(variant))
File "/home/nasimshafiee/reward-learning-rl/softlearning/environments/utils.py", line 48, in get_goal_example_environment_from_variant
return GymAdapter(env=gym.make(variant['task']))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 183, in make
return registry.make(id, **kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 125, in make
env = spec.make(**kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/gym/envs/registration.py", line 86, in make
env = self._entry_point(**_kwargs)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/__init__.py", line 324, in create_image_48_sawyer_door_pull_hook_v0
non_presampled_goal_img_is_garbage=True,
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/core/image_env.py", line 75, in __init__
sim = self._wrapped_env.initialize_camera(init_camera)
File "/home/nasimshafiee/anaconda3/envs/softlearning/lib/python3.6/site-packages/multiworld/envs/mujoco/mujoco_env.py", line 152, in initialize_camera
viewer = mujoco_py.MjRenderContextOffscreen(sim, device_id=self.device_id)
File "mujoco_py/mjrendercontext.pyx", line 43, in mujoco_py.cymj.MjRenderContext.__init__
File "mujoco_py/mjrendercontext.pyx", line 108, in mujoco_py.cymj.MjRenderContext._setup_opengl_context
File "mujoco_py/opengl_context.pyx", line 128, in mujoco_py.cymj.OffscreenOpenGLContext.__init__
RuntimeError: Failed to initialize OpenGL
2019-05-15 14:59:17,874 INFO ray_trial_executor.py:179 -- Destroying actor for trial ec70dadd-algorithm=VICERAQ-seed=9941. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2019-05-15 14:59:17,921 WARNING util.py:62 -- The start_trial operation took 4.16402268409729 seconds to complete, which may be a performance bottleneck.
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 8/8 CPUs, 0/1 GPUs
Memory usage on this node: 6.4/33.6 GB
Result logdir: /home/nasimshafiee/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-15T14-59-13-2019-05-15T14-59-13
Number of trials: 1 ({'ERROR': 1})
ERROR trials:

  • ec70dadd-algorithm=VICERAQ-seed=9941: ERROR, 2 failures: /home/nasimshafiee/ray_results/multiworld/mujoco/Image48SawyerDoorPullHookEnv-v0/2019-05-15T14-59-13-2019-05-15T14-59-13/ec70dadd-algorithm=VICERAQ-seed=9941_2019-05-15_14-59-13euuykcjn/error_2019-05-15_14-59-17.txt


Traceback (most recent call last):
File "/home/nasimshafiee/anaconda3/envs/softlearning/bin/softlearning", line 11, in
load_entry_point('softlearning', 'console_scripts', 'softlearning')()
File "/home/nasimshafiee/reward-learning-rl/softlearning/scripts/console_scripts.py", line 202, in main
return cli()
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/click/core.py", line 764, in call
return self.main(*args, **kwargs)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/nasimshafiee/reward-learning-rl/softlearning/scripts/console_scripts.py", line 81, in run_example_debug_cmd
return run_example_debug(example_module_name, example_argv)
File "/home/nasimshafiee/reward-learning-rl/examples/instrument.py", line 254, in run_example_debug
run_example_local(example_module_name, debug_example_argv, local_mode=True)
File "/home/nasimshafiee/reward-learning-rl/examples/instrument.py", line 228, in run_example_local
reuse_actors=True)
File "/home/nasimshafiee/.local/lib/python3.6/site-packages/ray/tune/tune.py", line 253, in run
raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [ec70dadd-algorithm=VICERAQ-seed=9941])

@avisingh599
Owner

avisingh599 commented May 15, 2019

Looks like this is the OpenGL issue that a lot of people hit with mujoco-py. I would suggest trying some more of the workarounds outlined here: openai/mujoco-py#187
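To isolate it, you can also try creating an offscreen render context directly, outside of this repo; a minimal sketch, assuming mujoco-py imports in your env (the tiny one-geom MJCF model here is just a placeholder):

# attempt offscreen rendering; raises RuntimeError if OpenGL initialization fails
python -c "import mujoco_py; m = mujoco_py.load_model_from_xml(\"<mujoco><worldbody><geom type='sphere' size='0.1'/></worldbody></mujoco>\"); sim = mujoco_py.MjSim(m); mujoco_py.MjRenderContextOffscreen(sim, device_id=0); print('offscreen OpenGL context OK')"

If that one-liner fails with the same RuntimeError, the problem is in your GL/driver setup rather than in this repository.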

As an alternative, you can use our docker image instead of setting things up locally. The docker image has all the configuration set up correctly, so as long as you have nvidia-docker and docker-compose installed, it will work out of the box.
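Roughly, something like this (a sketch; the README has the authoritative commands, and the container name below is an assumption):

# from the root of the reward-learning-rl checkout, with docker-compose and nvidia-docker installed
docker-compose up -d
# then attach a shell to the running container (check docker ps for the actual name)
docker exec -it <container-name> bash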

@NasimShafiee
Author

I unset LD_PRELOAD, but the error is not resolved!

@avisingh599
Owner

I would suggest using the docker image supplied with the repository.

@NasimShafiee
Author

Thanks!
