Fatal error without crash dump when multiprocessing #355

Open
ankitkv opened this issue Nov 13, 2018 · 9 comments

ankitkv commented Nov 13, 2018

Script to reproduce and output are as follows.

Here is an example Python 3 script (run with Python 3.7):

#!/usr/bin/python3
import argparse
from multiprocessing import Pool
import random
from time import sleep

import numpy as np
import vizdoom as vzd


parser = argparse.ArgumentParser()
parser.add_argument('--n_maps', type=int, default=64)
parser.add_argument('--episodes', type=int, default=1)
parser.add_argument('--threads', type=int, default=4)
parser.add_argument('--timeout', type=int, default=256)
flags = parser.parse_args()


def generate(map_indices):
    game = vzd.DoomGame()
    game.load_config("defaults.cfg")
    game.add_available_button(vzd.Button.MOVE_FORWARD)
    game.add_available_button(vzd.Button.TURN_LEFT_RIGHT_DELTA, 90)
    game.add_game_args("+movebob 0.0")
    game.add_game_args("+gamma 1.15")
    game.add_game_args("+r_fakecontrast 0")
    game.set_episode_timeout(flags.timeout)

    left = [False, -90]
    right = [False, 90]
    forward = [True, 0]
    all_actions = [left, forward, right]

    game.init()

    for map_idx in map_indices:
        print("Map", map_idx)
        game.set_doom_map('MAP{:02d}'.format(((map_idx - 1) % 32) + 1))

        for ep_idx in range(1, flags.episodes + 1):
            game.new_episode()

            while not game.is_episode_finished():
                game.make_action(random.choice(all_actions))

        print("Map", map_idx,"finished.")


if __name__ == "__main__":
    with Pool(flags.threads) as p:
        step = int(np.ceil(flags.n_maps / flags.threads))
        p.map(generate, [range(i, min(i + step, flags.n_maps + 1)) for i in range(1, flags.n_maps + 1, step)])

The config file defaults.cfg is:

screen_resolution = RES_160X120
screen_format = RGB24
depth_buffer_enabled = true
labels_buffer_enabled = false
automap_buffer_enabled = false
render_hud = false
render_minimal_hud = false
render_crosshair = false
render_weapon = false
render_decals = false
render_particles = false
render_effects_sprites = false
episode_timeout = 1160
episode_start_time = 14
window_visible = false
available_buttons = {}
available_game_variables = {POSITION_X POSITION_Y POSITION_Z VELOCITY_X VELOCITY_Y VELOCITY_Z ANGLE}
game_args += +cl_run 1
game_args += +vid_aspect 3
game_args += +cl_spreaddecals 0
game_args += +hud_scale 1
game_args += +cl_bloodtype 2
game_args += +cl_pufftype 1
game_args += +cl_missiledecals 0
game_args += +cl_bloodsplats 0
game_args += +cl_showmultikills 0
game_args += +cl_showsprees 0
game_args += +con_midtime 0
game_args += +con_notifytime 0
game_args += +am_textured 1
game_args += +am_showtime 0
game_args += +am_showmonsters 0
game_args += +am_showsecrets 0

The output of the script with default arguments ends with:

...
Map 64 finished.
Map 14 finished.
Map 15
Map 47 finished.
Map 48
Map 48 finished.


*** Fatal Error ***
Map 15 finished.
Map 16
Map 16 finished.


*** Fatal Error ***

No crash file is created, and there is no indication as to what is going wrong. This does not happen when multiprocessing is not used. If you're unable to reproduce, try increasing the timeout, changing the number of threads, and increasing the number of maps, in that order.

Any immediate idea what the problem is?

Collaborator

Miffyli commented Nov 13, 2018

The first thing that springs to mind is earlier reports of collisions over shared resources when booting up multiple instances. One decent workaround was to add a random amount of sleep before creating DoomGame instances. I have not had such issues myself in the past.
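A minimal sketch of the random-delay workaround, written so it can be tried without ViZDoom installed. The helper name `create_with_random_delay` and its parameters are hypothetical; in the script from this issue, the factory passed in would be `vzd.DoomGame`.

```python
import random
import time


def create_with_random_delay(factory, max_delay=1.0, sleep=time.sleep):
    """Sleep for a random time in [0, max_delay] seconds, then call factory().

    `factory` is any zero-argument callable (assumption: e.g. vzd.DoomGame).
    `sleep` is injectable only so the delay can be observed in tests.
    """
    delay = random.uniform(0.0, max_delay)
    sleep(delay)
    return factory()


# Hypothetical usage inside each worker process:
#     game = create_with_random_delay(vzd.DoomGame, max_delay=5.0)
```

Because each worker draws its own delay, the instances are less likely to be created at the same moment. This only spreads out startup; it does not synchronize the workers.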

Trying your code on Windows 10 and Ubuntu 16.04, I am not getting the same errors with 8 to 36 threads and with different numbers of maps. I assume you are using some scenario file other than freedoom2.wad, given that it does not have 64 maps to begin with? Could some of the levels be corrupted?

Author

ankitkv commented Nov 13, 2018

In the posted example script, I'm using freedoom2.wad. The map is set to ((map_idx - 1) % 32) + 1, so it's always between 1 and 32.
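A short worked example of that wrap-around, using the same expression as the script. The function name `map_name` and the `maps_in_wad` parameter are hypothetical, introduced only for illustration.

```python
def map_name(map_idx, maps_in_wad=32):
    """Wrap a 1-based map index onto MAP01..MAP32 (for the default of 32 maps)."""
    return "MAP{:02d}".format(((map_idx - 1) % maps_in_wad) + 1)


for idx in (1, 32, 33, 64):
    print(idx, "->", map_name(idx))
# 1 -> MAP01
# 32 -> MAP32
# 33 -> MAP01
# 64 -> MAP32
```

So with the default `--n_maps 64`, every map in the WAD is visited twice rather than 64 distinct maps being loaded.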

I tried adding a random amount of sleep (up to 5 seconds) before creating DoomGame instances, but that did not help; I'm still getting the error. I'm using Ubuntu 16.04 with Python 3.7.

Author

ankitkv commented Nov 13, 2018

@Miffyli I find the errors more frequently with threads = 4, even compared to the higher numbers. Could you try that with a larger timeout and maps? Usually the errors show up near the end. You may also need to run the script a few times to get it.

Collaborator

Miffyli commented Nov 13, 2018

Now I got some "fatal errors" at the end with python3 test.py --threads 4 --timeout 10000 --n_maps 128 after running it three times on Ubuntu (Python 3.5, ViZDoom 1.1.6). Interestingly these do not seem to happen on Windows 10 (Python 3.5, ViZDoom 1.1.5).

I have no idea what could cause them, but they seem to happen on destruction of DoomGame objects (when the process ends), judging by how all indexes seem to be covered as expected.
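One way to check the "all indexes covered" observation is to scan the captured output for the `Map N finished.` lines that the script prints. This is a rough sketch; the helper name `missing_maps` is hypothetical, and it assumes the output of the workers was captured to a string and that lines from different processes are not interleaved mid-line.

```python
import re

# Matches the completion lines printed by the script, e.g. "Map 15 finished."
FINISHED = re.compile(r"^Map (\d+) finished\.$")


def missing_maps(output, n_maps):
    """Return the sorted map indices in 1..n_maps with no 'finished' line."""
    done = set()
    for line in output.splitlines():
        match = FINISHED.match(line.strip())
        if match:
            done.add(int(match.group(1)))
    return sorted(set(range(1, n_maps + 1)) - done)
```

An empty result means every map index was reported as finished, which would be consistent with the error appearing only at teardown.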

Additionally, semi-offtopic: I take back what I said about 36 threads. It does crash at times, with a different error. The issue seems to be with importing vizdoom rather than launching it. I tried creating separate Processes with random delays in between, and now I can happily launch 40 workers (note: indexing may not be correct):

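from multiprocessing import Process  # needed in addition to the original script's imports

if __name__ == "__main__":
    step = int(np.ceil(flags.n_maps / flags.threads))
    for i in range(flags.threads):
        # Stagger process creation by a short random delay (up to 0.1 s).
        sleep(random.random()*0.1)
        worker = Process(target=generate,
                         args=(list(range(i*step, min(i*step + step, flags.n_maps + 1))),))
        worker.start()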
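On the indexing caveat: the sketch below, which does not need ViZDoom, compares the index ranges produced by the Pool-based script with those of the Process loop. All three function names are hypothetical, and `one_based_process_chunks` is only one possible way to line the two up.

```python
import math


def pool_chunks(n_maps, workers):
    """Index ranges as built in the original Pool-based script (1-based)."""
    step = math.ceil(n_maps / workers)
    return [range(i, min(i + step, n_maps + 1))
            for i in range(1, n_maps + 1, step)]


def process_chunks(n_maps, workers):
    """Index ranges as built in the Process loop (starts at 0)."""
    step = math.ceil(n_maps / workers)
    return [range(i * step, min(i * step + step, n_maps + 1))
            for i in range(workers)]


def one_based_process_chunks(n_maps, workers):
    """Process-loop variant shifted by one to match the Pool version."""
    step = math.ceil(n_maps / workers)
    return [range(1 + i * step, min(1 + (i + 1) * step, n_maps + 1))
            for i in range(workers)]


def flatten(chunks):
    """Concatenate a list of ranges into one list of indices."""
    return [idx for chunk in chunks for idx in chunk]
```

With 64 maps and 4 workers, the Pool version covers indices 1 to 64 while the Process loop covers 0 to 63. Because the script wraps indices with `((map_idx - 1) % 32) + 1`, index 0 maps to MAP32, so the loop still runs on valid maps.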

Author

ankitkv commented Nov 13, 2018

I tried using a different copy per thread of the .ini file, config file, and WAD file, but that did not help (which is in line with your assessment that the problem might be with importing). Also, calling game.close() and game.init() after every n map changes makes the error go away, where n <= 15, which seems quite arbitrary.

Author

ankitkv commented Nov 13, 2018

Instead of using the random delay, the following code waits until one process is done importing vizdoom, creating a DoomGame instance and initializing it before the next process is created:

    # Requires: from multiprocessing import Pipe, Process
    step = int(np.ceil(flags.n_maps / flags.threads))
    for i in range(1, flags.n_maps + 1, step):
        parent_conn, child_conn = Pipe()
        worker = Process(target=generate,
                         args=(range(i, min(i + step, flags.n_maps + 1)), child_conn))
        worker.start()
        # Block until the child signals that game.init() has completed.
        parent_conn.recv()

(there is a conn.send(0) after game.init())

This still results in the fatal error messages.

The only workaround that has worked every time for me so far is closing and re-initializing the game after a fixed number (<=15 for freedoom2.wad, <=14 for a custom WAD I'm using) of map changes.
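A minimal sketch of that close-and-re-init workaround. The helper `run_maps`, the `play_map` callback, and the `reinit_every` parameter are hypothetical names; `game` is assumed to be an already initialized object exposing `close()` and `init()`, as `vzd.DoomGame` does in the script.

```python
def run_maps(game, map_indices, play_map, reinit_every=15):
    """Play each map, restarting the game after every `reinit_every` map changes.

    `play_map(game, map_idx)` is a caller-supplied function (assumption) that
    would set the map and run the episodes, as in the loop body of generate().
    """
    changes = 0
    for map_idx in map_indices:
        if changes == reinit_every:
            # Restart the engine before it accumulates too many map changes.
            game.close()
            game.init()
            changes = 0
        play_map(game, map_idx)
        changes += 1
```

The threshold of 15 is the empirical value reported here for freedoom2.wad, not a documented limit, so it may need tuning for other WADs.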

mwydmuch added a commit that referenced this issue Nov 15, 2018
@mwydmuch
Member

Hello @ankitkv,
the issue is connected with closing the Doom instance and its message queues. I haven't identified the exact cause, but I pushed a hotfix that should prevent this *** Fatal Error *** and let your script run correctly. If the error does still occur, the crash log should now be dumped correctly. Let me know if this solves the problem for you.

Author

ankitkv commented Nov 15, 2018

It looks like that fixes it!

@ankitkv ankitkv closed this as completed Nov 15, 2018
Author

ankitkv commented Nov 15, 2018

Keeping this issue open until @mwydmuch decides to close it, especially since the exact cause is still unknown.

@ankitkv ankitkv reopened this Nov 15, 2018
@mihahauke mihahauke added the bug label Feb 1, 2019