Moving window crashes in large simulations #405
It's worth noting that in the scenario above, the code just crashed 100 timesteps after the moving window started; no load balancing or any other periodic operation, except perhaps vectorisation changes, had been done. There were four species in the simulation, one with 24 ppc in the target region and three with 12 ppc.
The reported crash happens during a
OK, I'd not thought of a problem with hdf5 like that. It'll take some time to do what you're suggesting, so I likely won't update this too soon. I had initially suspected a memory problem, but looking at the actual vmem usage, there is only one node that seems even close to 50% usage, so I'd be surprised if that was the problem. However, I did try reducing the number of particles rather than increasing the number of nodes, and this solved the crash issue. I've not tried the load balancing yet as I think I'll need a cheaper way to test this problem.
We do not have much time to work on this yet, but it really looks like a memory issue. Note that memory diagnostics may not account for temporary buffers, which can be very significant.
OK, thanks for letting me know. It's difficult to test this: if it's going into a temporary buffer then there isn't much I can do to monitor it. The only thing I can ask at the moment is: how are MPI tasks that are due to create a new patch because of the moving window accounted for in the load balance? If they aren't, would it be possible to give the patches nearest the relevant edges (i.e. the edge where new patches are made) a larger load value?
Thank you very much for the very detailed feedback. As @mccoys said, we have little time to investigate this problem, but it is something we have already been facing and want to understand and improve. This will help us a lot.

You notice a huge performance improvement once load balancing resumes after the target exits the domain. Isn't that simply because there are far fewer particles in total in your box? Or are you positive that it is a balancing effect?

To answer your last question: for the moment, load balancing and the moving window are completely decoupled. During a window shift, patches are passed to their left (-x direction) neighbour, and an MPI communication occurs if that makes a patch change MPI domain. A first quick test I could do is to force load balancing every time the moving window is applied. In my opinion this is not enough, and some operations to prevent memory usage spikes should be done too. But for that, advanced memory analysis tools should be used. We hope we can do that soon.
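For reference, a rough way to approximate "rebalance on every window shift" purely from the namelist side is to set `LoadBalancing.every` to the estimated shift period. The sketch below uses placeholder numbers (cell size, cell count and timestep are assumptions, not the values from this simulation), and the shift-period formula is only an estimate for `velocity_x = 1.0` with c = 1 in code units:

```python
# Sketch: align LoadBalancing.every with the estimated moving-window shift period.
# All numeric values here are placeholders, not the actual namelist values.

cell_length_x = 0.1        # assumed cell size along x (code units)
cells_x       = 2048       # assumed number of cells along x
patches_x     = 16         # number_of_patches[0] from the namelist above
timestep      = 0.05       # assumed timestep (code units)

patch_length_x = cell_length_x * cells_x / patches_x

# With velocity_x = 1.0 (window moving at c, and c = 1 in code units),
# the window shifts by one patch roughly every patch_length_x / timestep steps.
shift_period = int(round(patch_length_x / timestep))

LoadBalancing(
    initial_balance = True,
    every = shift_period,   # rebalance at (roughly) every window shift
    cell_load = 1.,
    frozen_particle_load = 1.0
)
```

This only mimics the proposed test; it does not address the memory-spike side of the problem discussed above.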
The performance gain occurs specifically after the code is rebalanced, not at the exact moment the target leaves the box, so yes, I'm certain it is a balancing effect. The simulation slows down massively when the moving window is applied and there is no load balancing; I'll plot a graph of some of this at some point to show exactly what's going on. One thing I am noticing is that it slows down particularly badly (in a relative sense) when I increase the number of nodes these simulations run across.
Also, I would expect the performance to get better as bits of the target exit the box, but the performance gain from that is relatively modest compared to the rebalanced simulation. If you do force load balancing every time the window is applied, could you leave an option to turn this off? Memory spikes are particularly murderous when you're trying to run this on a limited number of nodes.
I'm still wondering if the balancing can take the moving window itself into account. If the target is at one end of the box and the moving edge is at the other end, the MPI task that has to create the new patch(es) will likely also have a lot of patches on its end; how much memory and time does it take to cycle through a patch with no particles? Though I've still noticed this effect when using the multiple decomposition feature, so maybe not. Also, I totally understand you've not got a lot of time on your hands.
Of course!
Hi, I'm having a lot of fairly critical crashes when using a moving window in large simulation runs (requiring >64 KNL nodes). When I run a test simulation at half resolution there are no problems. A basic outline of the simulation is that a laser hits a thin(ish) target and the window then follows the laser after it penetrates the target. I frequently get crashes in the following scenarios:
- moving window alone: after say 80 timesteps or so it just crashes; this usually only occurs when there are > 24+12+12 ppc for the different species
- moving window plus file output, though not reliably
- moving window plus load balancing: this reliably causes crashes on large simulations, so I've resorted to the particular scheme outlined below
The most reliable way it crashes is if the load balancing runs while the bulk target density moves out of the box. So I have basically resorted to a scheme where I make the code load balance only before the moving window starts and after the target has moved out of the simulation box. This does mean the code performs extremely poorly during the period where the target is exiting the box, and then a huge jump in performance occurs once it has left and the box is rebalanced. Occasionally reducing the number of patches in the window movement direction can avoid a crash, but this is not always reliable and can again result in poor performance.
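For anyone trying the same workaround, the awkward part is estimating when the target has fully left the box so balancing can be switched back on. A back-of-the-envelope estimate in code units (c = 1), using made-up placeholder numbers rather than the real namelist values and assuming the box's left edge starts at x = 0 and advances at `velocity_x * c` from `time_start`, could look like this:

```python
# Rough estimate (placeholder numbers, not the real namelist values) of the
# timestep range during which load balancing is kept off in this workaround:
# from when the window starts moving until the target has left the box.

c = 1.0                       # speed of light in Smilei code units
target_far_edge_x = 35.0      # assumed x of the target's far (laser-exit) side
move_window_time = 150.0      # assumed MovingWindow.time_start
velocity_x = 1.0              # window speed from the namelist (units of c)
timestep = 0.05               # assumed timestep (code units)

# The target (assumed stationary) is fully out of the box once the box's
# left edge, which starts at x = 0 and moves at velocity_x * c, has swept
# past the target's far edge.
target_exit_time = move_window_time + target_far_edge_x / (velocity_x * c)

first_shift_step = int(move_window_time / timestep)
target_exit_step = int(target_exit_time / timestep)

print(f"disable balancing from step {first_shift_step} to ~{target_exit_step}")
```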
I will gather as much information for this as possible, but as for the exact namelist file, I'd prefer to share it more privately by email as it's for a pending publication.
I have used the multiple decomposition facility; however, this crash has also happened without it in the past. There have been some occasions where the code has simply stalled and not output an error at all.
ELI_film_low_LG_e7743507.txt
ELI_film_low_LG_o7743507.txt
I've plotted the CPU usage, memory usage and networking; nothing seems particularly out of the ordinary. Disk access looks totally normal as well, though that plot is not really comprehensible without some context so I've not included it here.
The machine file I've made for compiling on the KNL nodes of the TACC Stampede 2 cluster is here:
stampede2_knl.txt
The compile instructions:
module load python3
module load phdf5
module load boost
make clean
export BUILD_DIR=build_knl_intel
make -j machine=stampede2_knl config=no_mpi_tm
The run script goes something like:
#SBATCH --nodes=64
#SBATCH --ntasks-per-node 32
export OMP_NUM_THREADS=2
export OMP_SCHEDULE=dynamic
export OMP_PROC_BIND=true
module load phdf5
module load python3
module load remora
export TACC_IBRUN_DEBUG=1
remora ibrun mem_afinity ./smilei expanded_target smilei_helper_funcs.py laser_profiles.py angled_target.py
By the way, remora is the monitoring program I used to create the graphs. The debug option just prints extra information at the beginning of the code output describing the MPI environment the code runs in.
The bits of the namelist that might be relevant are:
Main(
    geometry = "3Dcartesian",
    interpolation_order = 4,
    number_of_cells = box_shape_cells,
    cell_length = cell_lens,
    # number_of_timesteps = 2,
    simulation_time = sim_time,
    timestep_over_CFL = 0.95,
    solve_poisson = True,
    number_of_patches = [16, 64, 64],
    maxwell_solver = 'Yee',
    EM_boundary_conditions = [
        ["silver-muller", "silver-muller"],
        ["silver-muller", "silver-muller"],
        ["silver-muller", "silver-muller"]
    ],
    print_expected_disk_usage = True,
    print_every = 10,
    random_seed = 0,
)

MultipleDecomposition(
    region_ghost_cells = 4
)

LoadBalancing(
    initial_balance = True,
    every = 1400,
    cell_load = 1.,
    frozen_particle_load = 1.0
)

Vectorization(
    mode = "adaptive",
    reconfigure_every = 20,
    initial_mode = "on"
)

MovingWindow(
    time_start = move_window_time,
    velocity_x = 1.0,
    number_of_additional_shifts = 0.,
    additional_shifts_time = 0.,
)
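The excerpt references box_shape_cells, cell_lens, sim_time and move_window_time without showing their definitions (the full namelist is being shared privately). Purely for orientation, a hypothetical sketch of such definitions, with made-up values that are not the ones used in this issue, might be:

```python
import math

# Hypothetical placeholder definitions for the variables used in the Main and
# MovingWindow blocks above; the real values are in the private namelist.

lambda0 = 2.0 * math.pi                 # one laser wavelength in code units
cell_lens = [lambda0 / 32.0,            # dx
             lambda0 / 16.0,            # dy
             lambda0 / 16.0]            # dz
box_shape_cells = [2048, 512, 512]      # cells per direction (divisible by the patch counts)
sim_time = 400.0                        # total simulated time (code units)
move_window_time = 150.0                # when the window starts following the laser
```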