
Multi-threading multiplication with Hamiltonian blocks yields wrong results #1017

Open
jonas-pueschel opened this issue Nov 13, 2024 · 2 comments

@jonas-pueschel

Hi all,

While experimenting with parallelizing the Hamiltonian multiplication, I encountered some weird behaviour when trying to multi-thread the Hamiltonian block multiplications. I first suspected that my multithreading was somehow interfering with DFTK's internal multithreading; however, the problem persisted even when using DFTK.disable_threading().

using DFTK
using LinearAlgebra
DFTK.disable_threading()

# setup
function silicon_setup(; Ecut = 15, a = 10.26, kgrid=[2, 2, 2], supercell_size = [1,1,1])
    # Silicon lattice constant in Bohr
    lattice = a / 2 * [[0 1 1.];
                       [1 0 1.];
                       [1 1 0.]]
    Si = ElementPsp(:Si; psp=load_psp("hgh/lda/Si-q4"))
    atoms     = [Si, Si]
    positions = [ones(3)/8, -ones(3)/8]

    if (prod(supercell_size) != 1)
        lattice, atoms, positions = create_supercell(lattice, atoms, positions, supercell_size)
    end

    model = model_LDA(lattice, atoms, positions)
    basis = PlaneWaveBasis(model; Ecut=Ecut, kgrid)
    return model, basis
end
model, basis = silicon_setup();
filled_occ = DFTK.filled_occupation(model)
n_spin = model.n_spin_components
n_bands = div(model.n_electrons, n_spin * filled_occ, RoundUp)
Nk = length(basis.kpoints)
occupation = [filled_occ * ones(Float64, n_bands) for ik = 1:Nk]
ψ = [DFTK.random_orbitals(basis, kpt, n_bands) for kpt in basis.kpoints];
ρ = guess_density(basis);
energies, H = energy_hamiltonian(basis, ψ, occupation; ρ)

# no multithreading
result_H_1 = [Array{ComplexF64}(undef, size(ψk)) for ψk = ψ]
for ik = 1:Nk
    result_H_1[ik] .= H.blocks[ik] * ψ[ik]
end

# multithreading
result_H_2 = [Array{ComplexF64}(undef, size(ψk)) for ψk = ψ]
nthreads = Threads.nthreads()
chunk_length = cld(Nk, nthreads)
@sync for (ichunk, chunk) in enumerate(Iterators.partition(1:Nk, chunk_length))
    Threads.@spawn for ik in chunk  # spawn a task per chunk
        result_H_2[ik] .= H.blocks[ik] * ψ[ik]
    end
end

err_H = [norm(result_H_1[ik] - result_H_2[ik]) for ik = 1:Nk]

println("Error in H: $err_H")

When running with one thread, everything works as intended.

julia --threads 1 test.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 1
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
Error in H: [0.0, 0.0, 0.0]

However, when running with more than one thread, weird things start happening. The result is non-deterministic, as if it were reading random memory. With 2 threads, results with a "reasonably small" error of ~1e-1 sometimes occur. With more than 2 threads, I have so far encountered various large errors and even segfaults.

julia --threads 3 test.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 3
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
Error in H: [7.3468516299129885e121, 1.0172328098798142e225, 9.21317710616314e225]

Is there something I am missing?

@Technici4n (Contributor)

I suppose that different threads are trying to use the same scratch buffer from DftHamiltonianBlock. Classic data race. 😄 So I suppose Hamiltonians are simply not thread-safe.
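
To make the failure mode concrete, here is a toy, DFTK-independent sketch of this kind of race: a block that stages intermediate results in a shared scratch buffer returns garbage as soon as two tasks interleave between the write and the read. ToyBlock and apply! are purely illustrative names, not DFTK API.

# A "block" that stages intermediates in a shared, mutable scratch buffer,
# loosely mimicking the scratch inside DftHamiltonianBlock.
struct ToyBlock
    factor::Float64
    scratch::Vector{Float64}
end

function apply!(out, blk::ToyBlock, x)
    blk.scratch .= blk.factor .* x   # step 1: write intermediates into the shared scratch
    sleep(0.001)                     # widen the window so the race is easy to observe
    out .= blk.scratch .+ 1          # step 2: read the scratch back, possibly already clobbered
    return out
end

blk  = ToyBlock(2.0, zeros(4))
outs = [zeros(4) for _ in 1:8]
Threads.@threads for i in 1:8
    apply!(outs[i], blk, fill(Float64(i), 4))
end
# The correct result for slot i is fill(2i + 1, 4). With more than one thread,
# some slots end up holding 2j + 1 for a different j, because task j overwrote
# the scratch between step 1 and step 2 of task i.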

@antoine-levitt (Member)

Yes, we use MPI to parallelize over k-points and threads to parallelize over bands. I'm not sure how to solve this, other than documenting somewhere that you shouldn't do it. If you really want to do it, a hack is to copy the Hamiltonian nthreads() times and use a different copy for each thread.
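
For reference, a minimal sketch of that workaround, reusing the variables from the script above. Instead of copying H it simply rebuilds one Hamiltonian per chunk via energy_hamiltonian (assuming that rebuilding from the same basis, ψ, occupation and ρ yields equivalent, independent copies); the names chunks, Hs and result_H_3 are purely illustrative.

# Build one Hamiltonian per chunk up front, so the tasks spawned below never
# share scratch buffers. Reuses basis, ψ, occupation, ρ, Nk and chunk_length
# from the reproduction script.
chunks = collect(Iterators.partition(1:Nk, chunk_length))
Hs     = [energy_hamiltonian(basis, ψ, occupation; ρ)[2] for _ in chunks]

result_H_3 = [Array{ComplexF64}(undef, size(ψk)) for ψk = ψ]
@sync for (ichunk, chunk) in enumerate(chunks)
    Threads.@spawn for ik in chunk
        # Each task only touches its own Hamiltonian copy Hs[ichunk].
        result_H_3[ik] .= Hs[ichunk].blocks[ik] * ψ[ik]
    end
end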
