
Multi-threading multiplication with Hamiltonian blocks yields wrong results #1017

Open
jonas-pueschel opened this issue Nov 13, 2024 · 2 comments

@jonas-pueschel

Hi all,

While experimenting with parallelizing the Hamiltonian multiplication, I encountered some weird behaviour when trying to multi-thread the Hamiltonian block multiplications. I first suspected that my multithreading was somehow interfering with DFTK's internal multithreading; however, the problem persisted even when using DFTK.disable_threading().

using DFTK
using LinearAlgebra
DFTK.disable_threading()

# setup
function silicon_setup(; Ecut = 15, a = 10.26, kgrid=[2, 2, 2], supercell_size = [1,1,1])
    # Silicon lattice constant in Bohr
    lattice = a / 2 * [[0 1 1.];
                       [1 0 1.];
                       [1 1 0.]]
    Si = ElementPsp(:Si; psp=load_psp("hgh/lda/Si-q4"))
    atoms     = [Si, Si]
    positions = [ones(3)/8, -ones(3)/8]

    if (prod(supercell_size) != 1)
        lattice, atoms, positions = create_supercell(lattice, atoms, positions, supercell_size)
    end

    model = model_LDA(lattice, atoms, positions)
    basis = PlaneWaveBasis(model; Ecut=Ecut, kgrid)
    return model, basis
end
model, basis = silicon_setup();
filled_occ = DFTK.filled_occupation(model)
n_spin = model.n_spin_components
n_bands = div(model.n_electrons, n_spin * filled_occ, RoundUp)
Nk = length(basis.kpoints)
occupation = [filled_occ * ones(Float64, n_bands) for ik = 1:Nk]
ψ = [DFTK.random_orbitals(basis, kpt, n_bands) for kpt in basis.kpoints];
ρ = guess_density(basis);
energies, H = energy_hamiltonian(basis, ψ, occupation; ρ)

# no multithreading
result_H_1 = [Array{ComplexF64}(undef, size(ψk)) for ψk = ψ]
for ik = 1:Nk
    result_H_1[ik] .= H.blocks[ik] * ψ[ik]
end

# multithreading
result_H_2 = [Array{ComplexF64}(undef, size(ψk)) for ψk = ψ]
nthreads = Threads.nthreads()
chunk_length = cld(Nk, nthreads)
@sync for (ichunk, chunk) in enumerate(Iterators.partition(1:Nk, chunk_length))
    Threads.@spawn for ik in chunk  # spawn a task per chunk
        result_H_2[ik] .= H.blocks[ik] * ψ[ik]
    end
end

err_H = [norm(result_H_1[ik] - result_H_2[ik]) for ik = 1:Nk]

println("Error in H: $err_H")

When running with one thread, everything works as intended.

julia --threads 1 test.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 1
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
Error in H: [0.0, 0.0, 0.0]

However, when running with more than one thread, weird things start happening. The result is non-deterministic, as if it were reading random memory. With 2 threads, results with a "reasonably small" error of ~1e-1 sometimes occur. With more than 2 threads, I have so far encountered various large errors and even segfaults.

julia --threads 3 test.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 3
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
Error in H: [7.3468516299129885e121, 1.0172328098798142e225, 9.21317710616314e225]

Is there something I am missing?

@Technici4n (Contributor)

I suppose that different threads are trying to use the same scratch buffer from DftHamiltonianBlock. Classic data race. 😄 So I suppose Hamiltonians are simply not thread-safe.
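
To make the failure mode concrete, here is a toy, DFTK-independent sketch of this kind of race: a block that stages intermediate results in a shared scratch buffer returns garbage as soon as two tasks interleave between the write and the read. ToyBlock and apply! are purely illustrative names, not DFTK API.

# A "block" that stages intermediates in a shared, mutable scratch buffer,
# loosely mimicking the scratch inside DftHamiltonianBlock.
struct ToyBlock
    factor::Float64
    scratch::Vector{Float64}
end

function apply!(out, blk::ToyBlock, x)
    blk.scratch .= blk.factor .* x   # step 1: write intermediates into the shared scratch
    sleep(0.001)                     # widen the window so the race is easy to observe
    out .= blk.scratch .+ 1          # step 2: read the scratch back, possibly already clobbered
    return out
end

blk  = ToyBlock(2.0, zeros(4))
outs = [zeros(4) for _ in 1:8]
Threads.@threads for i in 1:8
    apply!(outs[i], blk, fill(Float64(i), 4))
end
# The correct result for slot i is fill(2i + 1, 4). With more than one thread,
# some slots end up holding 2j + 1 for a different j, because task j overwrote
# the scratch between step 1 and step 2 of task i.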

@antoine-levitt (Member)

Yes, we use MPI to parallelize over k-points and threads to parallelize over bands. I'm not sure how to solve this, other than documenting somewhere that you shouldn't do it. If you really want to do it, a hack is to copy the Hamiltonian nthreads() times and use a different copy for each thread.
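
For reference, a minimal sketch of that workaround, reusing the variables from the script above. Instead of copying H it simply rebuilds one Hamiltonian per chunk via energy_hamiltonian (assuming that rebuilding from the same basis, ψ, occupation and ρ yields equivalent, independent copies); the names chunks, Hs and result_H_3 are purely illustrative.

# Build one Hamiltonian per chunk up front, so the tasks spawned below never
# share scratch buffers. Reuses basis, ψ, occupation, ρ, Nk and chunk_length
# from the reproduction script.
chunks = collect(Iterators.partition(1:Nk, chunk_length))
Hs     = [energy_hamiltonian(basis, ψ, occupation; ρ)[2] for _ in chunks]

result_H_3 = [Array{ComplexF64}(undef, size(ψk)) for ψk = ψ]
@sync for (ichunk, chunk) in enumerate(chunks)
    Threads.@spawn for ik in chunk
        # Each task only touches its own Hamiltonian copy Hs[ichunk].
        result_H_3[ik] .= Hs[ichunk].blocks[ik] * ψ[ik]
    end
end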
