-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPU-friendly G-vector optimizations and de-duplication of form factor calculation #825
Comments
I remember also there was the question of interpolating the form factors. Presumably it can be made fast enough that we don't need this type of optimization, which would maybe be simpler? |
Interpolation would be another good option, yes. Each pseudo quantity needs to be FTed once per SCF (when initializing model terms: beta-proj. for nonlocal and core density for NLCC in the exchange-correlation and when creating a guess density or guess wavefunction via superposition), so regardless of the approach, the goal is to reduce the number of evaluations of the The minimum for exact evaluation would be one call per unique value of If we can automatically find a set of points to use in building the interpolators which is both smaller than the above and which gives an accurate interpolation, then that would be the way to go. Building the interpolator with the unique Do you have ideas for how to select a reduced set of points while still ensuring the accuracy of the interpolator? I think that how we do this would determine which method is simpler. |
Just a plain old linear grid with spline interpolation? Then just plug in Interpolations.jl which should be fast. The projector is very smooth in Fourier space (it is compactly supported in real space) so that should give good results. To be clear, I'm suggesting not computing the exact result here but having some error, controlled by the qspace grid sampling. I believe that's what abinit does |
Ah I see, sounds very reasonable. I still feel that giving users the choice to use an exact method that's not too slow would be good. If the interpolation is easier to implement first though, we can always add this in later. |
EDIT: see here for the code Here's some sample code for how these functions could look. On CPU, I'm seeing a 2-3x speedup for both transforms. First, the structure factors are precomputed when the Structure Factors
function build_structure_factors(Gs::AbstractArray{Vec3{T}},
positions::AbstractVector{Vec3{S}}) where {T<:Real,S<:Real}
map(positions) do position
map(Gs) do G
cis2pi(-dot(G, position))
end
end
end The appropriate 1~3 lines are added in the Then, two functions need to be written each for density quantities and orbital/projector quantities: Density
"""
Build an on-grid cubic spline interpolator for `eval_psp_{some-density}_fourier` using `n_qnorm_interpolate` points between 0 and `max|Gcart|`.
"""
function build_interpolators_valence_charge(basis::PlaneWaveBasis{T},
n_qnorm_interpolate::Int=3001) where {T}
qnorm_max = maximum(norm.(G_vectors_cart(basis)))
qnorm_interpolate = range(0, qnorm_max, n_qnorm_interpolate)
map(basis.model.atom_groups) do atom_group
psp = basis.model.atoms[first(atom_group)].psp
f̃ = eval_psp_density_valence_fourier.(psp, qnorm_interpolate)
scale(interpolate(f̃, BSpline(Cubic(Line(OnGrid())))), qnorm_interpolate)
end
end
function superposition_valence_charge(basis::PlaneWaveBasis{T}) where {T}
itps = build_interpolators_valence_charge(basis)
ρ = map(enumerate(G_vectors(basis))) do (iG, G)
qnorm_cart = norm(recip_vector_red_to_cart(basis.model, G))
ρ_G = sum(enumerate(basis.model.atom_groups); init=zero(Complex{T})) do (igroup, atom_group)
form_factor = itps[igroup](norm(qnorm_cart))
sum(atom_group) do iatom
structure_factor = basis.structure_factors[iatom][iG]
form_factor * structure_factor
end
end
ρ_G / sqrt(basis.model.unit_cell_volume)
end
enforce_real!(basis, ρ)
irfft(basis, ρ)
end Projector / Orbital
"""
Build on-grid cubic spline interpolators for each projector/orbital quantity using `n_qnorm_interpolate` points between 0 and `max|G+k|` over all k-points.
"""
function build_interpolators_projectors(basis::PlaneWaveBasis{T},
atom_groups::Vector{Vector{Int}},
n_qnorm_interpolate::Int=3001) where {T}
qnorm_max = maximum(maximum(norm.(Gplusk_vectors_cart(basis, kpt))) for kpt in basis.kpoints)
qnorm_interpolate = range(0, qnorm_max, n_qnorm_interpolate)
map(atom_groups) do atom_group
psp = basis.model.atoms[first(atom_group)].psp
map(0:psp.lmax) do l
map(1:count_n_proj_radial(psp, l)) do iproj_l
f̃ = eval_psp_projector_fourier.(psp, iproj_l, l, qnorm_interpolate)
scale(interpolate(f̃, BSpline(Cubic(Line(OnGrid())))), qnorm_interpolate)
end
end
end
end
function build_projection_vectors(basis::PlaneWaveBasis{T}, atom_groups::Vector{Vector{Int}}) where {T}
itps = build_interpolators_projectors(basis, atom_groups, 6002) # itps[group][l][n]
psps = [basis.model.atoms[first(atom_group)].psp for atom_group in atom_groups]
psp_positions = [basis.model.positions[atom_group] for atom_group in atom_groups]
nproj = count_n_proj(psps, psp_positions)
ngroup = length(atom_groups)
lmax = maximum(psp.lmax for psp in psps)
sqrt_Vuc = sqrt(basis.model.unit_cell_volume)
proj_vectors = map(basis.kpoints) do kpt
qs_cart = Gplusk_vectors_cart(basis, kpt)
qnorms_cart = norm.(qs_cart)
proj_vectors_k = zeros(Complex{T}, size(qs_cart, 1), nproj)
angular = map(0:lmax) do l # angular[l][m][q]
map(-l:l) do m
map(qs_cart) do q_cart
im^l * ylm_real(l, m, q_cart)
end
end
end
radial = map(1:ngroup) do igroup # radial[group][l][n][q]
psp = psps[igroup]
map(0:psp.lmax) do l
map(1:count_n_proj_radial(psp, l)) do n
map(itps[igroup][l+1][n], qnorms_cart)
end
end
end
iproj = 1
for igroup in 1:ngroup
psp = psps[igroup]
for iatom in atom_groups[igroup]
for l in 0:psp.lmax
il = l + 1
for im in 1:(2l + 1)
for n in 1:count_n_proj_radial(psp, l)
aff = angular[il][im] # Form-factor angular part
rff = radial[igroup][il][n] # Form-factor radial part
sf = kpt.structure_factors[iatom] # Structure factor
proj_vectors_k[:,iproj] .= sf .* aff .* rff ./ sqrt_Vuc
iproj += 1
end
end
end
end
end
proj_vectors_k
end
return proj_vectors
end Here are some benchmarks comparing the current density superposition and the proposed density superposition (using 3001 q-points for interpolation). The system is: a = 10.26; # Silicon lattice constant in Bohr
lattice = a / 2 * [[0 1 1.];
[1 0 1.];
[1 1 0.]];
Si = ElementPsp(:Si, psp=load_psp(artifact"pd_nc_sr_lda_standard_0.4.1_upf", "Si.upf"));
atoms = [Si, Si];
positions = [ones(3)/8, -ones(3)/8];
model = model_LDA(lattice, atoms, positions);
basis = PlaneWaveBasis(model; Ecut=90, kgrid=[4, 4, 4]); For the density:
The extrema of the absolute error on the computed quantity are And for the projectors/orbitals:
The extrema of the absolute error on the computed quantity are (per k-point):
|
The error is very encouraging, although I would have expected a larger speedup. Note though that the larger the system, the better the speedup. I'm assuming the bottleneck is the precomputation of the function on the interpolation points? If so, we might be able to gain by using higher-degree interpolation. But that seems like a good compromise between accuracy, simplicity and speed. |
I think what's done now is already quite efficient, and as you said, the speedup improves with more atoms and elements, but also with the number of projectors. Most of the time is spent in precomputation for the interpolation. @louisponet gave me some pointers on speeding up the Bessel transforms, which is actually where the most time is spent, so I've been working on that in PseudoPotentialIO. I think the biggest performance gains w.r.t. the Bessel transform would be to work on removing the need for checks on angular momentum in I'm not sure how much performance we can squeeze out of higher-degree interpolation, but I'd guess that it's probably max 2x, what do you think? |
I don't know, since the function is so smooth 3000 points is probably overkill and we can get by with much less. Also https://en.wikipedia.org/wiki/Hankel_transform#Numerical_evaluation looks applicable and probably the best way to do this, but if cubic splines are already good enough it's probably not worth the bother... |
I could implement numerical evaluation in PseudoPotentialIO for pseudos on a log grid; it looks pretty neat. This will mainly work for UPF v1 pseudos / HGH pseudos in UPF format, notably not PseudoDojo. The two other methods mentioned on Wikipedia for uniform grids: asymptotic expansion of the Bessel functions and the projection-slice theorem don't seem to fit very well in our case at a glance. One other trick that I've seen is that ABINIT uses a recursive algorithm to pre-compute the Bessel functions of order The last optimization that I see helping out is truncating the pseudo quantities on the radial grid where they decay to 0. This is recommended by Abinit for numerical stability, but in my testing it's also another good way to cut out Bessel calls and significantly increase performance. At the end of the day, performance improvements in this area are aimed at setup time and mostly effect cases where |
Yeah, if it's fast enough there's not much point...
One could presumably interpolate from a linear to a log grid. |
I remember from one of the weekly meetings that the
IdDict
s that are currently used for G-vector optimization in form factor calculations don't play well with GPU calculations.Repeating the unique-G-norm finding and form-factor calculation all over the place also isn't the nicest.
I'm wondering if we could take a page out of QE's book and initialize a
G_shells
vector (i.e. the unique values of|G|
or|G+k|
) and a G-vector to G-shell mapping inPlaneWaveBasis
and in eachKpoint
.With some intelligent sorting, this could likely be as performant as a hash-map on CPU, and it would also work on GPU.
Along with this, I think we've discussed trying to centralize form-factor and structure-factor calculations somewhere to reduce duplication. If we move to G-shells, these functions would have to be re-worked anyway, so it would be a good opportunity to bring everything together.
For form-factors, I'd suggest two methods: one for density-like objects that don't need spherical harmonics (i.e.
l===0
,m===0
), and one for orbital/projector-like objects that do (arbitraryl
,m
). I use a marker struct inPseudoPotentialIO
for the same purpose with the Bessel transform.Structure factors really only need to be calculated once for a given atomic geometry, so they could even be stored.
The text was updated successfully, but these errors were encountered: