-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cython perf simplification #861
base: main
Are you sure you want to change the base?
Commits on Nov 13, 2024
-
complete rewrite of cython files
This was not really intended. But the amount of different codes that were not using fused tyes was annoying and prohibited further development. I initially tried to implement pure-python mode codes. However, it seems to be impossible to do pure-python mode and cimport it into another pure-python mode code. Before this will be merged, I need to benchmark the changes made. I *hope* it is faster, but I am not sure anything happened. The main change is that the fused datatypes are omnipresent and that the ndarray data type has been removed almost completely. So now, the memoryviews are prevalent. Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 0290be6 - Browse repository at this point
Copy the full SHA 0290be6View commit details -
offloaded NC/SO calculations to simple routines
This means that it is much simpler to track problems. There is no code duplication in the spin-box calculations. I am not sure whether the python-yellow annotations are only for the int/long variables, and when down-casting. In any case, I think this is more easy to manage. Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for dea3942 - Browse repository at this point
Copy the full SHA dea3942View commit details -
redid fold_csr_matrix for much better perf
Simple benchmarks showed that siesta Hamiltonians to Hk can be much faster by changing how the folding of the matrices are done. Instead of incrementally adding elements, and searching for duplicates before each addition of elements, we know built the entire array, and use numpy.unique to reduce the actual array. This leverages the numpy unique function which already returns a sorted array. It marginally slows down csr creation of matrices with few edges per orbital (TB models). But will be much faster for larger models stemming from DFT or the likes. Tests for this commit: %timeit H.Hk() %timeit H.Hk([0.1] * 3) %timeit H.Hk(format="array") %timeit H.Hk([0.1] * 3, format="array") For a *many* edge system, we get: 67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) While for a *few* edge system, we get: 9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) For commit v0.15.1-57-g6bbbde39 we get: 196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) and 7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 9008b21 - Browse repository at this point
Copy the full SHA 9008b21View commit details -
fixed print-out of cmake-variables and ensured Fortran is enabled
Enabling fortran is necessary for it to populate details of the fortran world. Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 3eb6c17 - Browse repository at this point
Copy the full SHA 3eb6c17View commit details -
Configuration menu - View commit details
-
Copy full SHA for 9fc83f7 - Browse repository at this point
Copy the full SHA 9fc83f7View commit details -
cleaned spin-box extraction of matrix data
This should enable simpler access to data Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for f6dd6f4 - Browse repository at this point
Copy the full SHA f6dd6f4View commit details -
fixed handling complex matrices in sisl
Lots of the code were built for floats. This now fixes the issue of reading/writing sparse matrices in specific data-formats. It allows a more natural way of handling SOC matrices, with complex interplay, say: H[0, 0, 2] = Hud as a complex variable, when dealing with floats one needs to do this: H[0, 0, 2] = Hud.real H[0, 0, 3] = Hud.imag which is not super-intuitive. Currently there are still *many* hardcodings of the indices. And we should strive to move these into a common framework to limit the problems it creates. Tests has been added that checks Hamiltonian eigenvalues and Density matrices mulliken charges. So it seems it works as intended, but not everything has been fully tested. Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for ca9ca60 - Browse repository at this point
Copy the full SHA ca9ca60View commit details -
Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 7ee53e0 - Browse repository at this point
Copy the full SHA 7ee53e0View commit details -
fixed wrong object usage in mulliken extraction
Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for ea44b2c - Browse repository at this point
Copy the full SHA ea44b2cView commit details -
added explanation of how transform works
Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 6b65ab8 - Browse repository at this point
Copy the full SHA 6b65ab8View commit details -
fixed nc compilation with complex numbers
Signed-off-by: Nick Papior <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for a9a5f0f - Browse repository at this point
Copy the full SHA a9a5f0fView commit details