Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython perf simplification #861

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Cython perf simplification #861

wants to merge 11 commits into from

Commits on Nov 13, 2024

  1. complete rewrite of cython files

    This was not really intended. But the amount
    of different codes that were not using fused tyes
    was annoying and prohibited further development.
    
    I initially tried to implement pure-python mode
    codes. However, it seems to be impossible to do
    pure-python mode and cimport it into another
    pure-python mode code.
    
    Before this will be merged, I need to benchmark
    the changes made.
    
    I *hope* it is faster, but I am not sure anything
    happened.
    
    The main change is that the fused datatypes
    are omnipresent and that the ndarray data
    type has been removed almost completely.
    So now, the memoryviews are prevalent.
    
    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    0290be6 View commit details
    Browse the repository at this point in the history
  2. offloaded NC/SO calculations to simple routines

    This means that it is much simpler to track problems.
    There is no code duplication in the spin-box calculations.
    
    I am not sure whether the python-yellow annotations are
    only for the int/long variables, and when down-casting.
    In any case, I think this is more easy to manage.
    
    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    dea3942 View commit details
    Browse the repository at this point in the history
  3. redid fold_csr_matrix for much better perf

    Simple benchmarks showed that siesta Hamiltonians
    to Hk can be much faster by changing how the
    folding of the matrices are done.
    
    Instead of incrementally adding elements, and searching
    for duplicates before each addition of elements, we know
    built the entire array, and use numpy.unique to reduce
    the actual array. This leverages the numpy unique
    function which already returns a sorted array.
    
    It marginally slows down csr creation of matrices with
    few edges per orbital (TB models). But will be
    much faster for larger models stemming from DFT
    or the likes.
    
    Tests for this commit:
    
        %timeit H.Hk()
        %timeit H.Hk([0.1] * 3)
        %timeit H.Hk(format="array")
        %timeit H.Hk([0.1] * 3, format="array")
    
    For a *many* edge system, we get:
    
        67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    While for a *few* edge system, we get:
    
        9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    For commit v0.15.1-57-g6bbbde39 we get:
    
        196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
        214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
        6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    and
    
        7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
        5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    9008b21 View commit details
    Browse the repository at this point in the history
  4. fixed print-out of cmake-variables and ensured Fortran is enabled

    Enabling fortran is necessary for it to populate details of the fortran
    world.
    
    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    3eb6c17 View commit details
    Browse the repository at this point in the history
  5. added the changelog

    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    9fc83f7 View commit details
    Browse the repository at this point in the history
  6. cleaned spin-box extraction of matrix data

    This should enable simpler access to data
    
    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    f6dd6f4 View commit details
    Browse the repository at this point in the history
  7. fixed handling complex matrices in sisl

    Lots of the code were built for floats.
    This now fixes the issue of reading/writing
    sparse matrices in specific data-formats.
    
    It allows a more natural way of handling SOC
    matrices, with complex interplay, say:
    
     H[0, 0, 2] = Hud
    
    as a complex variable, when dealing with floats
    one needs to do this:
    
     H[0, 0, 2] = Hud.real
     H[0, 0, 3] = Hud.imag
    
    which is not super-intuitive.
    
    Currently there are still *many* hardcodings of the indices.
    
    And we should strive to move these into a common framework
    to limit the problems it creates.
    
    Tests has been added that checks Hamiltonian eigenvalues
    and Density matrices mulliken charges. So it seems it works
    as intended, but not everything has been fully tested.
    
    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    ca9ca60 View commit details
    Browse the repository at this point in the history
  8. removed unused variable

    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    7ee53e0 View commit details
    Browse the repository at this point in the history
  9. fixed wrong object usage in mulliken extraction

    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    ea44b2c View commit details
    Browse the repository at this point in the history
  10. added explanation of how transform works

    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    6b65ab8 View commit details
    Browse the repository at this point in the history
  11. fixed nc compilation with complex numbers

    Signed-off-by: Nick Papior <[email protected]>
    zerothi committed Nov 13, 2024
    Configuration menu
    Copy the full SHA
    a9a5f0f View commit details
    Browse the repository at this point in the history