Cython perf simplification #861

Open: zerothi wants to merge 11 commits into main from cython-perf-simplification
Conversation

zerothi (Owner) commented Nov 6, 2024

  • Closes Pk on complex matrices #104
  • Added tests for new/changed functions?
  • Documentation for functionality in docs/
  • Changes documented in CHANGELOG.md

Here are some details that need to be sorted out before merging:

  • Adding tests for complex spin-orbit calculations, including post-processing such as PDOS etc.
    The idea is that instead of 8 real numbers we can now store 4 complex ones, [uu, dd, ud, du] (see the sketch below).
    This has the problem that the overlap is then also stored as a complex number, but that might be ok.
  • Document the SOC TB handling a bit better; right now the order of elements is not intuitive.
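
A minimal numpy sketch of that storage idea (only the [uu, dd, ud, du] ordering is taken from the item above; the values and the complex overlap handling are illustrative):

    import numpy as np

    # four complex spin-box components instead of eight reals
    H_box = np.empty(4, dtype=np.complex128)        # [uu, dd, ud, du]
    H_box[:] = [1.0, -1.0, 0.1 + 0.2j, 0.1 - 0.2j]

    # the overlap spin-box would then also carry the complex dtype,
    # even though its physically meaningful values are real
    S_box = np.array([1.0, 1.0, 0.0, 0.0], dtype=np.complex128)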

zerothi (Owner, Author) commented Nov 6, 2024

@pfebrer you won't like this...

Sadly, pure-python mode does not allow cimport statements.

This code-restructuring is a first step to reduce the amount of Cython code.

@@ -88,13 +88,16 @@
# import the common options used
from ._common import *

from ._core import *

Check notice: Code scanning / CodeQL

'import *' may pollute namespace (Note): import pollutes the enclosing namespace, as the imported module sisl._core does not define '__all__'.
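
For reference, a star import only re-exports the names a module lists in __all__, so defining one in sisl._core would bound the import. A minimal, hypothetical sketch (the listed names are illustrative only):

    # hypothetical addition to sisl/_core/__init__.py: an explicit __all__
    # bounds what "from ._core import *" pulls into the sisl namespace
    __all__ = ["Geometry", "Lattice", "SparseCSR"]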

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 75.43253% with 71 lines in your changes missing coverage. Please review.

Project coverage is 86.90%. Comparing base (0de5fbd) to head (a9a5f0f).
Report is 4 commits behind head on main.

Files with missing lines          Patch %   Missing lines
src/sisl/physics/sparse.py        40.00%    51
src/sisl/io/siesta/_help.py       78.57%    18
src/sisl/io/siesta/binaries.py    93.33%     2
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #861      +/-   ##
==========================================
- Coverage   86.94%   86.90%   -0.05%     
==========================================
  Files         403      403              
  Lines       52587    52764     +177     
==========================================
+ Hits        45723    45855     +132     
- Misses       6864     6909      +45     


zerothi force-pushed the cython-perf-simplification branch 2 times, most recently from 23ab3b7 to 2a5ec18 on November 7, 2024 at 09:30
pfebrer (Contributor) commented Nov 7, 2024

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

    import cython
    import cython.cimports.numpy as cnp
    import numpy as np
    from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow cimporting sisl files?

zerothi (Owner, Author) commented Nov 7, 2024

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

    import cython
    import cython.cimports.numpy as cnp
    import numpy as np
    from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow cimporting sisl files?

True, I misunderstood this here...

You're right, but for this to work for sisl intrinsics, you would have to duplicate interfaces in pxd files, and you wouldn't be able to run that code in pure python mode, which kind of defeats the purpose. :S

See here: http://docs.cython.org/en/latest/src/tutorial/pure.html#cimports

pfebrer (Contributor) commented Nov 7, 2024

Well, of course it depends on how complex things are, but you could always mock the cimports inside an if cython.compiled: clause so that the file runs in the python interpreter.
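
A minimal sketch of that pattern (illustrative, not code from this PR); the fallback branch lets the file run uncompiled in a plain Python interpreter:

    import cython

    if cython.compiled:
        # compiled: pull sqrt from the C math library via a cimport
        from cython.cimports.libc.math import sqrt
    else:
        # uncompiled: fall back to the stdlib so the file still runs
        from math import sqrt

    def norm(x: cython.double, y: cython.double) -> cython.double:
        return sqrt(x * x + y * y)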

But apart from files running out of the box in normal python (which is nice), for me it is a question of readability and the feeling that the code can be ported to something else much more easily if there is a better accelerating framework than cython.

Not saying we should translate everything to pure python, just giving some arguments in favour of the pure python syntax :)

src/sisl/io/siesta/_help.py (alert fixed)
@@ -98,45 +99,177 @@
csr.translate_columns(col_from, col_to)


def _mat_spin_convert(M, spin=None):
def _mat2dtype(M, dtype: np.dtype) -> None:

Check notice: Code scanning / CodeQL

Explicit returns mixed with implicit (fall-through) returns (Note): mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
src/sisl/io/siesta/tests/test_tsde.py (alert fixed)
This was not really intended. But the number
of different code paths that were not using fused types
was annoying and prohibited further development.

I initially tried to implement pure-python mode
code. However, it seems to be impossible to write
a module in pure-python mode and cimport it into
another pure-python mode module.

Before this is merged, I need to benchmark
the changes made.

I *hope* it is faster, but I am not sure anything
changed performance-wise.

The main change is that the fused datatypes
are omnipresent, and that the ndarray data
type has been removed almost completely.
Memoryviews are now prevalent.
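
A rough illustration of that style (Cython pure-python syntax, not sisl's actual code): a typed-memoryview signature works directly on the underlying buffer, and in the PR this is combined with fused types so one routine serves the real and complex dtypes:

    import cython
    import numpy as np

    def total(data: cython.double[:]) -> cython.double:
        # iterate over the buffer directly; no np.ndarray attribute lookups
        s: cython.double = 0.0
        i: cython.Py_ssize_t
        for i in range(data.shape[0]):
            s += data[i]
        return s

    print(total(np.arange(10.0)))  # also runs uncompiled as plain Python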

Signed-off-by: Nick Papior <[email protected]>
This means that it is much simpler to track problems.
There is no code duplication in the spin-box calculations.

I am not sure whether the yellow (Python-interaction) annotations
are only for the int/long variables and for down-casting.
In any case, I think this is easier to manage.

Signed-off-by: Nick Papior <[email protected]>
Simple benchmarks showed that building Hk from Siesta
Hamiltonians can be made much faster by changing how the
folding of the matrices is done.

Instead of incrementally adding elements, and searching
for duplicates before each addition, we now build the
entire array and use numpy.unique to reduce it to the
actual array. This leverages numpy.unique, which already
returns a sorted array.

It marginally slows down csr creation for matrices with
few edges per orbital (TB models), but it will be
much faster for larger models stemming from DFT
or the like.
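
A small numpy sketch of that folding idea (not sisl's implementation, just the np.unique-based approach described above):

    import numpy as np

    def fold_columns(rows, cols, vals, ncols):
        # collapse duplicate (row, col) entries by summing their values,
        # instead of searching for duplicates on every insertion
        rows = np.asarray(rows)
        cols = np.asarray(cols)
        vals = np.asarray(vals)

        # encode (row, col) as a single integer key; np.unique sorts them
        keys = rows * ncols + cols
        uniq, inverse = np.unique(keys, return_inverse=True)

        folded = np.zeros(uniq.shape[0], dtype=vals.dtype)
        np.add.at(folded, inverse, vals)          # accumulate duplicates

        return uniq // ncols, uniq % ncols, folded

    # (0, 2) appears twice and is folded into a single entry with value 5.0
    print(fold_columns([0, 0, 1, 0], [2, 5, 1, 2], [1.0, 2.0, 3.0, 4.0], ncols=6))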

Tests for this commit:

    %timeit H.Hk()
    %timeit H.Hk([0.1] * 3)
    %timeit H.Hk(format="array")
    %timeit H.Hk([0.1] * 3, format="array")

For a *many* edge system, we get:

    67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

While for a *few* edge system, we get:

    9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For commit v0.15.1-57-g6bbbde39 we get:

    196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and

    7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Signed-off-by: Nick Papior <[email protected]>
Enabling Fortran is necessary for it to populate details of the Fortran
world.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
This should enable simpler access to data.

Signed-off-by: Nick Papior <[email protected]>
Lots of the code was built for floats.
This now fixes the issue of reading/writing
sparse matrices in specific data formats.

It allows a more natural way of handling SOC
matrices with complex interplay, say:

 H[0, 0, 2] = Hud

as a complex variable. When dealing with floats,
one needs to do this:

 H[0, 0, 2] = Hud.real
 H[0, 0, 3] = Hud.imag

which is not super-intuitive.

Currently there are still *many* hardcodings of the indices,
and we should strive to move these into a common framework
to limit the problems they create.
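
A hypothetical sketch of what such a framework could look like (the enum and its names are purely illustrative; only the index/value pairing comes from the example above):

    import numpy as np
    from enum import IntEnum

    class SpinBox(IntEnum):
        # name the spin-box slots once, instead of hardcoding 0/1/2/3
        UU = 0
        DD = 1
        UD = 2
        DU = 3

    H = np.zeros((1, 1, 4), dtype=np.complex128)
    Hud = 0.1 + 0.2j
    H[0, 0, SpinBox.UD] = Hud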

Tests have been added that check Hamiltonian eigenvalues
and density-matrix Mulliken charges. So it seems to work
as intended, but not everything has been fully tested.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>