Cython perf simplification #861

Open: zerothi wants to merge 11 commits into main from cython-perf-simplification
Conversation

zerothi (Owner) commented Nov 6, 2024

  • Closes Pk on complex matrices #104
  • Added tests for new/changed functions?
  • Documentation for functionality in docs/
  • Changes documented in CHANGELOG.md

Here are some details that need to be sorted out before merging:

  • Adding tests for complex spin-orbit calculations, including post-processing such as PDOS etc.
    The idea is that instead of 8 real numbers we can now store 4 complex ones, [uu, dd, ud, du] (see the sketch below).
    This has the problem that the overlap is then also stored as a complex number, but that might be ok.
  • Document the SOC TB handling a bit better; right now the order of elements is not intuitive.
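
A minimal numpy sketch of that storage idea (only the [uu, dd, ud, du] ordering is taken from the item above; the values and the complex overlap handling are illustrative):

    import numpy as np

    # four complex spin-box components instead of eight reals
    H_box = np.empty(4, dtype=np.complex128)        # [uu, dd, ud, du]
    H_box[:] = [1.0, -1.0, 0.1 + 0.2j, 0.1 - 0.2j]

    # the overlap spin-box would then also carry the complex dtype,
    # even though its physically meaningful values are real
    S_box = np.array([1.0, 1.0, 0.0, 0.0], dtype=np.complex128)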

zerothi (Owner, Author) commented Nov 6, 2024

@pfebrer you won't like this...

Sadly, pure-python mode does not allow cimport statements.

This code-restructuring is a first step to reduce the amount of Cython code.

@@ -88,13 +88,16 @@
# import the common options used
from ._common import *

from ._core import *

Check notice: Code scanning / CodeQL

'import *' may pollute namespace (Note): import pollutes the enclosing namespace, as the imported module sisl._core does not define '__all__'.
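
For reference, a star import only re-exports the names a module lists in __all__, so defining one in sisl._core would bound the import. A minimal, hypothetical sketch (the listed names are illustrative only):

    # hypothetical addition to sisl/_core/__init__.py: an explicit __all__
    # bounds what "from ._core import *" pulls into the sisl namespace
    __all__ = ["Geometry", "Lattice", "SparseCSR"]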

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 75.43253% with 71 lines in your changes missing coverage. Please review.

Project coverage is 86.90%. Comparing base (0de5fbd) to head (a9a5f0f).
Report is 4 commits behind head on main.

Files with missing lines          Patch %   Missing lines
src/sisl/physics/sparse.py        40.00%    51
src/sisl/io/siesta/_help.py       78.57%    18
src/sisl/io/siesta/binaries.py    93.33%     2
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #861      +/-   ##
==========================================
- Coverage   86.94%   86.90%   -0.05%     
==========================================
  Files         403      403              
  Lines       52587    52764     +177     
==========================================
+ Hits        45723    45855     +132     
- Misses       6864     6909      +45     


zerothi force-pushed the cython-perf-simplification branch 2 times, most recently from 23ab3b7 to 2a5ec18 on November 7, 2024 at 09:30
pfebrer (Contributor) commented Nov 7, 2024

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

    import cython
    import cython.cimports.numpy as cnp
    import numpy as np
    from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow cimporting sisl files?

zerothi (Owner, Author) commented Nov 7, 2024

Sadly, pure-python mode does not allow cimport statements.

It does, see here:

    import cython
    import cython.cimports.numpy as cnp
    import numpy as np
    from cython.cimports.libc.math import sqrt

Do you mean that it doesn't allow cimporting sisl files?

True, I misunderstood this here...

You're right, but for this to work for sisl intrinsics, you would have to duplicate interfaces in pxd files, and you wouldn't be able to run that code in pure python mode, which kind of defeats the purpose. :S

See here: http://docs.cython.org/en/latest/src/tutorial/pure.html#cimports

pfebrer (Contributor) commented Nov 7, 2024

Well, of course it depends on how complex things are, but you could always mock the cimports inside an if cython.compiled: clause so that the file runs in the python interpreter.
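
A minimal sketch of that pattern (illustrative, not code from this PR); the fallback branch lets the file run uncompiled in a plain Python interpreter:

    import cython

    if cython.compiled:
        # compiled: pull sqrt from the C math library via a cimport
        from cython.cimports.libc.math import sqrt
    else:
        # uncompiled: fall back to the stdlib so the file still runs
        from math import sqrt

    def norm(x: cython.double, y: cython.double) -> cython.double:
        return sqrt(x * x + y * y)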

But apart from files running out of the box in normal python (which is nice), for me it is a question of readability and the feeling that the code can be ported to something else much more easily if there is a better accelerating framework than cython.

Not saying we should translate everything to pure python, just giving some arguments in favour of the pure python syntax :)

src/sisl/io/siesta/_help.py (alert fixed)
@@ -98,45 +99,177 @@
csr.translate_columns(col_from, col_to)


def _mat_spin_convert(M, spin=None):
def _mat2dtype(M, dtype: np.dtype) -> None:

Check notice: Code scanning / CodeQL

Explicit returns mixed with implicit (fall-through) returns (Note): mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
src/sisl/io/siesta/tests/test_tsde.py (alert fixed)
This was not really intended. But the number
of different code paths that were not using fused types
was annoying and prohibited further development.

I initially tried to implement pure-python mode
code. However, it seems to be impossible to write
a module in pure-python mode and cimport it into
another pure-python mode module.

Before this is merged, I need to benchmark
the changes made.

I *hope* it is faster, but I am not sure anything
changed performance-wise.

The main change is that the fused datatypes
are omnipresent, and that the ndarray data
type has been removed almost completely.
Memoryviews are now prevalent.
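
A rough illustration of that style (Cython pure-python syntax, not sisl's actual code): a typed-memoryview signature works directly on the underlying buffer, and in the PR this is combined with fused types so one routine serves the real and complex dtypes:

    import cython
    import numpy as np

    def total(data: cython.double[:]) -> cython.double:
        # iterate over the buffer directly; no np.ndarray attribute lookups
        s: cython.double = 0.0
        i: cython.Py_ssize_t
        for i in range(data.shape[0]):
            s += data[i]
        return s

    print(total(np.arange(10.0)))  # also runs uncompiled as plain Python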

Signed-off-by: Nick Papior <[email protected]>
This means that it is much simpler to track problems.
There is no code duplication in the spin-box calculations.

I am not sure whether the yellow (Python-interaction) annotations
are only for the int/long variables and for down-casting.
In any case, I think this is easier to manage.

Signed-off-by: Nick Papior <[email protected]>
Simple benchmarks showed that building Hk from Siesta
Hamiltonians can be made much faster by changing how the
folding of the matrices is done.

Instead of incrementally adding elements, and searching
for duplicates before each addition, we now build the
entire array and use numpy.unique to reduce it to the
actual array. This leverages numpy.unique, which already
returns a sorted array.

It marginally slows down csr creation for matrices with
few edges per orbital (TB models), but it will be
much faster for larger models stemming from DFT
or the like.
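
A small numpy sketch of that folding idea (not sisl's implementation, just the np.unique-based approach described above):

    import numpy as np

    def fold_columns(rows, cols, vals, ncols):
        # collapse duplicate (row, col) entries by summing their values,
        # instead of searching for duplicates on every insertion
        rows = np.asarray(rows)
        cols = np.asarray(cols)
        vals = np.asarray(vals)

        # encode (row, col) as a single integer key; np.unique sorts them
        keys = rows * ncols + cols
        uniq, inverse = np.unique(keys, return_inverse=True)

        folded = np.zeros(uniq.shape[0], dtype=vals.dtype)
        np.add.at(folded, inverse, vals)          # accumulate duplicates

        return uniq // ncols, uniq % ncols, folded

    # (0, 2) appears twice and is folded into a single entry with value 5.0
    print(fold_columns([0, 0, 1, 0], [2, 5, 1, 2], [1.0, 2.0, 3.0, 4.0], ncols=6))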

Tests for this commit:

    %timeit H.Hk()
    %timeit H.Hk([0.1] * 3)
    %timeit H.Hk(format="array")
    %timeit H.Hk([0.1] * 3, format="array")

For a *many* edge system, we get:

    67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

While for a *few* edge system, we get:

    9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For commit v0.15.1-57-g6bbbde39 we get:

    196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and

    7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Signed-off-by: Nick Papior <[email protected]>
Enabling Fortran is necessary for it to populate details of the Fortran
world.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
This should enable simpler access to data.

Signed-off-by: Nick Papior <[email protected]>
Lots of the code was built for floats.
This now fixes the issue of reading/writing
sparse matrices in specific data formats.

It allows a more natural way of handling SOC
matrices with complex interplay, say:

 H[0, 0, 2] = Hud

as a complex variable. When dealing with floats,
one needs to do this:

 H[0, 0, 2] = Hud.real
 H[0, 0, 3] = Hud.imag

which is not super-intuitive.

Currently there are still *many* hardcodings of the indices,
and we should strive to move these into a common framework
to limit the problems they create.
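
A hypothetical sketch of what such a framework could look like (the enum and its names are purely illustrative; only the index/value pairing comes from the example above):

    import numpy as np
    from enum import IntEnum

    class SpinBox(IntEnum):
        # name the spin-box slots once, instead of hardcoding 0/1/2/3
        UU = 0
        DD = 1
        UD = 2
        DU = 3

    H = np.zeros((1, 1, 4), dtype=np.complex128)
    Hud = 0.1 + 0.2j
    H[0, 0, SpinBox.UD] = Hud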

Tests have been added that check Hamiltonian eigenvalues
and density-matrix Mulliken charges. So it seems to work
as intended, but not everything has been fully tested.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>