Add Device Implementation of HEEVD #42

Merged 7 commits on Sep 9, 2023


wavefunction91
Contributor

@wavefunction91 wavefunction91 commented Aug 25, 2023

This change was required for tighter integration with NWChemEx.

Adds the following:

  • Device API for heevd
  • cuSolver implementation of heevd
  • rocSolver implementation of heevd
  • Device stubs for heevd
  • Unit test for device heevd

Opening early to coordinate if necessary. I have access to OLCF to add rocSolver, but I'd have to update my credentials with ALCF to get access to Intel/oneMKL hardware. Also, I do not have access to a pre-11 CUDA installation, so the manual-dispatch path is untested. Unit tests on NVIDIA pass locally and at NERSC:

$ nvidia-smi
Fri Aug 25 16:27:53 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:81:00.0 Off |                  Off |
| 30%   34C    P8    20W / 230W |     73MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
$ ./tester  --type s,d,c,z --align 32 --dim 100:500:100 --jobz n,v --uplo l,u dev-heevd
LAPACK++ version 2023.06.00, id 082daf3
input: ./tester --type 's,d,c,z' --align 32 --dim '100:500:100' --jobz 'n,v' --uplo 'l,u' dev-heevd
                                                                                                 
type    uplo   jobz       n  align  device     error    error2   time (s)  ref time (s)  status  
test matrix A: rand, cond(S) = NA
   s   lower  novec     100     32       0        NA  3.11e-07   0.000832       0.00338  pass    
   s   lower  novec     200     32       0        NA  5.21e-07    0.00163       0.00887  pass    
   s   lower  novec     300     32       0        NA  5.03e-07    0.00258        0.0185  pass    
   s   lower  novec     400     32       0        NA  6.54e-07    0.00343        0.0172  pass    
   s   lower  novec     500     32       0        NA  5.67e-07    0.00433        0.0236  pass    
   s   lower    vec     100     32       0  5.33e-09  5.56e-07    0.00135        0.0114  pass    
   s   lower    vec     200     32       0  1.66e-09  4.06e-07    0.00196        0.0825  pass    
   s   lower    vec     300     32       0  2.07e-09  1.15e-06    0.00308         0.182  pass    
   s   lower    vec     400     32       0  1.64e-09  4.22e-07    0.00414         0.364  pass    
   s   lower    vec     500     32       0  1.33e-09  9.87e-07    0.00522         0.668  pass    
   s   upper  novec     100     32       0        NA  3.53e-07   0.000774      0.000851  pass    
   s   upper  novec     200     32       0        NA  6.53e-07    0.00164       0.00292  pass    
   s   upper  novec     300     32       0        NA  5.33e-07    0.00263        0.0265  pass    
   s   upper  novec     400     32       0        NA  5.33e-07    0.00348        0.0160  pass    
   s   upper  novec     500     32       0        NA  4.97e-07    0.00432        0.0255  pass    
   s   upper    vec     100     32       0  3.33e-09  3.11e-07    0.00698        0.0140  pass    
   s   upper    vec     200     32       0  2.07e-09  5.25e-07    0.00199        0.0695  pass    
   s   upper    vec     300     32       0  1.80e-09  1.15e-06    0.00323         0.179  pass    
   s   upper    vec     400     32       0  1.67e-09  4.51e-07    0.00418         0.364  pass    
   s   upper    vec     500     32       0  1.36e-09  5.80e-07    0.00526         0.666  pass    

   d   lower  novec     100     32       0        NA  7.81e-16    0.00378       0.00115  pass    
   d   lower  novec     200     32       0        NA  1.93e-15    0.00637       0.00603  pass    
   d   lower  novec     300     32       0        NA  1.17e-15    0.00941        0.0205  pass    
   d   lower  novec     400     32       0        NA  1.23e-15     0.0118        0.0377  pass    
   d   lower  novec     500     32       0        NA  1.52e-15     0.0162        0.0329  pass    
   d   lower    vec     100     32       0  9.08e-18  1.09e-15     0.0102        0.0209  pass    
   d   lower    vec     200     32       0  3.55e-18  1.70e-15    0.00709         0.106  pass    
   d   lower    vec     300     32       0  3.24e-18  1.09e-15     0.0106         0.219  pass    
   d   lower    vec     400     32       0  3.29e-18  9.95e-16     0.0139         0.420  pass    
   d   lower    vec     500     32       0  9.21e-19  1.04e-15     0.0173         0.771  pass    
   d   upper  novec     100     32       0        NA  8.21e-16    0.00376      0.000935  pass    
   d   upper  novec     200     32       0        NA  8.95e-16    0.00645       0.00356  pass    
   d   upper  novec     300     32       0        NA  1.03e-15    0.00944        0.0270  pass    
   d   upper  novec     400     32       0        NA  1.59e-15     0.0122        0.0217  pass    
   d   upper  novec     500     32       0        NA  9.22e-16     0.0143        0.0307  pass    
   d   upper    vec     100     32       0  2.23e-18  6.53e-16    0.00844        0.0312  pass    
   d   upper    vec     200     32       0  2.18e-18  9.50e-16    0.00719        0.0688  pass    
   d   upper    vec     300     32       0  3.46e-18  1.00e-15     0.0111         0.202  pass    
   d   upper    vec     400     32       0  1.28e-18  1.43e-15     0.0142         0.447  pass    
   d   upper    vec     500     32       0  8.42e-19  1.20e-15     0.0173         0.800  pass    

   c   lower  novec     100     32       0        NA  2.55e-07   0.000909       0.00151  pass    
   c   lower  novec     200     32       0        NA  4.40e-07    0.00182       0.00554  pass    
   c   lower  novec     300     32       0        NA  6.52e-07    0.00707        0.0362  pass    
   c   lower  novec     400     32       0        NA  7.08e-07    0.00380        0.0503  pass    
   c   lower  novec     500     32       0        NA  5.64e-07    0.00478        0.0610  pass    
   c   lower    vec     100     32       0  4.91e-09  3.61e-07    0.00152        0.0237  pass    
   c   lower    vec     200     32       0  2.86e-09  4.47e-07    0.00220         0.123  pass    
   c   lower    vec     300     32       0  1.46e-09  1.26e-06    0.00347         0.293  pass    
   c   lower    vec     400     32       0  1.67e-09  6.48e-07    0.00460         0.621  pass    
   c   lower    vec     500     32       0  1.31e-09  7.96e-07    0.00586         1.159  pass    
   c   upper  novec     100     32       0        NA  3.17e-07   0.000943       0.00125  pass    
   c   upper  novec     200     32       0        NA  4.96e-07    0.00187        0.0242  pass    
   c   upper  novec     300     32       0        NA  7.21e-07    0.00292        0.0158  pass    
   c   upper  novec     400     32       0        NA  5.97e-07    0.00387        0.0801  pass    
   c   upper  novec     500     32       0        NA  6.24e-07    0.00479        0.0693  pass    
   c   upper    vec     100     32       0  1.28e-09  2.53e-07    0.00154        0.0376  pass    
   c   upper    vec     200     32       0  1.10e-09  6.28e-07    0.00221         0.134  pass    
   c   upper    vec     300     32       0  1.64e-09  4.91e-07    0.00355         0.247  pass    
   c   upper    vec     400     32       0  1.05e-09  5.14e-07    0.00466         0.576  pass    
   c   upper    vec     500     32       0  1.30e-09  4.67e-07    0.00587         1.084  pass    

   z   lower  novec     100     32       0        NA  1.07e-15    0.00409       0.00165  pass    
   z   lower  novec     200     32       0        NA  1.23e-15    0.00801        0.0240  pass    
   z   lower  novec     300     32       0        NA  1.29e-15     0.0126        0.0391  pass    
   z   lower  novec     400     32       0        NA  1.36e-15     0.0170        0.0479  pass    
   z   lower  novec     500     32       0        NA  1.47e-15     0.0218        0.0553  pass    
   z   lower    vec     100     32       0  1.04e-17  5.77e-16    0.00560        0.0407  pass    
   z   lower    vec     200     32       0  4.95e-18  9.64e-16     0.0127         0.154  pass    
   z   lower    vec     300     32       0  3.53e-18  1.02e-15     0.0157         0.365  pass    
   z   lower    vec     400     32       0  2.22e-18  1.04e-15     0.0219         0.786  pass    
   z   lower    vec     500     32       0  1.53e-18  1.15e-15     0.0290         1.451  pass    
   z   upper  novec     100     32       0        NA  4.35e-16    0.00406       0.00148  pass    
   z   upper  novec     200     32       0        NA  1.16e-15    0.00805       0.00836  pass    
   z   upper  novec     300     32       0        NA  1.18e-15     0.0128        0.0249  pass    
   z   upper  novec     400     32       0        NA  8.89e-16     0.0172        0.0389  pass    
   z   upper  novec     500     32       0        NA  1.28e-15     0.0219        0.0540  pass    
   z   upper    vec     100     32       0  3.08e-18  7.00e-16    0.00569        0.0353  pass    
   z   upper    vec     200     32       0  5.35e-18  1.00e-15    0.00974         0.161  pass    
   z   upper    vec     300     32       0  2.13e-18  1.90e-15     0.0161         0.389  pass    
   z   upper    vec     400     32       0  8.77e-19  8.53e-16     0.0224         0.778  pass    
   z   upper    vec     500     32       0  2.03e-18  1.18e-15     0.0288         1.445  pass    
All tests passed for dev-heevd.
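
The sweep flags in the tester command above (`--type`, `--uplo`, `--jobz`, `--dim`) can be read as a Cartesian product of test cases. The following is a minimal sketch of that expansion, not the tester's actual parser: the helper names `expand_range` and `expand_sweep` are hypothetical, and the inclusive `start:end:step` reading of `--dim` is inferred only from the n = 100 to 500 rows in the output.

```python
from itertools import product


def expand_range(spec):
    """Expand 'start:end:step' (end inclusive) or a comma list into ints.

    Assumption: the inclusive end is inferred from the output above,
    where --dim 100:500:100 produced n = 100, 200, 300, 400, 500.
    """
    values = []
    for part in spec.split(","):
        if ":" in part:
            start, end, step = (int(x) for x in part.split(":"))
            values.extend(range(start, end + 1, step))
        else:
            values.append(int(part))
    return values


def expand_sweep(types, uplos, jobzs, dims):
    """Return every (type, uplo, jobz, n) combination.

    The nesting order (type outermost, n innermost) mirrors the row
    order of the tester output above.
    """
    return list(
        product(
            types.split(","),
            uplos.split(","),
            jobzs.split(","),
            expand_range(dims),
        )
    )


cases = expand_sweep("s,d,c,z", "l,u", "n,v", "100:500:100")
print(len(cases))           # number of test cases in the sweep
print(cases[0], cases[-1])  # first and last combination
```

With four types, two `uplo` values, two `jobz` values, and five sizes, the sweep yields 80 cases, which matches the 80 result rows in the log.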

@mgates3
Collaborator

mgates3 commented Aug 26, 2023

Did you add src/cuda/cuda_heevd.cc and test/test_heevd_device.cc?

@wavefunction91
Contributor Author

wavefunction91 commented Aug 26, 2023 via email

@mgates3
Collaborator

mgates3 commented Sep 1, 2023

@wavefunction91 Do you want us to add the ROCm and oneMKL implementations? That should be straightforward for us. In that case:
@dsukkari Can you add ROCm?
@ayarkhan Can you add SYCL?
I'm not sure of the best route to collaborate on a branch. Would it be easiest to pull the branch into the lapackpp repo and add commits there, with a new PR?

@wavefunction91
Contributor Author

@mgates3 It would be better if you could add them. This branch should allow edits by maintainers, i.e., you, @dsukkari, and @ayarkhan should all be able to push to it.

@ayarkhan
Contributor

ayarkhan commented Sep 5, 2023

For oneMKL/SYCL, the CMake build was tested and works.
On Sunspot, the quick tester run:

./test/tester dev-heevd
LAPACK++ version 2023.06.00, id 260e406
input: ./test/tester dev-heevd

type    uplo   jobz       n  device     error    error2   time (s)  ref time (s)  status
test matrix A: rand, cond(S) = NA
   d   lower  novec     100       0        NA  0.00e+00      0.672         0.117  pass
   d   lower  novec     200       0        NA  0.00e+00     0.0100       0.00377  pass
   d   lower  novec     300       0        NA  0.00e+00    0.00889       0.00999  pass
   d   lower  novec     400       0        NA  0.00e+00     0.0222        0.0196  pass
   d   lower  novec     500       0        NA  0.00e+00     0.0376        0.0348  pass
All tests passed for dev-heevd.

More detailed testing:

$ ./tester  --type s,d,c,z --align 32 --dim 100:500:100 --jobz n,v --uplo l,u dev-heevd
LAPACK++ version 2023.06.00, id 260e406
input: ./tester --type 's,d,c,z' --align 32 --dim '100:500:100' --jobz 'n,v' --uplo 'l,u' dev-heevd

type    uplo   jobz       n  align  device     error    error2   time (s)  ref time (s)  status
test matrix A: rand, cond(S) = NA
   s   lower  novec     100     32       0        NA  0.00e+00      0.487         0.121  pass
   s   lower  novec     200     32       0        NA  0.00e+00     0.0307        0.0170  pass
   s   lower  novec     300     32       0        NA  0.00e+00     0.0371        0.0285  pass
   s   lower  novec     400     32       0        NA  0.00e+00     0.0297        0.0285  pass
   s   lower  novec     500     32       0        NA  0.00e+00     0.0470        0.0295  pass
   s   lower    vec     100     32       0  4.66e-09  7.72e-07      0.244         0.147  pass
   s   lower    vec     200     32       0  4.09e-09  9.99e-07     0.0928        0.0219  pass
   s   lower    vec     300     32       0  9.41e-10  7.01e-07    0.00948        0.0264  pass
   s   lower    vec     400     32       0  1.19e-09  7.10e-07     0.0163        0.0444  pass
   s   lower    vec     500     32       0  6.51e-10  2.39e-06     0.0256         0.115  pass
   s   upper  novec     100     32       0        NA  0.00e+00   0.000984      0.000439  pass
   s   upper  novec     200     32       0        NA  0.00e+00    0.00278       0.00268  pass
   s   upper  novec     300     32       0        NA  0.00e+00    0.00722       0.00558  pass
   s   upper  novec     400     32       0        NA  0.00e+00     0.0162        0.0162  pass
   s   upper  novec     500     32       0        NA  0.00e+00     0.0691        0.0104  pass
   s   upper    vec     100     32       0  2.88e-09  3.42e-07    0.00465        0.0600  pass
   s   upper    vec     200     32       0  3.50e-09  5.41e-07    0.00523       0.00967  pass
   s   upper    vec     300     32       0  8.87e-10  4.82e-07    0.00613        0.0181  pass
   s   upper    vec     400     32       0  6.99e-10  6.10e-07     0.0102        0.0327  pass
   s   upper    vec     500     32       0  1.40e-09  5.20e-07     0.0192        0.0512  pass

   d   lower  novec     100     32       0        NA  0.00e+00      0.288        0.0813  pass
   d   lower  novec     200     32       0        NA  0.00e+00    0.00558       0.00323  pass
   d   lower  novec     300     32       0        NA  0.00e+00    0.00779       0.00613  pass
   d   lower  novec     400     32       0        NA  0.00e+00     0.0150        0.0102  pass
   d   lower  novec     500     32       0        NA  0.00e+00     0.0548        0.0180  pass
   d   lower    vec     100     32       0  7.38e-18  5.72e-16      0.204         0.172  pass
   d   lower    vec     200     32       0  2.36e-18  1.14e-15    0.00926        0.0225  pass
   d   lower    vec     300     32       0  3.36e-18  2.95e-15     0.0102        0.0281  pass
   d   lower    vec     400     32       0  1.26e-18  9.20e-16     0.0220        0.0433  pass
   d   lower    vec     500     32       0  1.48e-18  1.17e-15     0.0317        0.0566  pass
   d   upper  novec     100     32       0        NA  0.00e+00    0.00151      0.000966  pass
   d   upper  novec     200     32       0        NA  0.00e+00    0.00332       0.00277  pass
   d   upper  novec     300     32       0        NA  0.00e+00     0.0119        0.0109  pass
   d   upper  novec     400     32       0        NA  0.00e+00    0.00899        0.0175  pass
   d   upper  novec     500     32       0        NA  0.00e+00     0.0163        0.0138  pass
   d   upper    vec     100     32       0  7.28e-18  6.82e-16    0.00216        0.0314  pass
   d   upper    vec     200     32       0  4.84e-18  9.96e-16    0.00549       0.00961  pass
   d   upper    vec     300     32       0  1.89e-18  9.14e-16     0.0124        0.0187  pass
   d   upper    vec     400     32       0  1.56e-18  2.34e-15     0.0173        0.0342  pass
   d   upper    vec     500     32       0  1.01e-18  1.02e-15     0.0242        0.0494  pass

   c   lower  novec     100     32       0        NA  0.00e+00      0.142        0.0369  pass
   c   lower  novec     200     32       0        NA  0.00e+00    0.00794       0.00317  pass
   c   lower  novec     300     32       0        NA  0.00e+00    0.00648       0.00621  pass
   c   lower  novec     400     32       0        NA  0.00e+00     0.0761        0.0190  pass
   c   lower  novec     500     32       0        NA  0.00e+00     0.0340        0.0333  pass
   c   lower    vec     100     32       0  2.37e-09  3.03e-07      0.208        0.0141  pass
   c   lower    vec     200     32       0  2.84e-09  3.63e-07     0.0136        0.0200  pass
   c   lower    vec     300     32       0  8.78e-10  5.51e-07     0.0245        0.0359  pass
   c   lower    vec     400     32       0  2.38e-09  6.68e-07     0.0385        0.0562  pass
   c   lower    vec     500     32       0  1.30e-09  1.40e-06     0.0249        0.0831  pass
   c   upper  novec     100     32       0        NA  0.00e+00    0.00182       0.00113  pass
   c   upper  novec     200     32       0        NA  0.00e+00    0.00348       0.00281  pass
   c   upper  novec     300     32       0        NA  0.00e+00    0.00703       0.00614  pass
   c   upper  novec     400     32       0        NA  0.00e+00     0.0202        0.0182  pass
   c   upper  novec     500     32       0        NA  0.00e+00     0.0552        0.0302  pass
   c   upper    vec     100     32       0  5.38e-09  6.36e-07     0.0214       0.00371  pass
   c   upper    vec     200     32       0  1.72e-09  1.21e-06    0.00477        0.0108  pass
   c   upper    vec     300     32       0  1.26e-09  4.94e-07    0.00985        0.0232  pass
   c   upper    vec     400     32       0  5.43e-10  5.24e-07     0.0244        0.0442  pass
   c   upper    vec     500     32       0  5.74e-10  5.44e-07     0.0486        0.0515  pass

   z   lower  novec     100     32       0        NA  0.00e+00     0.0561       0.00741  pass
   z   lower  novec     200     32       0        NA  0.00e+00     0.0385       0.00951  pass
   z   lower  novec     300     32       0        NA  0.00e+00     0.0226        0.0223  pass
   z   lower  novec     400     32       0        NA  0.00e+00     0.0401        0.0367  pass
   z   lower  novec     500     32       0        NA  0.00e+00     0.0584        0.0550  pass
   z   lower    vec     100     32       0  8.49e-18  7.05e-16     0.0310        0.0263  pass
   z   lower    vec     200     32       0  2.45e-18  7.16e-16     0.0195        0.0181  pass
   z   lower    vec     300     32       0  1.82e-18  9.30e-16     0.0296        0.0397  pass
   z   lower    vec     400     32       0  2.06e-18  1.02e-15     0.0486        0.0640  pass
   z   lower    vec     500     32       0  1.11e-18  1.26e-15     0.0613        0.0834  pass
   z   upper  novec     100     32       0        NA  0.00e+00    0.00191       0.00130  pass
   z   upper  novec     200     32       0        NA  0.00e+00    0.00597       0.00463  pass
   z   upper  novec     300     32       0        NA  0.00e+00     0.0107        0.0112  pass
   z   upper  novec     400     32       0        NA  0.00e+00     0.0200        0.0176  pass
   z   upper  novec     500     32       0        NA  0.00e+00     0.0286        0.0258  pass
   z   upper    vec     100     32       0  6.07e-18  7.47e-16    0.00551       0.00865  pass
   z   upper    vec     200     32       0  4.04e-18  1.10e-15     0.0130        0.0193  pass
   z   upper    vec     300     32       0  2.55e-18  2.29e-15     0.0293        0.0410  pass
   z   upper    vec     400     32       0  1.29e-18  9.66e-16     0.0522        0.0395  pass
   z   upper    vec     500     32       0  1.22e-18  1.14e-15     0.0391        0.0656  pass
All tests passed for dev-heevd.
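
To compare the `time (s)` and `ref time (s)` columns in tables like the ones above, the rows can be parsed and a ratio computed per case. This is a rough sketch under stated assumptions: `parse_rows` is a hypothetical helper, not part of LAPACK++, and it assumes `ref time (s)` is the timing of the reference run that the tester compares against, which the log itself does not spell out. The sample rows are copied from the Sunspot output above.

```python
SAMPLE = """\
type    uplo   jobz       n  align  device     error    error2   time (s)  ref time (s)  status
test matrix A: rand, cond(S) = NA
   d   lower    vec     100     32       0  7.38e-18  5.72e-16      0.204         0.172  pass
   d   lower    vec     200     32       0  2.36e-18  1.14e-15    0.00926        0.0225  pass
   d   lower    vec     300     32       0  3.36e-18  2.95e-15     0.0102        0.0281  pass
"""


def parse_rows(text):
    """Parse tester result rows into dicts, skipping header and banner lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a one-letter type code. Index the timing
        # columns from the right so that tables with and without the
        # optional "align" column both parse.
        if len(fields) < 10 or fields[0] not in ("s", "d", "c", "z"):
            continue
        rows.append(
            {
                "type": fields[0],
                "uplo": fields[1],
                "jobz": fields[2],
                "n": int(fields[3]),
                "time": float(fields[-3]),
                "ref_time": float(fields[-2]),
                "status": fields[-1],
            }
        )
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    # Ratio > 1 means the device run was faster than the reference run.
    ratio = row["ref_time"] / row["time"]
    print(f"{row['type']} {row['uplo']} {row['jobz']} n={row['n']}: ratio {ratio:.2f}")
```

Each row is a single timed run, so the ratios are only indicative. In particular, the first case of a sweep (n = 100 here) may include one-time startup cost on the device, which is one possible reason its time is much larger than that of the following sizes.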

Collaborator

@mgates3 mgates3 left a comment


Overall looks good. I'll clean up a few stylistic things and then merge it in.

Resolved review threads on:
src/onemkl/onemkl_common.hh
src/onemkl/onemkl_heevd.cc
src/rocm/rocm_common.hh
src/rocm/rocm_heevd.cc
test/test_heevd_device.cc
@mgates3 mgates3 marked this pull request as ready for review September 9, 2023 18:16
@mgates3 mgates3 merged commit f1e0dc6 into icl-utk-edu:master Sep 9, 2023
8 checks passed
4 participants