-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SVE Implementation for Level-1 BLAS Routines #4959
base: develop
Are you sure you want to change the base?
Conversation
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
Hiya, have you tested the impact on Graviton 3/4? |
Thank you very much for this revised PR, I'm looking forward to the HPEC2024 proceedings becoming available. |
|
||
static int rot_kernel_sve(BLASLONG n, FLOAT *x, FLOAT *y, FLOAT c, FLOAT s) | ||
{ | ||
for (int i = 0; i < n; i += SVE_WIDTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make i a BLASLONG here please, and adjust the casts in the SVE_WHILELT to uint64_t accordingly ?
#define SVE_WIDTH svcntw() | ||
#endif | ||
|
||
static int scal_kernel_sve(int n, FLOAT *x, FLOAT da) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BLASLONG n ?
|
||
static int scal_kernel_sve(int n, FLOAT *x, FLOAT da) | ||
{ | ||
for (int i = 0; i < n; i += SVE_WIDTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make i a BLASLONG here too, please
{ | ||
for (int i = 0; i < n; i += SVE_WIDTH) | ||
{ | ||
svbool_t pg = SVE_WHILELT(i, n); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add uint64_t casts for i and n here please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please see kernel/arm/scal.c for letting "dummy2" decide whether to propagate NaN and Inf values - probably there is a more elegant solution than what I put there, otherwise just copy that file
{ | ||
svbool_t pg = SVE_WHILELT(i, n); | ||
SVE_TYPE x_vec = svld1(pg, &x[i]); | ||
SVE_TYPE result = svmul_z(pg, x_vec, da); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually unsure here if svmul_z will "do the right thing" concerning NaN or Inf arguments in x_vec or da
Thank You @martin-frbg for suggestions. we will do the required modifications. |
We have optimized Level-1 BLAS routines (scal, swap, and rot) utilizing ARM SVE, resulting in significant performance enhancements in OpenBLAS on two variants of the A64FX—FUJITSU PRIMEHPC FX700 and the FUGAKU supercomputer. These optimizations achieved performance improvements ranging from 1.80x to 4x through effective code vectorization. This research has been accepted as a full paper and presented at the 28th Annual IEEE High Performance Extreme Computing (HPEC) Conference in September 2024, under the title "Optimization Strategies to Accelerate BLAS Operations with ARM SVE."