Yes, Cascade Lake is supported by the SKYLAKEX target, and not all BLAS kernels for it are available in AVX512 versions yet. However, if the default compilation on your system resulted in only AVX/AVX2 flags showing up in the compile commands, your compiler may not be new enough to support AVX512 instructions (in that case, the c_check script automatically adds NO_AVX512=1 to the options for make). Which compiler are you using?
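To make the compiler question concrete, here is a rough probe of whether a given C compiler can build code that uses AVX512 intrinsics. It only approximates the kind of check described above and is not the actual c_check code; the function name `compiler_has_avx512` and the test program are made up for illustration.

```shell
# Rough probe: can the C compiler build a file that uses AVX512 intrinsics?
# Hypothetical helper, NOT the real c_check logic. Prints "yes" or "no".
compiler_has_avx512() {
    cc="${1:-gcc}"
    obj="$(mktemp)" || return 1
    # Feed a tiny test program on stdin and only compile it (-c), never run it.
    if printf '%s\n' \
        '#include <immintrin.h>' \
        'int main(void) { __m512i z = _mm512_setzero_si512(); (void)z; return 0; }' |
        "$cc" -mavx512f -x c -c -o "$obj" - >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
    rm -f "$obj"
}
```

Calling `compiler_has_avx512 gcc` alongside `gcc --version` should show quickly whether the toolchain itself is the limiting factor. Because the probe only compiles and assembles, a "no" can come from either the compiler or the assembler.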
-
Hmm, that's strange: 9.1 should support AVX512, including the corresponding C intrinsics (immintrin.h) that c_check is looking for. So you should be getting an AVX512 build by default (optimized for Skylake Xeon, but the differences should be marginal). Maybe your assembler (binutils package) is too old for AVX512. Could you check if the Makefile.conf generated during a default build has "NO_AVX512=1" in it?
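A minimal sketch of that check, assuming the build was run in the current directory so that Makefile.conf sits next to the top-level Makefile. The helper name `avx512_build_status` is made up for illustration.

```shell
# Report whether a generated Makefile.conf disables AVX512.
# Hypothetical helper; the path defaults to ./Makefile.conf.
avx512_build_status() {
    conf="${1:-Makefile.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # The setting mentioned above is the literal string NO_AVX512=1.
    if grep -q 'NO_AVX512=1' "$conf"; then
        echo "disabled"
    else
        echo "enabled"
    fi
}
```

Running `avx512_build_status Makefile.conf` together with `as --version | head -n 1` gives both pieces of information at once: whether AVX512 was switched off, and which binutils version the assembler comes from.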
-
But if you are comparing against MKL and your code uses many LAPACK functions rather than direct BLAS calls, it could well be that you are getting optimized/parallelized LAPACK functions from MKL too, while most of the LAPACK in OpenBLAS is copied from the unoptimized Reference-LAPACK ("netlib") implementation.
-
The reason I ask is that after replacing MKL with OpenBLAS, my program became slower.
I then checked the OpenBLAS build and found that OpenBLAS v0.3.26 was compiled by default without AVX512:
make CC=gcc FC=gfortran BINARY=64 NO_STATIC=1 BIGNUMA=0
-DVERSION=\"0.3.26\" -msse3 -mssse3 -msse4.1 -mavx -UASMNAME -UASMFNAME -UNAME -UCNAME ,,,
lscpu shows that the 6248R supports:
avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl ,,,
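For reference, the AVX512-related flags can be pulled out of the full flag list with a small filter. This is only a convenience sketch: it assumes a Linux system where /proc/cpuinfo is readable, and the function name `list_avx512_flags` is made up.

```shell
# Print the distinct AVX512 feature flags the CPU reports, one per line.
# Hypothetical helper; reads /proc/cpuinfo unless another file is given.
list_avx512_flags() {
    src="${1:-/proc/cpuinfo}"
    # -o prints only the matching words; sort -u removes per-core duplicates.
    grep -o 'avx512[a-z0-9_]*' "$src" | sort -u
}
```

If this prints flags such as avx512f while the compile commands show only -mavx, the limitation is in the build configuration and not in the CPU.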
If I explicitly turn on AVX512 compilation, I have to add some extra compiler flags for the build to succeed:
make CC=gcc FC=gfortran BINARY=64 NO_STATIC=1 BIGNUMA=0 NO_AVX512=0 COMMON_OPT="-O2 -mavx512f -mavx512dq -mavx512vl -mfma"
However, this results in slightly slower performance than the default build.
Looking at the target list in the wiki, I don't see Cascade Lake, so I guess OpenBLAS is not yet fully optimized for this chip?