
Switch to libxsmm JIT backend #26

Open
sebwolf-de opened this issue Mar 26, 2021 · 8 comments
@sebwolf-de
Contributor
Problem: The current approach using `libxsmm_gemm_generator` can only generate GEMMs with alpha = +/-1. If YATeTo encounters a GEMM with |alpha| != 1, it falls back to its default code path (nested for loops), which is not performant.
Solution: Use the newer LIBXSMM interface `libxsmm_?gemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);`
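For context, the fallback mentioned above amounts to a plain triple loop. A minimal sketch of such a general-alpha kernel (plain C, column-major storage, naming hypothetical, no libxsmm dependency):

```c
/* Naive column-major GEMM fallback: C = alpha * A * B + beta * C.
 * Handles arbitrary alpha and beta, but leaves all optimization to the
 * compiler -- which is why it is slow compared to a JIT-ed kernel. */
static void naive_dgemm(int m, int n, int k, double alpha,
                        const double *a, int lda,
                        const double *b, int ldb,
                        double beta, double *c, int ldc) {
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      double acc = 0.0;
      for (int p = 0; p < k; ++p) {
        acc += a[i + p * lda] * b[p + j * ldb];
      }
      c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
    }
  }
}
```

Switching to the JIT interface would replace calls to such a loop nest with a single generated kernel, while still covering |alpha| != 1.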

@sebwolf-de sebwolf-de self-assigned this Mar 26, 2021
@uphoffc
Contributor

uphoffc commented Mar 26, 2021

Don't think this would help for alpha:

What is a small matrix multiplication? When characterizing the problem-size by using the M, N, and K parameters, a problem-size suitable for LIBXSMM falls approximately within (M N K)^(1/3) <= 64 (which illustrates that non-square matrices or even "tall and skinny" shapes are covered as well). The library is typically used to generate code up to the specified threshold. Raising the threshold may not only generate excessive amounts of code (due to unrolling in M or K dimension), but also miss to implement a tiling scheme to effectively utilize the cache hierarchy. For auto-dispatched problem-sizes above the configurable threshold (explicitly JIT'ted code is not subject to the threshold), LIBXSMM is falling back to BLAS. In terms of GEMM, the supported kernels are limited to Alpha := 1, Beta := { 1, 0 }, and TransA := 'N'.

Might be that the doc is out of date, though.

@krenzland
Contributor

krenzland commented Mar 26, 2021

Stupid question, but what is the performance penalty of the JIT backend? It's probably quite small, as the lookup works in O(1) via a hash map, right?
Actually, you could generate a lookup table inside of yateto, so in the end it's just a function pointer call?

We also have to consider this for benchmarking purposes. We do not want to include the compile time in our benchmark (it's init after all), so we should do the JIT phase "Just before" ;)

@uphoffc
Contributor

uphoffc commented Mar 26, 2021

Don't think there will be a performance penalty. You can just directly store the function pointer. No one was motivated to code that so far.
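A sketch of that pattern: do the (hash-map) dispatch once during initialization, store the returned function pointer in a table, and call it directly afterwards. Note that `jit_dispatch`, `gemm_kernel_fn`, and the trivial kernel below are hypothetical stand-ins, not the actual libxsmm API (the real dispatch would be e.g. `libxsmm_dmmdispatch`):

```c
/* Kernel signature comparable to a JIT-ed GEMM kernel. */
typedef void (*gemm_kernel_fn)(const double *a, const double *b, double *c);

/* Trivial 1x1 "GEMM" stand-in for a JIT-generated kernel. */
static void sample_kernel(const double *a, const double *b, double *c) {
  c[0] = a[0] * b[0];
}

/* Stand-in for a JIT dispatch call: in libxsmm this would involve a
 * code-registry lookup (and possibly JIT compilation on a miss). */
static gemm_kernel_fn jit_dispatch(int m, int n, int k) {
  (void)m; (void)n; (void)k;
  return sample_kernel;
}

/* Table of pointers filled once during init, so the hot loop pays
 * neither the lookup nor the compile cost. */
static gemm_kernel_fn kernel_table[1];

static void init_kernels(void) {
  kernel_table[0] = jit_dispatch(1, 1, 1);  /* JIT "just before" the run */
}
```

This way the per-call overhead is a plain indirect call, and any compile cost is paid in the init phase, matching the benchmarking concern above.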

@sebwolf-de
Contributor Author

`libxsmm_dgemm(NULL, NULL, &m, &n, &k, &alpha, &a[0], NULL, &b[0], NULL, &beta, &c[0], NULL);` works with alpha != 1 (the NULL arguments select LIBXSMM's defaults, e.g. no transposition).

@sebwolf-de
Contributor Author

Not completely sure whether "LIBXSMM is falling back to BLAS" refers only to large problem sizes, or also to alpha != 1.

@uphoffc
Contributor

uphoffc commented Mar 26, 2021

Likely calls MKL.

@uphoffc uphoffc closed this as completed Mar 26, 2021
@uphoffc uphoffc reopened this Mar 26, 2021
@hfp

hfp commented Apr 27, 2021

Thank you for considering transition to LIBXSMM's JIT backend!

Indeed, the stand-alone generator driver that outputs inline-assembly C functions is deprecated and already lacks support for the latest microarchitectural extensions (mostly relevant for low/mixed precision).

The discussion here is correct about the choices: managing the function pointers inside the application, or relying on libxsmm to manage them. Matrix multiplication kernels are always managed by the library, but can of course additionally be tabulated inside the application. If only a limited set of kernels can be determined upfront, it is best to keep a table of pointers, since any lookup cost, no matter how small, is avoided this way. If an application worked with a set of fixed kernels supplied by the deprecated stand-alone generator, one can consider that application to have such upfront knowledge and to rely on only a limited set of kernels.

The code registry not only provides the lookup service, but also manages the lifetime of the buffers storing the executable code, and offers kernel introspection as well as advanced lookup for custom data (which can be used to look up multiple kernels at once, or entirely unrelated data). One can read about code generation, lookup, and caching in this comment.
