Create specialized BLAS routines with fewer arguments? #4076

Beliavsky · 2023-06-07T13:26:27Z

Beliavsky
Jun 7, 2023

The BLAS procedures take arguments that determine what kind of computation is done. For example, dgemm can compute A*B, A'*B, A*B', or A'*B', depending on the values of the arguments transa and transb, and it can add a multiple of a matrix to the result. Would it make sense to break up the BLAS subroutines into specialized procedures that each do only one kind of calculation? Resolving these choices at compile time rather than at run time is faster, and when there are many calls to dgemm for small matrices, it could make a difference in speed.
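To make the idea concrete, here is a minimal sketch in C contrasting a general entry point with a specialized one. Everything in it is hypothetical: `toy_dgemm` is a naive column-major reference loop with a dgemm-like argument list, and `toy_dgemm_nn` is an illustrative name for a routine that only computes C := A*B. Neither is an existing BLAS or OpenBLAS function, and a tuned library would not use loops like these.

```c
#include <stddef.h>

/* Element (i,j) of op(X), where X is column-major with leading dimension ld
   and op(X) is X for trans == 'N' and X' otherwise. */
static double elem(const double *x, size_t ld, size_t i, size_t j, char trans)
{
    return (trans == 'N') ? x[i + j * ld] : x[j + i * ld];
}

/* Hypothetical general routine: C := alpha*op(A)*op(B) + beta*C.
   The transpose flags and scalars are inspected at run time. */
void toy_dgemm(char transa, char transb, size_t m, size_t n, size_t k,
               double alpha, const double *a, size_t lda,
               const double *b, size_t ldb,
               double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += elem(a, lda, i, p, transa) * elem(b, ldb, p, j, transb);
            /* With beta == 0 the old contents of C are not read. */
            c[i + j * ldc] = (beta == 0.0) ? alpha * sum
                                           : alpha * sum + beta * c[i + j * ldc];
        }
    }
}

/* Hypothetical specialized routine: C := A*B only.
   No flags or scalars, so there is nothing left to decide at run time. */
void toy_dgemm_nn(size_t m, size_t n, size_t k,
                  const double *a, size_t lda,
                  const double *b, size_t ldb,
                  double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = sum;
        }
    }
}
```

Calling `toy_dgemm('N', 'N', m, n, k, 1.0, a, lda, b, ldb, 0.0, c, ldc)` and `toy_dgemm_nn(m, n, k, a, lda, b, ldb, c, ldc)` gives the same result; the second form has eight arguments instead of thirteen.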

martin-frbg · 2023-06-07T20:49:25Z

martin-frbg
Jun 7, 2023
Maintainer

I'm not sure - most if not all routines should already have separate code paths for simple cases like alpha=1, beta=0, or stride=1. Granted, moving some of the decisions to the compile phase might be faster, but on the other hand we'd have to come up with a bunch of non-standard function names that no other implementation supports, which is probably not ideal from a user standpoint (portability, troubleshooting, benchmarking, etc.). This was somewhat inevitable with the bfloat16 functions as there was little or no precedent, but at least the naming issue was restricted to the prefix and there was some industry support behind the initial suggestion.
I do wonder how much of your proposal could be covered by finalizing xianyi's implementation of the quasi-standard ?GEMM_BATCH routine, though.
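The point about separate code paths for simple cases can be sketched roughly as follows. This is an illustration only, not OpenBLAS source: `toy_daxpby` is a hypothetical routine computing y := alpha*x + beta*y, and it assumes positive strides for brevity. The standard-style entry point checks its arguments once per call and then runs a loop with no further branching.

```c
#include <stddef.h>

/* Hypothetical routine: y := alpha*x + beta*y over n elements.
   Assumption for this sketch: incx and incy are positive element strides. */
void toy_daxpby(size_t n, double alpha, const double *x, size_t incx,
                double beta, double *y, size_t incy)
{
    if (incx == 1 && incy == 1) {
        if (alpha == 1.0 && beta == 0.0) {
            /* Fast path: alpha=1, beta=0, unit stride reduces to a copy. */
            for (size_t i = 0; i < n; ++i)
                y[i] = x[i];
            return;
        }
        /* Unit-stride path: contiguous access, general scalars. */
        for (size_t i = 0; i < n; ++i)
            y[i] = alpha * x[i] + beta * y[i];
        return;
    }
    /* General path: arbitrary positive strides. */
    for (size_t i = 0; i < n; ++i)
        y[i * incy] = alpha * x[i * incx] + beta * y[i * incy];
}
```

The dispatch costs a few comparisons per call, which is what a compile-time specialization would save. Whether that matters in practice depends on how small the operands are relative to the call overhead.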
