I'm not sure. Most, if not all, routines should already have separate code paths for simple cases like alpha=1, beta=0, or stride=1. Granted, moving some of these decisions to the compile phase might be faster, but on the other hand we would have to come up with a bunch of non-standard function names that no other implementation supports, which is probably not ideal from a user standpoint (portability, troubleshooting, benchmarking, etc.). This was somewhat inevitable with the bfloat16 functions, as there was little or no precedent, but at least the naming issue was restricted to the prefix, and there was some industry support behind the initial suggestion.
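To illustrate the kind of run-time fast path mentioned above, here is a minimal sketch of the beta step of C := alpha*op(A)*op(B) + beta*C. It assumes a column-major m-by-n matrix C with leading dimension ldc, and the helper name scale_c is hypothetical (it is not taken from OpenBLAS or any other implementation). The special cases beta == 1 and beta == 0 are detected with ordinary branches on every call.

```c
#include <stddef.h>

/* Hypothetical helper: scale the m-by-n column-major matrix C
 * (leading dimension ldc) by beta. The special cases are chosen
 * at run time, on every call. */
static void scale_c(size_t m, size_t n, double beta, double *c, size_t ldc)
{
    if (beta == 1.0)
        return;                      /* C is left unchanged */
    for (size_t j = 0; j < n; ++j) {
        double *col = c + j * ldc;   /* start of column j */
        if (beta == 0.0) {
            /* Reference BLAS convention: with beta == 0, C need not be
             * set on input, so overwrite with zeros instead of
             * multiplying (this also avoids NaN * 0 = NaN). */
            for (size_t i = 0; i < m; ++i)
                col[i] = 0.0;
        } else {
            for (size_t i = 0; i < m; ++i)
                col[i] *= beta;
        }
    }
}
```

The branch tests cost a few comparisons per call, which is negligible next to the O(m*n) loop for anything but very small matrices; that small-matrix regime is exactly where compile-time specialization could still help.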
The BLAS procedures have arguments that determine what kind of computation is done. For example, dgemm can compute A*B, A'*B, A*B', or A'*B', depending on the values of the arguments transa and transb, and it can scale the product by alpha and add beta times the existing matrix C to the result. Would it make sense to break up the BLAS subroutines into specialized procedures that each do only one kind of calculation? It is faster to resolve such choices at compile time, and when there are many calls to dgemm for small matrices, it could make a noticeable difference in speed.
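A rough sketch of the split proposed above, under stated assumptions: gemm_core is a naive column-major triple loop (not an optimized kernel), my_dgemm is a generic entry point that decodes transa/transb at run time, and my_dgemm_nn is a specialized variant that only computes C := A*B. All three names are hypothetical and do not correspond to any existing BLAS implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Naive column-major core: C := alpha*op(A)*op(B) + beta*C, where
 * op(A) is m-by-k and op(B) is k-by-n. ta/tb: 0 = as stored, 1 = transposed. */
static inline void gemm_core(int ta, int tb, size_t m, size_t n, size_t k,
                             double alpha, const double *a, size_t lda,
                             const double *b, size_t ldb,
                             double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p) {
                double aip = ta ? a[p + i * lda] : a[i + p * lda];
                double bpj = tb ? b[j + p * ldb] : b[p + j * ldb];
                sum += aip * bpj;
            }
            double *cij = &c[i + j * ldc];
            /* With beta == 0, C is not read (it need not be set on input). */
            *cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * *cij;
        }
    }
}

/* Hypothetical generic entry point: the transpose choice is decoded from
 * character flags at run time on every call. 'C' is treated as 'T'
 * because the data is real. */
void my_dgemm(char transa, char transb, size_t m, size_t n, size_t k,
              double alpha, const double *a, size_t lda,
              const double *b, size_t ldb,
              double beta, double *c, size_t ldc)
{
    int ta = (transa == 'T' || transa == 't' || transa == 'C' || transa == 'c');
    int tb = (transb == 'T' || transb == 't' || transb == 'C' || transb == 'c');
    assert(ta || transa == 'N' || transa == 'n');
    assert(tb || transb == 'N' || transb == 'n');
    gemm_core(ta, tb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
}

/* Hypothetical specialized variant: C := A*B only (no transposes,
 * alpha = 1, beta = 0). The flags are compile-time constants, so an
 * optimizing compiler can fold the branches in gemm_core when inlining. */
void my_dgemm_nn(size_t m, size_t n, size_t k,
                 const double *a, size_t lda,
                 const double *b, size_t ldb,
                 double *c, size_t ldc)
{
    gemm_core(0, 0, m, n, k, 1.0, a, lda, b, ldb, 0.0, c, ldc);
}
```

Whether my_dgemm_nn is measurably faster depends on the compiler actually inlining gemm_core with constant arguments; what it removes is the per-call flag decoding and branching, which only matters when the matrices are small enough that this overhead is comparable to the arithmetic.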