dgemmt but not dgemm? #12

Open
mohawk2 opened this issue Feb 20, 2022 · 3 comments

Comments

@mohawk2
Contributor

mohawk2 commented Feb 20, 2022

I had a very quick look in src and was surprised to find a dgemmt but no dgemm. coverage.md doesn't give the reasoning. Is this simply something that hasn't been done yet?

@elmar-peise
Collaborator

dgemm is part of BLAS, not LAPACK. ReLAPACK must be linked with a BLAS implementation, which will provide dgemm.

dgemmt is not part of BLAS but is needed by a LAPACK algorithm, so ReLAPACK provides a recursive implementation of it.
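
To illustrate what a recursive dgemmt can look like, here is a minimal pure-Python sketch. It is not ReLAPACK's code: the names `gemm` and `gemmt_lower`, the index-range interface, and the `crossover` parameter are all assumptions made for illustration. The idea shown is that only the lower triangle of C is updated, with the diagonal blocks handled recursively and the off-diagonal block handed to a full matrix product (which, in a real library, would be the dgemm from the linked BLAS).

```python
def gemm(alpha, A, B, beta, C, rows, cols, inner):
    """Stand-in for a BLAS dgemm call (hypothetical helper).

    Computes C[i][j] = alpha * sum_p A[i][p] * B[p][j] + beta * C[i][j]
    for every i in rows and j in cols, in place.
    """
    for i in rows:
        for j in cols:
            acc = 0.0
            for p in inner:
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]


def gemmt_lower(alpha, A, B, beta, C, lo, hi, k, crossover=2):
    """Update only the lower triangle of the diagonal block C[lo:hi, lo:hi].

    A is n x k, B is k x n, C is n x n (lists of lists). Entries of C
    strictly above the diagonal are never touched.
    """
    n = hi - lo
    if n <= crossover:
        # Base case: small diagonal block, update entries with j <= i only.
        for i in range(lo, hi):
            for j in range(lo, i + 1):
                acc = sum(A[i][p] * B[p][j] for p in range(k))
                C[i][j] = alpha * acc + beta * C[i][j]
        return
    mid = lo + n // 2
    # Top-left diagonal block: still triangular, so recurse.
    gemmt_lower(alpha, A, B, beta, C, lo, mid, k, crossover)
    # Bottom-left block: a full rectangle, so one large gemm call suffices.
    gemm(alpha, A, B, beta, C, range(mid, hi), range(lo, mid), range(k))
    # Bottom-right diagonal block: triangular again, so recurse.
    gemmt_lower(alpha, A, B, beta, C, mid, hi, k, crossover)
```

Calling `gemmt_lower(alpha, A, B, beta, C, 0, n, k)` updates the whole lower triangle of an n x n matrix. Most of the work ends up in the rectangular `gemm` calls, which is where an optimized BLAS would be used.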

@mohawk2
Contributor Author

mohawk2 commented Feb 20, 2022

Thank you for your rapid reply! My assumption was that the performance gains of the recursive algorithm over the tuned blocked algorithms would be as large for dgemm as for dgemmt. I appreciate that other BLASes provide dgemm, but feeling "greedy" I wondered if ReLAPACK could provide a high-performance dgemm as well? (A superficial reading of the source for your dgemmt made it look as though it wouldn't be a lot of work to make a dgemm as well.)

@elmar-peise
Collaborator

elmar-peise commented Feb 22, 2022

A ReLAPACK-style recursive dgemm implementation would outperform the reference BLAS, but would almost certainly not reach the performance of high-performance BLAS implementations. Such libraries are tuned for specific CPU architectures and cache sizes, and often contain hand-written assembly.
Overall, ReLAPACK is built on the assumption that it's linked to an optimized BLAS, which typically performs best for large matrices: recursion calls BLAS with large sub-problems, which can give better performance than a blocked algorithm's calls with much smaller, fixed panel sizes.

(Yes, it should be simple to implement a recursive dgemm to test this.)
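
A minimal sketch of such a recursive dgemm, in pure Python for readability, could look like the following. This is an illustration under assumptions, not ReLAPACK code: the names `kernel_gemm` and `recursive_gemm`, the rule of halving the largest dimension, and the `crossover` threshold are all hypothetical choices. The base-case kernel stands in for whatever small-matrix routine a real implementation would call.

```python
def kernel_gemm(A, B, C, i0, i1, j0, j1, p0, p1):
    """Base-case kernel (hypothetical): C[i0:i1, j0:j1] += A[i0:i1, p0:p1] @ B[p0:p1, j0:j1]."""
    for i in range(i0, i1):
        for j in range(j0, j1):
            acc = 0.0
            for p in range(p0, p1):
                acc += A[i][p] * B[p][j]
            C[i][j] += acc


def recursive_gemm(A, B, C, i0, i1, j0, j1, p0, p1, crossover=4):
    """Accumulate A @ B into C on the given index ranges by recursive halving.

    A is m x k, B is k x n, C is m x n (lists of lists), updated in place.
    """
    m, n, k = i1 - i0, j1 - j0, p1 - p0
    if max(m, n, k) <= crossover:
        # Small enough: hand the sub-problem to the kernel.
        kernel_gemm(A, B, C, i0, i1, j0, j1, p0, p1)
        return
    if m >= n and m >= k:
        # Split the rows of A and C.
        mid = i0 + m // 2
        recursive_gemm(A, B, C, i0, mid, j0, j1, p0, p1, crossover)
        recursive_gemm(A, B, C, mid, i1, j0, j1, p0, p1, crossover)
    elif n >= k:
        # Split the columns of B and C.
        mid = j0 + n // 2
        recursive_gemm(A, B, C, i0, i1, j0, mid, p0, p1, crossover)
        recursive_gemm(A, B, C, i0, i1, mid, j1, p0, p1, crossover)
    else:
        # Split the inner dimension: both halves accumulate into the same C block.
        mid = p0 + k // 2
        recursive_gemm(A, B, C, i0, i1, j0, j1, p0, mid, crossover)
        recursive_gemm(A, B, C, i0, i1, j0, j1, mid, p1, crossover)
```

For an m x k matrix A and a k x n matrix B, `recursive_gemm(A, B, C, 0, m, 0, n, 0, k)` adds the product to C. As the comment above notes, how this compares with an optimized BLAS would have to be measured; the sketch only shows the recursion structure.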
