Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SIMD instructions to update data pointers. #153

Closed
wants to merge 4 commits into from

Conversation

Shark64
Copy link
Contributor

@Shark64 Shark64 commented Oct 30, 2024

Hi, I've noticed that even if the hashes code uses vector instructions for the computation the update of the data pointers at the end is still done using scalar code. We can save some time and code size by loading the pointers as a vector, update them all together and store them back.
I've run ``make test'' and it worked without errors on Linux_x86-64 with RocketLake CPU.

@Bulat-Ziganshin
Copy link

Bulat-Ziganshin commented Oct 30, 2024

Do you know about latencies of [failed] store-forwarding? Read e.g. https://stackoverflow.com/questions/46135766/can-modern-x86-implementations-store-forward-from-more-than-one-prior-store :

A read that is bigger than the write, or a read that covers both written and unwritten bytes, takes approximately 11 clock cycles extra.

OTOH, I don't know the rest of this code. The better strategy may be storing pointers in a SIMD register permanently and using e.g. PEXTRD to copy them into scalar registers. Or even using VPGATHER where it available: https://stackoverflow.com/questions/21774454/how-are-the-gather-instructions-in-avx2-implemented

@Shark64
Copy link
Contributor Author

Shark64 commented Oct 31, 2024

I'm familiar with store-forwarding, but it doesn't look like it should be a problem here. The pointers are updated only once at the end of the function, outside the hot loop. My patch it's mainly to reduce code size, not for performace. Also, as your quote says the store-fowarding fails if your read back a bigger chuck than the older write, so writing them as a single vector should help by having all the data in one or two store buffers.

@Bulat-Ziganshin
Copy link

you are right, sorry for the noise. i didn't check the code before commenting :(

@Shark64
Copy link
Contributor Author

Shark64 commented Oct 31, 2024

you are right, sorry for the noise. i didn't check the code before commenting :(

No worries ;)

Copy link
Contributor

@pablodelara pablodelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR. I left a comment.

sha1_mb/sha1_mb_x4_avx.asm Outdated Show resolved Hide resolved
Copy link
Contributor

@pablodelara pablodelara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more change needed, thanks

sha512_mb/sha512_mb_x2_avx.asm Outdated Show resolved Hide resolved
Fixed wrong vmov instruction type.
@pablodelara
Copy link
Contributor

Thanks for the changes. I will squash the third commit into the first commit once this is merged.

mov [STATE + _data_ptr_sha512 + 0*PTR_SZ], inp0
add inp1, IDX
mov [STATE + _data_ptr_sha512 + 1*PTR_SZ], inp1
vmovq xmm0, IDX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After giving an extra thought, I don't think making this change is worth it, for two reasons:
1 - It's replacing 4 instructions with 5 instructions
2 - We are removing AVX code in the next few months (already mentioned it).
So could you drop the changes in the AVX implementations and leave just the other ones?
Many thanks for the work!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've just reverted the _avx functions to the old version.

Signed-off-by: Nicola Torracca <[email protected]>
@pablodelara
Copy link
Contributor

Thanks for the work @Shark64. This is now merged in mainline.

@pablodelara pablodelara closed this Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants