Use SIMD instructions to update data pointers. #153

Shark64 · 2024-10-30T18:32:41Z

Hi, I've noticed that even though the hash code uses vector instructions for the computation, the data pointers are still updated with scalar code at the end. We can save some time and code size by loading the pointers as a vector, updating them all together, and storing them back.
I've run "make test" and it passed without errors on Linux x86-64 with a Rocket Lake CPU.

Signed-off-by: Nicola Torracca <[email protected]>
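For readers who want to see the idea outside of assembly, here is a minimal sketch in C with SSE2 intrinsics. It is not the code from this PR: the `lane_state` struct and both function names are hypothetical, and the real routines operate on the job state in NASM. It only shows the general shape of replacing one add and one store per pointer with a vector load, add, and store.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics, baseline on x86-64 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-lane data pointers kept in the state. */
typedef struct {
    uint64_t data_ptr[4];
} lane_state;

/* Scalar update: one add and one store per lane. */
static void advance_scalar(lane_state *s, uint64_t idx)
{
    assert(s != NULL);
    for (int i = 0; i < 4; i++)
        s->data_ptr[i] += idx;
}

/* Vector update: broadcast idx once, then load, add and store
 * two 64-bit pointers per 128-bit register. */
static void advance_simd(lane_state *s, uint64_t idx)
{
    assert(s != NULL);
    const __m128i step = _mm_set1_epi64x((long long)idx);
    for (int i = 0; i < 4; i += 2) {
        __m128i p = _mm_loadu_si128((const __m128i *)&s->data_ptr[i]);
        p = _mm_add_epi64(p, step);
        _mm_storeu_si128((__m128i *)&s->data_ptr[i], p);
    }
}
```

Both functions leave the same values in `data_ptr`. With wider registers the same pattern covers more lanes per instruction, which is where the code-size saving would come from.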

Bulat-Ziganshin · 2024-10-30T18:45:12Z

Do you know about latencies of [failed] store-forwarding? Read e.g. https://stackoverflow.com/questions/46135766/can-modern-x86-implementations-store-forward-from-more-than-one-prior-store :

A read that is bigger than the write, or a read that covers both written and unwritten bytes, takes approximately 11 clock cycles extra.

OTOH, I don't know the rest of this code. A better strategy may be to keep the pointers in a SIMD register permanently and use e.g. PEXTRD to copy them into scalar registers. Or even use VPGATHER where it is available: https://stackoverflow.com/questions/21774454/how-are-the-gather-instructions-in-avx2-implemented
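A rough sketch of the "keep the pointers in a SIMD register" idea, again in C rather than the project's assembly. Everything here is an assumption made for illustration: the function names are hypothetical, and the extraction uses only SSE2 (`_mm_cvtsi128_si64` plus `_mm_unpackhi_epi64`) instead of PEXTRD, because the pointers are 64-bit and the 64-bit extract instruction (PEXTRQ) needs SSE4.1. The gather variant is not shown.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics, baseline on x86-64 */
#include <stddef.h>
#include <stdint.h>

/* Copy lane 0 or 1 of a vector of two 64-bit pointers into a scalar. */
static uint64_t lane_ptr(__m128i ptrs, int lane)
{
    assert(lane == 0 || lane == 1);
    if (lane == 1)
        ptrs = _mm_unpackhi_epi64(ptrs, ptrs); /* move high qword to low */
    return (uint64_t)_mm_cvtsi128_si64(ptrs);  /* low qword to scalar */
}

/* Read one byte per lane every `stride` bytes, `count` times, while the
 * two pointers live in a vector register and advance with a single add. */
static uint64_t sum_strided(const uint8_t *buf0, const uint8_t *buf1,
                            size_t stride, size_t count)
{
    __m128i ptrs = _mm_set_epi64x((long long)(uintptr_t)buf1,
                                  (long long)(uintptr_t)buf0);
    const __m128i step = _mm_set1_epi64x((long long)stride);
    uint64_t sum = 0;

    for (size_t i = 0; i < count; i++) {
        const uint8_t *p0 = (const uint8_t *)(uintptr_t)lane_ptr(ptrs, 0);
        const uint8_t *p1 = (const uint8_t *)(uintptr_t)lane_ptr(ptrs, 1);
        sum += (uint64_t)*p0 + (uint64_t)*p1;
        ptrs = _mm_add_epi64(ptrs, step); /* advance both pointers at once */
    }
    return sum;
}
```

Whether this beats scalar pointer registers depends on how often the pointers must be extracted; each extraction costs an extra instruction, so it is only a sketch of the trade-off, not a recommendation.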

Shark64 · 2024-10-31T00:26:04Z

I'm familiar with store-forwarding, but it doesn't look like it should be a problem here. The pointers are updated only once at the end of the function, outside the hot loop. My patch is mainly meant to reduce code size, not to improve performance. Also, as your quote says, store-forwarding fails if you read back a bigger chunk than the older write, so writing them as a single vector should help by having all the data in one or two store buffers.

Bulat-Ziganshin · 2024-10-31T02:38:14Z

you are right, sorry for the noise. i didn't check the code before commenting :(

Shark64 · 2024-10-31T14:58:26Z

you are right, sorry for the noise. i didn't check the code before commenting :(

No worries ;)

pablodelara

Thanks for your PR. I left a comment.

sha1_mb/sha1_mb_x4_avx.asm

Signed-off-by: Nicola Torracca <[email protected]>

pablodelara

One more change needed, thanks

sha512_mb/sha512_mb_x2_avx.asm

Fixed wrong vmov instruction type.

pablodelara · 2024-11-05T09:03:17Z

Thanks for the changes. I will squash the third commit into the first commit once this is merged.

pablodelara · 2024-11-05T19:04:48Z

sha512_mb/sha512_mb_x2_avx.asm

-	mov	[STATE + _data_ptr_sha512 + 0*PTR_SZ], inp0
-	add	inp1, IDX
-	mov [STATE + _data_ptr_sha512 + 1*PTR_SZ], inp1
+	vmovq	xmm0, IDX


After giving this some extra thought, I don't think making this change is worth it, for two reasons:
1 - It's replacing 4 instructions with 5 instructions
2 - We are removing AVX code in the next few months (already mentioned it).
So could you drop the changes in the AVX implementations and leave just the other ones?
Many thanks for the work!

Shark64

Sure, I've just reverted the _avx functions to the old version.

Signed-off-by: Nicola Torracca <[email protected]>

pablodelara · 2024-11-06T14:41:41Z

Thanks for the work @Shark64. This is now merged in mainline.

Use SIMD instructions to update pointers.

29b51f9

Signed-off-by: Nicola Torracca <[email protected]>

pablodelara requested changes Nov 4, 2024

sha1_mb/sha1_mb_x4_avx.asm

Use vpunpcklqdq to broadcast scalar register for AVX targets.

6563d48

Signed-off-by: Nicola Torracca <[email protected]>

pablodelara reviewed Nov 4, 2024

sha512_mb/sha512_mb_x2_avx.asm

Update sha512_mb_x2_avx.asm

934c640

Fixed wrong vmov instruction type.

pablodelara reviewed Nov 5, 2024

Revert changes for _avx functions.

2ccae01

Signed-off-by: Nicola Torracca <[email protected]>

pablodelara closed this Nov 6, 2024
