Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add -mfpu=neon if $CC accepts it #358

Closed
wants to merge 1 commit into from
Closed

Conversation

rui314
Copy link
Contributor

@rui314 rui314 commented Oct 30, 2023

This commit fixes a build issue on Raspberry Pi 4A, which uses an arm32 userland with an arm64 kernel.

GCC generally requires this flag to use NEON intrinsics on arm32. On arm64, GCC doesn't recognize this flag, so -mfpu=neon won't be added to the command line.

This commit fixes a build issue on Raspberry Pi 4A, which uses an
arm32 userland with an arm64 kernel.

GCC generally requires this flag to use NEON intrinsics on arm32.
On arm64, GCC doesn't recognize this flag, so `-mfpu=neon` won't be
added to the command line.
@oconnor663
Copy link
Member

@BurningEnlightenment could you review the CMake change?

@BurningEnlightenment
Copy link
Collaborator

No, this is the wrong way to go about this. Not all arm32 CPUs support NEON therefore compiler support doesn't imply that the intended target indeed supports NEON (and since ARM doesn't support runtime CPU feature detection, we can't switch the implementation like we do for x86).

If you, the user, know that your target CPU supports NEON you ought to configure this project with -DBLAKE3_USE_NEON_INTRINSICS=ON -DBLAKE3_CFLAGS_NEON=-mfpu=neon or set these values in your CMake toolchain file.

@rui314
Copy link
Contributor Author

rui314 commented Oct 30, 2023

In this code path, we've already set BLAKE3_USE_NEON to 1 (please refer to the lines just above this change), so that shouldn't be the case, right? This patch adds the compiler flag only when we decided to use NEON intrinsics.

@BurningEnlightenment
Copy link
Collaborator

BurningEnlightenment commented Oct 30, 2023

The condition for non ARMv8 architectures includes the check DEFINED BLAKE3_CFLAGS_NEON (l. 132). The whole if-block is for the case where we are certain that NEON is supported, wanted and we do know how to configure the compiler with it.

EDIT: See #314 for some context on the current design.

@BurningEnlightenment
Copy link
Collaborator

@rui314 do you have a use case where using a toolchain / commandline param is infeasible?

@rui314
Copy link
Contributor Author

rui314 commented Oct 31, 2023

The issue I'm addressing in this pull request is that BLAKE3's C implementation cannot be built out-of-the-box on the default Raspberry Pi OS (32-bit) running on a Raspberry Pi 4A, which is equipped with an ARMv8-A 64-bit processor. So the kernel is 64-bit and the userland is 32-bit. I didn't specify any optional parameters when using CMake (I simply ran cmake ../c followed by make), but the default build fails with the following error:

In file included from /home/ruiu/blake3/c/blake3_neon.c:3:
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h: In function 'add_128':
/usr/lib/gcc/arm-linux-gnueabihf/12/include/arm_neon.h:651:1: error: inlining failed in call to 'always_inline' 'vaddq_u32': target specific option mismatch
  651 | vaddq_u32 (uint32x4_t __a, uint32x4_t __b)
      | ^~~~~~~~~
/home/ruiu/blake3/c/blake3_neon.c:24:10: note: called from here
   24 |   return vaddq_u32(a, b);
      |          ^~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/blake3.dir/build.make:118: CMakeFiles/blake3.dir/blake3_neon.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/blake3.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

I encountered the same error even with cmake -DBLAKE3_USE_NEON_INTRINSICS=0 ../c.

I thought that the correct solution is to add the appropriate option to make GCC recognize NEON intrinsics when they are enabled in CMake. Unlike ARM64 GCC, ARM32 GCC needs -mfpu=neon to recognize NEON intrinsics.

@BurningEnlightenment
Copy link
Collaborator

BurningEnlightenment commented Oct 31, 2023

Ah, I see. The CMAKE_SYSTEM_PROCESSOR check currently bypasses all other checks. So we may need to restructure the condition a bit and provide a default flag configuration for the ARMv8 32bit case. Can you provide the CMakeCache.txt generated in the cmake ../c case? (I dislike check_c_compiler_flag and try to avoid it due to the fact that they can lead to false positive detections with other compilers)

@rui314
Copy link
Contributor Author

rui314 commented Oct 31, 2023

Here you are: https://gist.github.com/rui314/a9f7c9f7f5dda1cc432875c289557cab

@BurningEnlightenment
Copy link
Collaborator

@rui314 can you try the fix in #359?

@rui314
Copy link
Contributor Author

rui314 commented Oct 31, 2023

It worked!

@BurningEnlightenment
Copy link
Collaborator

@oconnor663 looks like we arrived at a solution--you may close this and review/merge #359

@rui314 rui314 closed this Nov 1, 2023
@oconnor663
Copy link
Member

Merged #359.

@BurningEnlightenment
Copy link
Collaborator

BurningEnlightenment commented Nov 4, 2023

@oconnor663 I've reviewed two other CMake PRs which you may merge.

EDIT: Here is the list of reviewed PRs: https://github.com/BLAKE3-team/BLAKE3/pulls?q=is%3Apr+is%3Aopen+review%3Aapproved

@rui314 rui314 deleted the arm32 branch November 6, 2023 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants