Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to FindCUDAToolkit.cmake #3004

Closed
wants to merge 5 commits into from

Conversation

TinyTinni
Copy link

@TinyTinni TinyTinni commented Sep 7, 2024

This PR migrates the current implementation from find_package(CUDA) to find_package(CUDAToolkit) .

It removes the old path, most notably the test compilation with CUDA. Here, I am not sure if it follows the project practices, but I am personally not the biggest fan of those test compilation, this is why I omitted them. Please let me know if I should re-add them.

It also adds the OpenMP finder of CMake.

Backward incompatibilities are the required CMake version. This PR changes the note for the current release, which raised the CMake version already to 3.8.0.

This PR is based on and closes #2833 and closely the suggested implementation with some tweaks.

Edit: tested it locally under Windows.

@davisking
Copy link
Owner

It removes the old path, most notably the test compilation with CUDA. Here, I am not sure if it follows the project practices, but I am personally not the biggest fan of those test compilation, this is why I omitted them. Please let me know if I should re-add them.

The stuff like test_for_cuda? Yeah we need to have those. Morally speaking, we shouldn't need them. But you would be very surprised by the number of people who have broken systems and those tests are massively helpful for them. Since they shunt them towards a clear error like "your compiler can't compile a cuda program!". And then dlib automatically builds without cuda, rather than trying to build with cuda and then failing to build at all.

But yeah, this would be awesome to update to assuming we don't find it breaks anything. Updating cmake is often an odyssey :|

@davisking
Copy link
Owner

davisking commented Sep 7, 2024

I just tried this on an Ubuntu 18.04.3 LTS machine with cuda 11 installed and it didn't work. Dlib on master works fine there though. So

cd dlib/test
mkdir build
cmake ..

errors out with this:

-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found suitable version "11.0.167", minimum required is "9.1")
CMake Warning at /home/davis/source/dlib/dlib/CMakeLists.txt:654 (find_package):
  By not providing "FindCUDNN.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "CUDNN", but
  CMake did not find one.

  Could not find a package configuration file provided by "CUDNN" with any of
  the following names:

    CUDNNConfig.cmake
    cudnn-config.cmake

  Add the installation prefix of "CUDNN" to CMAKE_PREFIX_PATH or set
  "CUDNN_DIR" to a directory containing one of the above files.  If "CUDNN"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- DID NOT FIND CUDA
-- Disabling CUDA support for dlib.  DLIB WILL NOT USE CUDA

This is with cmake cmake version 3.29.0-rc1 too

@davisking
Copy link
Owner

None of the github CI builds have cuda machines so you have to test this yourself :(

@TinyTinni
Copy link
Author

TinyTinni commented Sep 7, 2024

Thanks for trying it out under Linux.

I will try to spin up a VM and try to fix the CUDNN finder (I think the pkgconfig is missing).

The stuff like test_for_cuda? Yeah we need to have those.

Sure, those get also re-added.

@TinyTinni
Copy link
Author

This code is much closer to the code in master. The cudnn finder is again the same as the master, so your error should not occur anymore (I wasn't able to set up a ubuntu 18.04 vm with cuda yet).

It mostly replaced CUDA with CUDAToolkit and enables CUDA as a language.

Sadly, currently only tested on windows.

@davisking
Copy link
Owner

Tried it on ubuntu and got this:

-- Building a cuDNN test project to check if you have the right version of cuDNN installed...
-- *****************************************************************************************************
-- *** Found cuDNN, but we failed to compile the dlib/cmake_utils/test_for_cudnn project. 
-- *** You either have an unsupported version of cuDNN or something is wrong with your cudDNN install.
-- *** Since a functional cuDNN is not found DLIB WILL NOT USE CUDA. 
-- *** The output of the failed test_for_cudnn build is: 
-- *** 
-- ***   Change Dir: '/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build'
   ***   
   ***   Run Build Command(s): /usr/local/bin/cmake -E env VERBOSE=1 /usr/bin/make -f Makefile
   ***   /usr/local/bin/cmake -S/home/davis/source/dlib/dlib/cmake_utils/test_for_cudnn -B/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build --check-build-system CMakeFiles/Makefile.cmake 0
   ***   /usr/local/bin/cmake -E cmake_progress_start /home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build/CMakeFiles /home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build//CMakeFiles/progress.marks
   ***   /usr/bin/make  -f CMakeFiles/Makefile2 all
   ***   make[1]: Entering directory '/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build'
   ***   /usr/bin/make  -f CMakeFiles/cudnn_test.dir/build.make CMakeFiles/cudnn_test.dir/depend
   ***   make[2]: Entering directory '/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build'
   ***   cd /home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles" /home/davis/source/dlib/dlib/cmake_utils/test_for_cudnn /home/davis/source/dlib/dlib/cmake_utils/test_for_cudnn /home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build /home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build /home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build/CMakeFiles/cudnn_test.dir/DependInfo.cmake
   ***   Dependee "/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build/CMakeFiles/cudnn_test.dir/DependInfo.cmake" is newer than depender "/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build/CMakeFiles/cudnn_test.dir/depend.internal".
   ***   Dependee "/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build/CMakeFiles/cudnn_test.dir/depend.internal".
   ***   Scanning dependencies of target cudnn_test
   ***   make[2]: Leaving directory '/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build'
   ***   /usr/bin/make  -f CMakeFiles/cudnn_test.dir/build.make CMakeFiles/cudnn_test.dir/build
   ***   make[2]: Entering directory '/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build'
   ***   [ 50%] Building CXX object CMakeFiles/cudnn_test.dir/cudnn_dlibapi.cpp.o
   ***   /usr/bin/c++ -DDLIB_USE_CUDA   -o CMakeFiles/cudnn_test.dir/cudnn_dlibapi.cpp.o -c /home/davis/source/dlib/dlib/cuda/cudnn_dlibapi.cpp
   ***   In file included from /home/davis/source/dlib/dlib/cuda/cudnn_dlibapi.cpp:10:0:
   ***   /usr/include/cudnn.h:56:10: fatal error: cuda_runtime.h: No such file or directory
   ***    #include <cuda_runtime.h>
   ***             ^~~~~~~~~~~~~~~~
   ***   compilation terminated.
   ***   CMakeFiles/cudnn_test.dir/build.make:74: recipe for target 'CMakeFiles/cudnn_test.dir/cudnn_dlibapi.cpp.o' failed
   ***   make[2]: *** [CMakeFiles/cudnn_test.dir/cudnn_dlibapi.cpp.o] Error 1
   ***   make[2]: Leaving directory '/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build'
   ***   CMakeFiles/Makefile2:82: recipe for target 'CMakeFiles/cudnn_test.dir/all' failed
   ***   make[1]: *** [CMakeFiles/cudnn_test.dir/all] Error 2
   ***   make[1]: Leaving directory '/home/davis/source/dlib/dlib/test/build/dlib_build/cudnn_test_build'
   ***   Makefile:90: recipe for target 'all' failed
   ***   make: *** [all] Error 2
   ***   
   ***   
-- *****************************************************************************************************
-- Disabling CUDA support for dlib.  DLIB WILL NOT USE CUDA

@TinyTinni
Copy link
Author

TinyTinni commented Sep 8, 2024

Thanks for retrying it. I think only the cuda runtime was missing on the cudnn test target.

I installed cuda/cudnn on a non nvidia Linux machine, but it failed to enable CUDA in cmake, because it couldn't figure out the architecture version (because, no nvidia gpu, I assume?). Sadly, I couldn't trick cmake to compile anyway and reach the cudnn test part. I really hope it works this time, otherwise I have to figure out something else. I feel really bad that you have to try it so often.

If it fails now, I will not work on it until next weekend, because I will be gone during the next week.

@TinyTinni
Copy link
Author

Got it on Debian 12 with CUDA 12/cudnn 9 and cmake 3.25.1 via WSL

Had to set CUDAToolkit_ROOT as otherwise, it found the windows version.
Thats why I added the flag to the test project.

Output works:

Using CMake version: 3.25.1
Compiling dlib version: 19.24.99
Looking for cuDNN install...
Found cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
Building a CUDA test project to see if your compiler is compatible with CUDA...
Building a cuDNN test project to check if you have the right version of cuDNN installed...
Enabling CUDA support for dlib.  DLIB WILL USE CUDA, compute capabilities: 50
Configuring done

@davisking
Copy link
Owner

Did you build it? Like do make -j6 dtest and then ./dtest --runall

I just tried that and the build failed due to a bunch of missing symbol errors. I.e. make failed, I didn't get to running dtest.

@TinyTinni
Copy link
Author

Did you build it? Like do make -j6 dtest and then ./dtest --runall

Sadly yes, I was able to link and run it successfully.

Here a fresh build with only CUDAToolkit_ROOT defined

[ 45%] Building CUDA object dlib_build/CMakeFiles/dlib.dir/cuda/cuda_dlib.cu.o
[ 45%] Building CXX object dlib_build/CMakeFiles/dlib.dir/cuda/cudnn_dlibapi.cpp.o
[ 45%] Building CXX object dlib_build/CMakeFiles/dlib.dir/cuda/cublas_dlibapi.cpp.o
[ 46%] Building CUDA object dlib_build/CMakeFiles/dlib.dir/cuda/cusolver_dlibapi.cu.o
....
[ 46%] Building CXX object dlib_build/CMakeFiles/dlib.dir/cuda/curand_dlibapi.cpp.o
[ 46%] Building CXX object dlib_build/CMakeFiles/dlib.dir/cuda/cuda_data_ptr.cpp.o
[ 47%] Building CXX object dlib_build/CMakeFiles/dlib.dir/cuda/gpu_data.cpp.o
[ 47%] Linking CXX static library libdlib.a
[ 47%] Built target dlib
...
[ 99%] Building CXX object CMakeFiles/dtest.dir/optional.cpp.o
[ 99%] Building CXX object CMakeFiles/dtest.dir/scope.cpp.o
[100%] Linking CXX executable dtest
[100%] Built target dtest

Did you also try out a fresh build/clear directory? Cleared CMake Cache? Removed build/dlib_build dir?
Can you give me a hint, which libs could not be linked? cudnn or cudart?

@davisking
Copy link
Owner

Huh yeah, must have forgotten to blow away the old cmake build files. Looks like it's working. I'll look this over more in a bit :D

Comment on lines +800 to +805
list (APPEND dlib_needed_private_libraries CUDA::toolkit)
list (APPEND dlib_needed_private_libraries CUDA::cublas)
list (APPEND dlib_needed_private_libraries ${cudnn})
list (APPEND dlib_needed_private_libraries ${CUDA_curand_LIBRARY})
list (APPEND dlib_needed_private_libraries ${CUDA_cusolver_LIBRARY})
list (APPEND dlib_needed_private_libraries ${CUDA_CUDART_LIBRARY})
list (APPEND dlib_needed_private_libraries CUDA::curand)
list (APPEND dlib_needed_private_libraries CUDA::cusolver)
list (APPEND dlib_needed_private_libraries CUDA::cudart)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work when generating pkgconfig files.
If you look at my old PR #2912, which included using FindCUDAToolkit.cmake, I was unable to properly interoperate with pkgconfig due to cmake's use of "blah::blah" when linking libraries. This was my final hurdle but sadly couldn't overcome it. It included a whole bunch of other tidy-ups too... Sad

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, this sounds bad. I wasn't reading all of the PR, because it is way too long. But I assume you reference this comment [1] of yours.

You suggest using the "old" cmake way of using XXX_LIBRARIES and XXX_INCLUDE_DIRS.
Given the result variables from the new finder [2], the major problem seems to be the XXX_LIBRARIES variable, which is not set. I was also browsing the cmake source code of the finder and there isn't even a private one.

Did you try out to get all the link libraries via a target property? [3] Maybe creating the XXX_LIBRARIES manually by extracting all the libraries from the LINK_LIBRARIES [4] property is fine.

[1] #2912 (comment)
[2] https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html#result-variables
[3] get_target_property: https://cmake.org/cmake/help/latest/command/get_target_property.html
[4] LINK_LIBRARIES: https://cmake.org/cmake/help/latest/prop_tgt/LINK_LIBRARIES.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried extracting all the link libraries but it isn't trivial. There are some old cmake libraries that attempt to do this but they're not maintained. Personally my preference would be to drop support for pkgconfig then this problem goes away. There is a ticket open https://gitlab.kitware.com/cmake/cmake/-/issues/22621 that discusses adding support for exporting pkgconfig files natively. But nothing's happened. Realistically, how many users are using dlib via pkgconfig? My guess is not very many.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with the link libraries is that you have to do it recursively for all dependencies. It can get a bit nasty.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this sounds at least like a solution if pkgconfig cannot be dropped.

Long-term wise, a solution is probably needed anyway, as CMake linking on targets will very likely stay.

@dlib-issue-bot
Copy link
Collaborator

Notice: this issue has been closed because it has been inactive for 51 days. You may reopen this issue if it has been closed in error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to FindCUDAToolkit.cmake
4 participants