All examples, including more advanced ones, are shipped within the cuFFTDx package.
This folder demonstrates cuFFTDx API usage.

Requirements:

- cuFFTDx/MathDx package
- See cuFFTDx requirements
- CMake 3.18 or newer
- Linux system with installed NVIDIA drivers
- NVIDIA GPU of Volta (SM70) or newer architecture

Build options:

- You may specify CUFFTDX_CUDA_ARCHITECTURES to limit the CUDA architectures used for compilation (see CMake:CUDA_ARCHITECTURES)
- mathdx_ROOT: path to the mathDx package (XX.Y is the version of the package)

Build and run the examples:

mkdir build && cd build
cmake -DCUFFTDX_CUDA_ARCHITECTURES=70-real -Dmathdx_ROOT=/opt/nvidia/mathdx/XX.Y ..
make
# Run
ctest
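
The build steps above can be adapted to target more than one GPU architecture and to run only a subset of the examples. The following is a minimal sketch, not the project's official workflow. The architecture list `70-real;80-real`, the `run` helper, the `DRY_RUN` switch, and the test-name filter `simple_fft` are assumptions made for illustration; in particular, it assumes that CUFFTDX_CUDA_ARCHITECTURES accepts the same semicolon-separated syntax as CMake's CUDA_ARCHITECTURES, and that the test names registered with CTest contain the example names. `cmake --build`, `ctest -R`, and `ctest --output-on-failure` are standard CMake/CTest options.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Assumed values: adjust to your GPUs and install location.
# XX.Y is the mathDx package version placeholder and must be replaced.
ARCHS="70-real;80-real"
MATHDX_ROOT="/opt/nvidia/mathdx/XX.Y"

# Hypothetical helper: echo a command, then run it unless in dry-run mode.
# The echoed line is not re-quoted, so it is for reading, not for pasting.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run mkdir -p build
run cd build

# The architecture list must be quoted: an unquoted ';' would end the command.
run cmake "-DCUFFTDX_CUDA_ARCHITECTURES=${ARCHS}" "-Dmathdx_ROOT=${MATHDX_ROOT}" ..

# Equivalent to `make` with the default Makefile generator, but built in parallel.
run cmake --build . --parallel

# Run only tests whose names match the regular expression,
# and print the output of any test that fails.
run ctest -R simple_fft --output-on-failure
```

Run as-is, the script only prints the five commands; with `DRY_RUN=0` set in the environment it executes them from the examples directory.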

For detailed descriptions of the examples, please visit the Examples section of the cuFFTDx documentation.

| Group | Subgroup | Example | Description |
|---|---|---|---|
| Introduction Examples | | introduction_example | cuFFTDx API introduction |
| Simple FFT Examples | Thread FFT Examples | simple_fft_thread | Complex-to-complex thread FFT |
| | | simple_fft_thread_fp16 | Complex-to-complex thread FFT half-precision |
| | Block FFT Examples | simple_fft_block | Complex-to-complex block FFT |
| | | simple_fft_block_r2c | Real-to-complex block FFT |
| | | simple_fft_block_c2r | Complex-to-real block FFT |
| | | simple_fft_block_half2 | Complex-to-complex block FFT with __half2 as data type |
| | | simple_fft_block_fp16 | Complex-to-complex block FFT half-precision |
| | | simple_fft_block_r2c_fp16 | Real-to-complex block FFT half-precision |
| | | simple_fft_block_c2r_fp16 | Complex-to-real block FFT half-precision |
| | Extra Block FFT Examples | simple_fft_block_shared | Complex-to-complex block FFT shared-memory API |
| | | simple_fft_block_std_complex | Complex-to-complex block FFT with cuda::std::complex as data type |
| | | simple_fft_block_cub_io | Complex-to-complex block FFT with CUB used for loading/storing data |
| NVRTC Examples | | nvrtc_fft_thread | Complex-to-complex thread FFT |
| | | nvrtc_fft_block | Complex-to-complex block FFT |
| FFT Performance | | block_fft_performance | Benchmark for C2C block FFT |
| | | block_fft_performance_many | Benchmark for C2C/R2C/C2R block FFT |
| Convolution Examples | | convolution | Simplified FFT convolution |
| | | convolution_r2c_c2r | Simplified R2C-C2R FFT convolution |
| | | convolution_performance | Benchmark for FFT convolution using cuFFTDx and cuFFT |
| 2D/3D FFT Advanced Examples | | fft_2d | Example showing how to perform 2D FP32 C2C FFT with cuFFTDx |
| | | fft_2d_r2c_c2r | Example showing how to perform 2D FP32 R2C/C2R convolution with cuFFTDx |
| | | fft_2d_single_kernel | 2D FP32 FFT in a single kernel using Cooperative Groups kernel launch |
| | | fft_3d_box_single_block | Small 3D FP32 FFT that fits into a single block, each dimension is different |
| | | fft_3d_cube_single_block | Small 3D (equal dimensions) FP32 FFT that fits into a single block |