Adding tests for the FFT compute shaders #118

Fletterio · 2024-06-14T18:39:08Z

Adds a new 64_FFT test folder used to check that different FFT modules work properly


devshgraphicsprogramming · 2024-06-14T18:42:28Z

CMakeLists.txt

@@ -64,5 +64,6 @@ if(NBL_BUILD_EXAMPLES)
 	add_subdirectory(61_UI EXCLUDE_FROM_ALL)
 	add_subdirectory(62_CAD EXCLUDE_FROM_ALL)
 	add_subdirectory(62_SchusslerTest EXCLUDE_FROM_ALL)
+	add_subdirectory(64_FFT EXCLUDE_FROM_ALL)


you can take number 11 if you want


devshgraphicsprogramming · 2024-06-26T12:30:42Z

64_FFT/app_resources/shader.comp.hlsl

+	nbl::hlsl::workgroup::FFT<2, false, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);	
+	//nbl::hlsl::workgroup::FFT<2, true, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);


you need a stateful accessor, and be able to set input as output etc. to swap buffers

devshgraphicsprogramming · 2024-06-26T12:32:12Z

64_FFT/main.cpp

+		// this time we load a shader directly from a file
+		smart_refctd_ptr<IGPUShader> shader;
+		{
+			IAssetLoader::SAssetLoadParams lp = {};
+			lp.logger = m_logger.get();
+			lp.workingDirectory = ""; // virtual root
+			auto assetBundle = m_assetMgr->getAsset("app_resources/shader.comp.hlsl", lp);
+			const auto assets = assetBundle.getContents();
+			if (assets.empty())
+				return logFail("Could not load shader!");
+
+			// lets go straight from ICPUSpecializedShader to IGPUSpecializedShader
+			auto source = IAsset::castDown<ICPUShader>(assets[0]);
+			// The down-cast should not fail!
+			assert(source);
+
+			// this time we skip the use of the asset converter since the ICPUShader->IGPUShader path is quick and simple
+			shader = m_device->createShader(source.get());
+			if (!shader)
+				return logFail("Creation of a GPU Shader to from CPU Shader source failed!");
+		}


I'd dynamically make a shader source with the following contents:

#define _NBL_HLSL_WORKGROUP_SIZE_ ...whatever when you produce the runtime string...
#include "app_resources/shader.comp.hlsl"

An alternative is createOverriden, but it's more C++ code to put together.
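A minimal sketch of the "runtime string" idea suggested above, in plain C++. The helper name `makeShaderSource` is hypothetical and not part of Nabla; only the macro name and the include path come from the comment. How the resulting string is turned into a shader object depends on the Nabla API and is deliberately left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (an assumption for illustration, not a Nabla function):
// builds a tiny HLSL source that defines the workgroup size and then pulls in
// the real shader through an #include.
std::string makeShaderSource(uint32_t workgroupSize)
{
    std::string source;
    source += "#define _NBL_HLSL_WORKGROUP_SIZE_ ";
    source += std::to_string(workgroupSize);
    source += "\n#include \"app_resources/shader.comp.hlsl\"\n";
    return source;
}
```

With this approach the workgroup size becomes an ordinary runtime value on the C++ side, so the same test could be run for several sizes without editing the HLSL file.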


devshgraphicsprogramming · 2024-06-26T14:52:09Z

64_FFT/app_resources/shader.comp.hlsl

+	nbl::hlsl::workgroup::FFT<2, true, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);
+	accessor.workgroupExecutionAndMemoryBarrier();
+	nbl::hlsl::workgroup::FFT<2, false, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);	


you still haven't swapped the input and output buffers

which means the source for your inverse FFT is not the output of the forward FFT
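To illustrate the buffer swap being asked for, here is a CPU-side sketch in C++. `PingPongAccessor`, `swapBuffers`, and `naiveDFT` are hypothetical names, and a naive O(N^2) DFT stands in for `workgroup::FFT`; none of this is the shader's actual code. The point is only that the accessor carries state (which buffer is read, which is written) and that the two are exchanged between the forward and inverse passes.

```cpp
#include <cmath>
#include <complex>
#include <cstdint>
#include <utility>
#include <vector>

using complex_f = std::complex<float>;

// Hypothetical stateful accessor: get() reads from `input`, set() writes to `output`.
struct PingPongAccessor
{
    std::vector<complex_f>* input;
    std::vector<complex_f>* output;

    void get(uint32_t idx, complex_f& value) const { value = (*input)[idx]; }
    void set(uint32_t idx, const complex_f& value) { (*output)[idx] = value; }

    // After a pass, the freshly written buffer becomes the source of the next pass.
    void swapBuffers() { std::swap(input, output); }
};

// Naive DFT used as a stand-in for the real FFT (assumption for illustration).
template<bool Inverse>
void naiveDFT(PingPongAccessor& accessor, uint32_t n)
{
    const float pi = 3.14159265358979323846f;
    const float sign = Inverse ? 1.0f : -1.0f;
    for (uint32_t k = 0; k < n; ++k)
    {
        complex_f sum(0.0f, 0.0f);
        for (uint32_t j = 0; j < n; ++j)
        {
            complex_f x;
            accessor.get(j, x);
            const float angle = sign * 2.0f * pi * float((j * k) % n) / float(n);
            sum += x * complex_f(std::cos(angle), std::sin(angle));
        }
        accessor.set(k, Inverse ? sum / float(n) : sum);
    }
}

// Forward pass a -> b, swap, inverse pass b -> a: the inverse consumes the forward output.
std::vector<complex_f> roundTrip(const std::vector<complex_f>& signal)
{
    std::vector<complex_f> a = signal;
    std::vector<complex_f> b(signal.size());
    PingPongAccessor accessor{&a, &b};
    const uint32_t n = uint32_t(signal.size());
    naiveDFT<false>(accessor, n);
    accessor.swapBuffers();
    naiveDFT<true>(accessor, n);
    return a;
}
```

Without the `swapBuffers()` call, the inverse pass would read the original signal again instead of the forward result, which is the problem described in the comment.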

@Fletterio did you fix this?



devshgraphicsprogramming · 2024-08-17T14:54:45Z

64_FFT/app_resources/shader.comp.hlsl

+struct Accessor
+{
 	void set(uint32_t idx, nbl::hlsl::complex_t<scalar_t> value) 
 	{
-		vk::RawBufferStore< vector<scalar_t, 2> >(pushConstants.outputAddress + sizeof(vector<scalar_t, 2>) * idx, vector<scalar_t, 2>(value.real(), value.imag()));
+		vk::RawBufferStore< nbl::hlsl::complex_t<scalar_t> >(pushConstants.outputAddress + sizeof(nbl::hlsl::complex_t<scalar_t>) * idx, value);
 	}

 	void get(uint32_t idx, NBL_REF_ARG(nbl::hlsl::complex_t<scalar_t>) value) 
 	{
-		vector<scalar_t, 2> aux = vk::RawBufferLoad< vector<scalar_t, 2> >(pushConstants.inputAddress + sizeof(vector<scalar_t, 2>) * idx);
-		value.real(aux.x);
-		value.imag(aux.y);
+		value = vk::RawBufferLoad< nbl::hlsl::complex_t<scalar_t> >(pushConstants.inputAddress + sizeof(nbl::hlsl::complex_t<scalar_t>) * idx);


what's wrong with using the BDA accessor?

ah ok, you need separate input and output addresses... take the DoublePtrAccessor from example 10 (counting sort), rename it to something better, and stick it in the Nabla common headers.

The problem was SFINAE resolution (the atomics weren't templated, so you could only instantiate BDA accessors with integral types). If that's been changed I'll switch to it; otherwise I'll see about fixing it myself, the way you did references with dummy template arguments.
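The "dummy template argument" fix mentioned above can be sketched in standard C++. This is an analogue under assumptions: `BufferAccessor` is a hypothetical type, the add is not actually atomic, and the HLSL compiler's behaviour may differ in detail. If `atomicAdd` were a non-template member constrained directly on `T`, instantiating the class with `float` would be a hard error; making it a member template with a defaulted parameter `U = T` defers the check until the function is called.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical accessor over a raw pointer, used only to show the pattern.
template<typename T>
struct BufferAccessor
{
    T* data;

    T get(uint32_t idx) const { return data[idx]; }
    void set(uint32_t idx, T value) { data[idx] = value; }

    // The dummy parameter U makes this a template, so the integral-only
    // constraint is checked when atomicAdd is called, not when
    // BufferAccessor<float> is instantiated.
    template<typename U = T>
    typename std::enable_if<std::is_integral<U>::value, U>::type
    atomicAdd(uint32_t idx, U value)
    {
        const U old = data[idx];
        data[idx] = old + value; // stand-in only: a real version would use an atomic op
        return old;
    }
};
```

With this shape, `BufferAccessor<float>` can be used for plain loads and stores, and only a call to `atomicAdd` on it fails to compile.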


devshgraphicsprogramming · 2024-08-24T12:04:24Z

64_FFT/app_resources/shader.comp.hlsl

+[[vk::push_constant]] PushConstantData pushConstants;
+
+// careful: change size according to Scalar type
+groupshared uint32_t sharedmem[4 * WorkgroupSize];


make a thing like the workgroup scans have for working out the smem size with nbl::hlsl::mpl
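One way such a compile-time size helper could look, written as a C++ sketch. The name `SharedMemoryDWORDs` and the formula are assumptions: the formula is chosen only so that it reproduces the `4 * WorkgroupSize` from the diff when each invocation handles two complex elements of a 32-bit scalar type, and the real requirement of the FFT may differ.

```cpp
#include <cstdint>

// Hypothetical compile-time helper: number of uint32_t slots needed to hold
// WorkgroupSize * ElementsPerInvocation complex values of type Scalar.
template<uint32_t WorkgroupSize, uint32_t ElementsPerInvocation, typename Scalar>
struct SharedMemoryDWORDs
{
    static constexpr uint32_t ScalarsPerComplex = 2; // real and imaginary parts
    static constexpr uint32_t Bytes =
        WorkgroupSize * ElementsPerInvocation * ScalarsPerComplex * uint32_t(sizeof(Scalar));
    // Round up so scalar types narrower than 32 bits still get enough space.
    static constexpr uint32_t value =
        (Bytes + uint32_t(sizeof(uint32_t)) - 1) / uint32_t(sizeof(uint32_t));
};

// Matches the hard-coded size in the diff for float and a workgroup of 64.
static_assert(SharedMemoryDWORDs<64, 2, float>::value == 4 * 64, "float case");
// A 64-bit scalar needs twice as many 32-bit slots.
static_assert(SharedMemoryDWORDs<64, 2, double>::value == 8 * 64, "double case");
```

A helper like this removes the need for the "change size according to Scalar type" comment, since the array size follows the scalar type automatically.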

devshgraphicsprogramming · 2024-08-24T12:05:03Z

64_FFT/app_resources/common.hlsl

+	uint32_t dataElementCount;
+};
+
+#define _NBL_HLSL_WORKGROUP_SIZE_ 64


take the workgroup size as a parameter into your workgroup::FFT template, just how the scans do

Done! Ended up in this PR


Fletterio added 2 commits June 13, 2024 17:26

New FFT test

cb6e563

Update CMakeLists.txt

fbff6cc

devshgraphicsprogramming reviewed Jun 14, 2024

64_FFT/app_resources/common.hlsl
64_FFT/app_resources/common.hlsl (outdated)
64_FFT/app_resources/common.hlsl (outdated)
64_FFT/app_resources/shader.comp.hlsl (outdated)
64_FFT/main.cpp (outdated)

devshgraphicsprogramming reviewed Jun 15, 2024

64_FFT/main.cpp
64_FFT/main.cpp (outdated)

Fletterio added 4 commits June 23, 2024 18:34

Changing the test to use the memAdaptor layout for coalesced accesses

82242e9

Layout changes ongoing

edecfca

Refactoring

d5b05ed

Update example

b822d8e