-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding tests for the FFT compute shaders #118
base: master
Are you sure you want to change the base?
Conversation
@@ -64,5 +64,6 @@ if(NBL_BUILD_EXAMPLES) | |||
add_subdirectory(61_UI EXCLUDE_FROM_ALL) | |||
add_subdirectory(62_CAD EXCLUDE_FROM_ALL) | |||
add_subdirectory(62_SchusslerTest EXCLUDE_FROM_ALL) | |||
add_subdirectory(64_FFT EXCLUDE_FROM_ALL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can take number 11 if you want
nbl::hlsl::workgroup::FFT<2, false, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor); | ||
//nbl::hlsl::workgroup::FFT<2, true, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need a stateful accessor, and be able to set input as output etc. to swap buffers
64_FFT/main.cpp
Outdated
// this time we load a shader directly from a file | ||
smart_refctd_ptr<IGPUShader> shader; | ||
{ | ||
IAssetLoader::SAssetLoadParams lp = {}; | ||
lp.logger = m_logger.get(); | ||
lp.workingDirectory = ""; // virtual root | ||
auto assetBundle = m_assetMgr->getAsset("app_resources/shader.comp.hlsl", lp); | ||
const auto assets = assetBundle.getContents(); | ||
if (assets.empty()) | ||
return logFail("Could not load shader!"); | ||
|
||
// lets go straight from ICPUSpecializedShader to IGPUSpecializedShader | ||
auto source = IAsset::castDown<ICPUShader>(assets[0]); | ||
// The down-cast should not fail! | ||
assert(source); | ||
|
||
// this time we skip the use of the asset converter since the ICPUShader->IGPUShader path is quick and simple | ||
shader = m_device->createShader(source.get()); | ||
if (!shader) | ||
return logFail("Creation of a GPU Shader to from CPU Shader source failed!"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd dynamically make a shader source with following contents
#define _NBL_HLSL_WORKGROUP_SIZE_ ...whatever when you produce the runtime string...
#include "app_resources/shader.comp.hlsl"
at alternative is createOVerriden
but its more C++ code to put together
nbl::hlsl::workgroup::FFT<2, true, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor); | ||
accessor.workgroupExecutionAndMemoryBarrier(); | ||
nbl::hlsl::workgroup::FFT<2, false, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you still haven't swapped the input and output buffers
which means the source for your inverse FFT is not the output of the forward FFT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fletterio did you fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
struct Accessor | ||
{ | ||
void set(uint32_t idx, nbl::hlsl::complex_t<scalar_t> value) | ||
{ | ||
vk::RawBufferStore< vector<scalar_t, 2> >(pushConstants.outputAddress + sizeof(vector<scalar_t, 2>) * idx, vector<scalar_t, 2>(value.real(), value.imag())); | ||
vk::RawBufferStore< nbl::hlsl::complex_t<scalar_t> >(pushConstants.outputAddress + sizeof(nbl::hlsl::complex_t<scalar_t>) * idx, value); | ||
} | ||
|
||
void get(uint32_t idx, NBL_REF_ARG(nbl::hlsl::complex_t<scalar_t>) value) | ||
{ | ||
vector<scalar_t, 2> aux = vk::RawBufferLoad< vector<scalar_t, 2> >(pushConstants.inputAddress + sizeof(vector<scalar_t, 2>) * idx); | ||
value.real(aux.x); | ||
value.imag(aux.y); | ||
value = vk::RawBufferLoad< nbl::hlsl::complex_t<scalar_t> >(pushConstants.inputAddress + sizeof(nbl::hlsl::complex_t<scalar_t>) * idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong with using the BDA accessor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok you need separate input and output address... take the DoublePtrAccessor from example 10 counting sort rename it to something better and stick it in Nabla common headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem was SFINAE resolution (atomics weren't templated so you could only instantiate BDA accessors with integral types). If it's been changed I'll switch to that otherwise I'll see about fixing it myself the way you did references with dummy template arguments
[[vk::push_constant]] PushConstantData pushConstants; | ||
|
||
// careful: change size according to Scalar type | ||
groupshared uint32_t sharedmem[4 * WorkgroupSize]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make a thing like the workgroup scans have for working out the smem size with nbl::hlsl::mpl
64_FFT/app_resources/common.hlsl
Outdated
uint32_t dataElementCount; | ||
}; | ||
|
||
#define _NBL_HLSL_WORKGROUP_SIZE_ 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take the workgroup size as a parameter into your workgroup::FFT
template, just how the scans do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Ended up in this PR
Adds a new
64_FFT
test folder used to check that different FFT modules work properly