
Adding tests for the FFT compute shaders #118

Open
wants to merge 31 commits into base: master
Conversation

Fletterio
Contributor

Adds a new 64_FFT test folder used to check that the different FFT modules work properly.

@@ -64,5 +64,6 @@ if(NBL_BUILD_EXAMPLES)
add_subdirectory(61_UI EXCLUDE_FROM_ALL)
add_subdirectory(62_CAD EXCLUDE_FROM_ALL)
add_subdirectory(62_SchusslerTest EXCLUDE_FROM_ALL)
add_subdirectory(64_FFT EXCLUDE_FROM_ALL)


you can take number 11 if you want

Comment on lines 58 to 59
nbl::hlsl::workgroup::FFT<2, false, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);
//nbl::hlsl::workgroup::FFT<2, true, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);


you need a stateful accessor, and to be able to set the input as the output etc. to swap buffers

64_FFT/main.cpp Outdated
Comment on lines 65 to 85
// this time we load a shader directly from a file
smart_refctd_ptr<IGPUShader> shader;
{
IAssetLoader::SAssetLoadParams lp = {};
lp.logger = m_logger.get();
lp.workingDirectory = ""; // virtual root
auto assetBundle = m_assetMgr->getAsset("app_resources/shader.comp.hlsl", lp);
const auto assets = assetBundle.getContents();
if (assets.empty())
return logFail("Could not load shader!");

// let's go straight from ICPUShader to IGPUShader
auto source = IAsset::castDown<ICPUShader>(assets[0]);
// The down-cast should not fail!
assert(source);

// this time we skip the use of the asset converter since the ICPUShader->IGPUShader path is quick and simple
shader = m_device->createShader(source.get());
if (!shader)
return logFail("Creation of a GPU Shader from CPU Shader source failed!");
}


I'd dynamically make a shader source with following contents

#define _NBL_HLSL_WORKGROUP_SIZE_ ...whatever when you produce the runtime string...

#include "app_resources/shader.comp.hlsl"

an alternative is createOverriden, but it's more C++ code to put together
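A minimal sketch of producing that runtime string (the helper name is hypothetical; the actual asset-loading plumbing stays as in the example above):

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: prepend a workgroup-size define to the on-disk shader,
// so the same HLSL file can be compiled for any workgroup size chosen at runtime.
std::string buildShaderSource(uint32_t workgroupSize)
{
    std::string source;
    source += "#define _NBL_HLSL_WORKGROUP_SIZE_ " + std::to_string(workgroupSize) + "\n";
    source += "#include \"app_resources/shader.comp.hlsl\"\n";
    return source;
}
```

The resulting string would then be wrapped in an ICPUShader and handed to the compiler in place of the file loaded from disk.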

Comment on lines 47 to 49
nbl::hlsl::workgroup::FFT<2, true, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);
accessor.workgroupExecutionAndMemoryBarrier();
nbl::hlsl::workgroup::FFT<2, false, input_t>::template __call<Accessor, SharedMemoryAccessor>(accessor, sharedmemAccessor);


you still haven't swapped the input and output buffers

which means the source for your inverse FFT is not the output of the forward FFT


@Fletterio did you fix this?

Comment on lines 34 to 43
struct Accessor
{
void set(uint32_t idx, nbl::hlsl::complex_t<scalar_t> value)
{
-vk::RawBufferStore< vector<scalar_t, 2> >(pushConstants.outputAddress + sizeof(vector<scalar_t, 2>) * idx, vector<scalar_t, 2>(value.real(), value.imag()));
+vk::RawBufferStore< nbl::hlsl::complex_t<scalar_t> >(pushConstants.outputAddress + sizeof(nbl::hlsl::complex_t<scalar_t>) * idx, value);
}

void get(uint32_t idx, NBL_REF_ARG(nbl::hlsl::complex_t<scalar_t>) value)
{
-vector<scalar_t, 2> aux = vk::RawBufferLoad< vector<scalar_t, 2> >(pushConstants.inputAddress + sizeof(vector<scalar_t, 2>) * idx);
-value.real(aux.x);
-value.imag(aux.y);
+value = vk::RawBufferLoad< nbl::hlsl::complex_t<scalar_t> >(pushConstants.inputAddress + sizeof(nbl::hlsl::complex_t<scalar_t>) * idx);


what's wrong with using the BDA accessor?


ah ok, you need separate input and output addresses... take the DoublePtrAccessor from example 10 (counting sort), rename it to something better, and stick it in Nabla common headers.
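For reference, the shape of such a double-pointer accessor, sketched here in host-side C++ with plain pointers standing in for the BDA addresses (the real Nabla version lives in HLSL and loads/stores through vk::RawBufferLoad/Store):

```cpp
#include <cstdint>
#include <utility>

// Sketch of a DoublePtrAccessor-style helper: reads come from one buffer,
// writes go to another, and swap() ping-pongs them between passes so the
// inverse FFT consumes the forward FFT's output rather than the original input.
template<typename T>
struct DoublePtrAccessor
{
    T* input;
    T* output;

    T get(uint32_t idx) const { return input[idx]; }
    void set(uint32_t idx, T value) { output[idx] = value; }

    // After a pass, the just-written buffer becomes the next pass's input.
    void swap() { std::swap(input, output); }
};
```

Calling swap() between the forward and inverse FFT dispatch of a single workgroup is exactly the buffer exchange the earlier comments ask for.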

Contributor Author


The problem was SFINAE resolution (atomics weren't templated, so you could only instantiate BDA accessors with integral types). If it's been changed I'll switch to that; otherwise I'll see about fixing it myself, the way you did references with dummy template arguments.

[[vk::push_constant]] PushConstantData pushConstants;

// careful: change size according to Scalar type
groupshared uint32_t sharedmem[4 * WorkgroupSize];


make a utility like the workgroup scans have for working out the smem size with nbl::hlsl::mpl

uint32_t dataElementCount;
};

#define _NBL_HLSL_WORKGROUP_SIZE_ 64


take the workgroup size as a parameter into your workgroup::FFT template, just as the scans do
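A sketch of the shape this takes (parameter order and names are hypothetical; the actual workgroup::FFT signature is Nabla's to define):

```cpp
#include <cstdint>

// Sketch: carry the workgroup size as a template parameter, as the workgroup
// scans do, so dependent quantities (total element count, shared-memory
// footprint) become compile-time constants instead of a global #define.
template<uint16_t ElementsPerInvocation, bool Inverse, typename Scalar, uint32_t WorkgroupSize>
struct FFT
{
    static constexpr uint32_t ElementCount = ElementsPerInvocation * WorkgroupSize;

    template<typename Accessor, typename SharedMemoryAccessor>
    static void __call(Accessor& accessor, SharedMemoryAccessor& sharedmemAccessor);
};
```

The `#define _NBL_HLSL_WORKGROUP_SIZE_ 64` then only feeds the template instantiation (and numthreads attribute) at one place in the shader.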

Contributor Author


Done! Ended up in this PR
