-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autoexposure example restoration #137
base: master
Are you sure you want to change the base?
Conversation
…Examples-and-Tests into autoexposure_ex
…Examples-and-Tests into autoexposure_ex
…Examples-and-Tests into autoexposure_ex
…Examples-and-Tests into autoexposure_ex
…Examples-and-Tests into autoexposure_ex
float meteringWindowScaleX, meteringWindowScaleY; | ||
float meteringWindowOffsetX, meteringWindowOffsetY; | ||
float lumaMin, lumaMax; | ||
uint32_t sampleCountX, sampleCountY; | ||
uint32_t viewportSizeX, viewportSizeY; | ||
uint64_t lumaMeterBDA; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should pack these up intro struct
s in nbl::hlsl::luma_meter
and friends
P.S. I see you already have, why not use things like nbl::hlsl::luma_meter::LumaMeteringWindow
directly here?
params.shader.entryPoint = "main"; | ||
params.shader.entries = nullptr; | ||
params.shader.requireFullSubgroups = true; | ||
params.shader.requiredSubgroupSize = static_cast<IGPUShader::SSpecInfo::SUBGROUP_SIZE>(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be set from m_physicalDevice->getLimits().maxSubgroupSize
, not just set to 32
26_Autoexposure/main.cpp
Outdated
m_lumaPresentDS[0] = lumaPresentPool->createDescriptorSet(core::smart_refctd_ptr(lumaPresentDSLayout)); | ||
m_lumaPresentDS[1] = lumaPresentPool->createDescriptorSet(core::smart_refctd_ptr(lumaPresentDSLayout)); | ||
if (!m_lumaPresentDS[0] || !m_lumaPresentDS[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you double buffering descriptor sets ? the only reason old example did it was because of round robining the luma meter output SSBOs.
Now we have BDA in Push Cosntants
26_Autoexposure/main.cpp
Outdated
assert(allocation->memory.get() == buffer->getBoundMemory().memory); | ||
}; | ||
|
||
build_buffer(m_device, &m_lumaGatherAllocation, buffer, m_physicalDevice->getLimits().maxSubgroupSize, "Luma Gather Buffer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you seem to have forgotten that there's a sizeof(uint32_t)*
needed on your maxsubgroup size.
// Allocate and Leave 1/4 for image uploads, to test image copy with small memory remaining | ||
{ | ||
uint32_t localOffset = video::StreamingTransientDataBufferMT<>::invalid_value; | ||
uint32_t maxFreeBlock = m_utils->getDefaultUpStreamingBuffer()->max_size(); | ||
const uint32_t allocationAlignment = 64u; | ||
const uint32_t allocationSize = (maxFreeBlock / 4) * 3; | ||
m_utils->getDefaultUpStreamingBuffer()->multi_allocate(std::chrono::steady_clock::now() + std::chrono::microseconds(500u), 1u, &localOffset, &allocationSize, &allocationAlignment); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this, this is test code from ex 24
26_Autoexposure/main.cpp
Outdated
IGPUDescriptorSet::SDescriptorInfo info1 = {}; | ||
info1.info.image.imageLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL; | ||
info1.desc = m_gpuImgView; | ||
|
||
IGPUDescriptorSet::SDescriptorInfo info2 = {}; | ||
info2.info.image.imageLayout = IImage::LAYOUT::READ_ONLY_OPTIMAL; | ||
info2.desc = m_gpuImgView; // FIXME: temporarily pass in input image | ||
|
||
IGPUDescriptorSet::SWriteDescriptorSet writeDescriptors[] = { | ||
{ | ||
.dstSet = m_lumaPresentDS[0].get(), | ||
.binding = 0, | ||
.arrayElement = 0, | ||
.count = 1, | ||
.info = &info1 | ||
}, | ||
{ | ||
.dstSet = m_lumaPresentDS[1].get(), | ||
.binding = 0, | ||
.arrayElement = 0, | ||
.count = 1, | ||
.info = &info2 | ||
} | ||
}; | ||
|
||
m_device->updateDescriptorSets(2, writeDescriptors, 0, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// We do a very simple thing, display an image and wait `DisplayImageMs` to show it | ||
inline void workLoopBody() override | ||
{ | ||
const uint32_t SubgroupSize = m_physicalDevice->getLimits().maxSubgroupSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this a member, its painful to see it being initialized multiple times
auto cmdbuf = m_computeCmdBufs[0].get(); | ||
cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd start the example same as the others with the frames in flight (ex24 doesn't do FIF because it displays the same image for a long time), so proper round robining of multiple commandbuffers, blocks and waits before resets
26_Autoexposure/main.cpp
Outdated
.meteringWindowScaleX = MeteringWindowScale[0] * m_gpuImg->getCreationParameters().extent.width, | ||
.meteringWindowScaleY = MeteringWindowScale[1] * m_gpuImg->getCreationParameters().extent.height, | ||
.meteringWindowOffsetX = MeteringWindowOffset[0] * m_gpuImg->getCreationParameters().extent.width, | ||
.meteringWindowOffsetY = MeteringWindowOffset[1] * m_gpuImg->getCreationParameters().extent.height, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you still calling it a scale if its a literal dimension!?
Also if you're using UV normalized coordinates via Sample()
with float coords and explicit lod level, they should stay as they are and you shouldn't care about the input image extents!
26_Autoexposure/main.cpp
Outdated
.sampleCountY = SampleCount[1], | ||
.viewportSizeX = m_gpuImg->getCreationParameters().extent.width, | ||
.viewportSizeY = m_gpuImg->getCreationParameters().extent.height, | ||
.lumaMeterBDA = m_lumaGatherBDA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need two addresses to ping-pong between, one that you write current luma data to, and one you clear for a "future" dispatch
First frame's luma dispatch writes to A, clears B
Second frame's luma dispatch writes to B, clears A
Note that its not possible to use the tonemapping shader to clear the same buffer, as one workgroup may still be running while another quits and without an extra atomic (which would have to be cleared) you can't know which workgroup is the "last to quit"
26_Autoexposure/main.cpp
Outdated
cmdbuf->bindComputePipeline(m_lumaMeterPipeline.get()); | ||
cmdbuf->bindDescriptorSets(nbl::asset::EPBP_COMPUTE, m_lumaMeterPipeline->getLayout(), 0, 1, &ds); // also if you created DS Set with 3th index you need to respect it here - firstSet tells you the index of set and count tells you what range from this index it should update, useful if you had 2 DS with lets say set index 2,3, then you can bind both with single call setting firstSet to 2, count to 2 and last argument would be pointet to your DS pointers | ||
cmdbuf->pushConstants(m_lumaMeterPipeline->getLayout(), IShader::E_SHADER_STAGE::ESS_COMPUTE, 0, sizeof(pc), &pc); | ||
cmdbuf->dispatch(1 + (SampleCount[0] - 1) / SubgroupSize, 1 + (SampleCount[1] - 1) / SubgroupSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is backwards, you should get your sample count from workgroup dispatch count, not the other way round
26_Autoexposure/main.cpp
Outdated
m_EV = 0.0f; | ||
for (int index = 0; index < SubgroupSize; index++) { | ||
m_EV += static_cast<float>(buffData[index]) / (log2(LumaMinMax[1]) - log2(LumaMinMax[0])) + log2(LumaMinMax[0]); | ||
} | ||
m_EV /= (SampleCount[0] * SampleCount[1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the discord conversation, the per-workgroup-averages need to be in the [0,leftoverBits)
range already
then your decode step is
for ()
ev += static_cast<float>(buffData[index]) * (log2(LumaMinMax[1]) - log2(LumaMinMax[0])) / ((1<<fixedPointBitsLeft)-1) + log2(LumaMinMax[0]);
ev /= float(workgroupCount[0]*workgroupCount[1]);
26_Autoexposure/main.cpp
Outdated
auto pc = AutoexposurePushData | ||
{ | ||
.meteringWindowScaleX = MeteringWindowScale[0] * m_gpuImg->getCreationParameters().extent.width, | ||
.meteringWindowScaleY = MeteringWindowScale[1] * m_gpuImg->getCreationParameters().extent.height, | ||
.meteringWindowOffsetX = MeteringWindowOffset[0] * m_gpuImg->getCreationParameters().extent.width, | ||
.meteringWindowOffsetY = MeteringWindowOffset[1] * m_gpuImg->getCreationParameters().extent.height, | ||
.lumaMin = LumaMinMax[0], | ||
.lumaMax = LumaMinMax[1], | ||
.EV = m_EV, | ||
.sampleCountX = SampleCount[0], | ||
.sampleCountY = SampleCount[1], | ||
.viewportSizeX = m_gpuImg->getCreationParameters().extent.width, | ||
.viewportSizeY = m_gpuImg->getCreationParameters().extent.height, | ||
.lumaMeterBDA = m_lumaGatherBDA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're overdeclaring push constants to share the layout between two pipelines, then only build them once
auto queue = getGraphicsQueue(); | ||
auto cmdbuf = m_graphicsCmdBufs[0].get(); | ||
cmdbuf->reset(IGPUCommandBuffer::RESET_FLAGS::NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// we don't need to wait for the transfer semaphore, because we submit everything to the same queue | ||
const IQueue::SSubmitInfo::SSemaphoreInfo acquired[1] = { { | ||
.semaphore = acquire.semaphore, | ||
.value = acquire.acquireCount, | ||
.stageMask = PIPELINE_STAGE_FLAGS::NONE | ||
} }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to wait for the compute queue semaphore (from the luma metering) as well as the acquire
because eventually you'll compute the EV in the tonemapping pipeline and stop doing any host wait on the compute queue.
26_Autoexposure/main.cpp
Outdated
// Wait for completion | ||
{ | ||
const ISemaphore::SWaitInfo cmdbufDonePending[] = { | ||
{ | ||
.semaphore = m_presentSemaphore.get(), | ||
.value = m_submitIx | ||
} | ||
}; | ||
if (m_device->blockForSemaphores(cmdbufDonePending) != ISemaphore::WAIT_RESULT::SUCCESS) | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time to remove this and do Frames in Flight
constexpr float Key = 0.18; | ||
auto params = ToneMapperClass::Params_t<TMO>(Exposure, Key, 0.85f); | ||
{ | ||
params.setAdaptationFactorFromFrameDelta(0.f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we used this basically
https://en.wikipedia.org/wiki/Exponential_smoothing
to blend the EV value over time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 was a special value to "reset" the exposure (allow it to use the instantaneous value)
…Examples-and-Tests into autoexposure_ex
Whats the status on this? |
No description provided.