In various projects, some memory allocation "oddities" have popped up:

I realized that one contribution to this could be that `clSetKernelArg` is stateful, i.e. when you pass a buffer to a kernel, what you do is set the argument of the kernel to that buffer, and then enqueue the kernel. From that point onward (until the argument is set to something else, or until the kernel object is freed), the kernel keeps the buffer alive.

We could try out two approaches:

1. `Kernel.clone()` (CL2.1+ only, unfortunately) right before setting arguments
2. Resetting arguments to `None` after passing them.

Both are relatively cheap to try out, just by modifying the invoker generation. To be clear, I don't know that this will have a measurable impact, but it is one possible theory of how memory could survive longer than necessary. (In a way, the issue is a mirror image of #449.) An actually efficient version of option 2 would probably involve some C++, to efficiently call `clSetKernelArg(..., nullptr)` on a big batch of arguments.

cc @matthiasdiener
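To make the theory concrete, here is a toy model in plain Python of how a stateful argument slot can extend a buffer's lifetime, and how the two approaches would release it. It is only a sketch: `FakeBuffer`, `FakeKernel`, `call_and_reset`, `call_on_clone`, `is_alive` and `demo_lifetimes` are hypothetical names made up for illustration, not pyopencl APIs, and it assumes the kernel object really does hold a reference to whatever was last set as an argument.

```python
import gc
import weakref


class FakeBuffer:
    """Hypothetical stand-in for a device buffer (not a pyopencl class)."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


class FakeKernel:
    """Hypothetical stand-in that mimics the stateful clSetKernelArg model."""

    def __init__(self, num_args):
        self._args = [None] * num_args

    def set_arg(self, index, value):
        # The kernel remembers the argument until it is overwritten
        # or the kernel object itself goes away.
        self._args[index] = value

    def enqueue(self):
        # Pretend to launch; a real invoker would enqueue the kernel here.
        return [getattr(arg, "nbytes", None) for arg in self._args]

    def clone(self):
        new = FakeKernel(len(self._args))
        new._args = list(self._args)
        return new


def call_on_clone(kernel, *args):
    """Approach 1: set arguments on a short-lived clone, never on the original."""
    clone = kernel.clone()
    for i, arg in enumerate(args):
        clone.set_arg(i, arg)
    return clone.enqueue()


def call_and_reset(kernel, *args):
    """Approach 2: set arguments, enqueue, then reset every argument to None."""
    for i, arg in enumerate(args):
        kernel.set_arg(i, arg)
    try:
        return kernel.enqueue()
    finally:
        for i in range(len(args)):
            kernel.set_arg(i, None)


def is_alive(ref):
    """Report whether the object behind a weak reference still exists."""
    gc.collect()
    return ref() is not None


def demo_lifetimes():
    """Show the buffer surviving a plain call and dying after a reset."""
    knl = FakeKernel(1)
    buf = FakeBuffer(1024)
    ref = weakref.ref(buf)
    knl.set_arg(0, buf)
    knl.enqueue()
    del buf
    held_after_call = is_alive(ref)   # the kernel's argument slot still holds it
    knl.set_arg(0, None)
    held_after_reset = is_alive(ref)  # nothing references the buffer any more
    return held_after_call, held_after_reset
```

In this model, the caller dropping its own reference is not enough to free the buffer; it only goes away once the argument slot is cleared or the kernel (or clone) holding it is discarded. Whether the real effect is measurable is exactly the open question above.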