Hi, I've converted this to a discussion, since it's not really a bug report. For general questions, GitHub Discussions (or Slack or Discourse) are generally better places.

The main problem here is that we don't have a device-side printing function right now. Other platforms, like CUDA and oneAPI, offer an I/O intrinsic, but Metal doesn't. So we need to implement our own I/O, which is fairly hard. AMDGPU.jl has it, so we could port that solution, but that will take time.

In the meantime, you should use an array to write debug values into: pass an extra array to the kernel, have each thread store the values you want to inspect, and copy the array back to the host afterwards. Very clunky, I know, but this isn't unique to Metal.jl (writing kernels in Metal C suffers from the same restriction). To simplify this, it's best to write your kernels incrementally. Begin with simple operations, and validate the output at every step.
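As a rough illustration of the debug-array approach, here is a minimal sketch that records each thread's indices so the grid layout can be inspected on the host. The kernel name `debug_kernel`, the buffer name `dbg`, and the launch sizes are made up for this example, and I'm assuming the index intrinsics named in the question are callable from a Metal.jl kernel as shown; I haven't verified it against every Metal.jl version.

```julia
using Metal

# Hypothetical debug kernel: instead of printing, every thread writes the
# indices it sees into one row of a device array.
function debug_kernel(dbg)
    i = thread_position_in_grid_1d()
    dbg[i, 1] = threadgroup_position_in_grid_1d()  # which threadgroup
    dbg[i, 2] = simdgroup_index_in_threadgroup()   # which SIMD group in it
    dbg[i, 3] = thread_index_in_simdgroup()        # which lane in the SIMD group
    return
end

threads = 64  # threads per threadgroup (arbitrary choice for the example)
groups = 2    # number of threadgroups (arbitrary choice for the example)

# One row per thread, one column per recorded value.
dbg = Metal.zeros(UInt32, threads * groups, 3)
@metal threads=threads groups=groups debug_kernel(dbg)

# Copying to a host Array waits for the kernel to finish.
host = Array(dbg)
display(host[1:8, :])
```

Looking at a few rows of `host` should show how threads map onto threadgroups and SIMD groups, which is usually enough to sort out the indexing before porting a real kernel.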
First of all, thank you for the awesome package!
How do you print / introspect your kernels? E.g., `dispatch_simdgroups_per_threadgroup()` or `sgitg = simdgroup_index_in_threadgroup()`?

I couldn't find any examples in the repo, and `@print` doesn't seem to be implemented.

Context: I was hoping to replicate a kernel using SIMD groups (example in .metal), but I cannot figure out the indexing with `simdgroup_index_in_threadgroup` etc., because I'm not even sure how the grid looks.