Utility providers #10346
I gave a presentation (video) about 5 years ago with a general overview of the utility providers (rxm, rxd, shm). It is a bit out of date and doesn't go into the hardware-specific details you're asking about, like queue pairs, mlx, etc., but it might help a little.

The big benefit of using mlx5_0_dgram with rxd would come into effect for large-scale workloads. The idea is that you would use mlx5_0 with rxm up to a certain scale, until the resources get strained, and then switch to mlx5_dgram with rxd. Right now, the ability to switch internally doesn't exist in OFI, but we're looking at adding it as part of the peer provider enhancements. We currently use those to target intranode offload for shm, but they could be expanded to integrate rxm + rxd + shm all together.

As it stands, however, there isn't a real-world implementation of that, and rxd in practice is not optimized (or maintained) well enough to perform well. I would recommend just sticking with mlx5_0 + rxm, as we have tested that on fairly large-scale jobs without hitting the limitation, but stay tuned for rxd improvements and offload!
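Since the switch doesn't happen inside OFI today, the closest thing is to pick the utility provider yourself before launching the job. Below is a minimal sketch of that idea, not anything libfabric ships: the function names and the `RXD_PEER_THRESHOLD` cutoff are made up for illustration, and the only libfabric-facing piece is the `FI_PROVIDER` environment variable with the layered names `verbs;ofi_rxm` and `verbs;ofi_rxd`. Selecting the specific device (mlx5_0) is not shown.

```python
# Hypothetical cutoff: libfabric defines no such number. It stands in for
# "the scale at which rxm's resources get strained" on your system.
RXD_PEER_THRESHOLD = 4096


def choose_fi_provider(num_peers: int, threshold: int = RXD_PEER_THRESHOLD) -> str:
    """Return a layered provider string for the FI_PROVIDER variable.

    Up to `threshold` peers, layer rxm over verbs; beyond that, layer
    rxd over verbs (the datagram path).
    """
    if num_peers < 1:
        raise ValueError("num_peers must be at least 1")
    utility = "ofi_rxm" if num_peers <= threshold else "ofi_rxd"
    return f"verbs;{utility}"


def launch_env(num_peers: int, base_env=None) -> dict:
    """Copy base_env and set FI_PROVIDER for a job with num_peers peers."""
    env = dict(base_env or {})
    env["FI_PROVIDER"] = choose_fi_provider(num_peers)
    return env
```

You would pass the resulting dictionary as the environment of whatever launches the job. Given the state of rxd described above, in practice this means the rxm branch is the one to rely on for now.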
Is there a description somewhere of how the rxm and rxd utility providers work? The description in the man page (https://ofiwg.github.io/libfabric/v1.22.0/man/fi_rxm.7.html) is too terse for me.