Replies: 10 comments 21 replies
-
From the slide deck, below are the currently defined new capabilities.
At a bare minimum, I think FI_AUTH_KEY_RING and FI_RECV_AUTH_KEY are needed. Whether or not FI_SOURCE_AUTH_KEY is needed is, I think, up for debate. The point of FI_AUTH_KEY_RING is to have a single RDM endpoint support multiple authorization keys and report which authorization key was used for receive-based RDMA operations. This seems to require FI_SOURCE_AUTH_KEY functionality as part of FI_AUTH_KEY_RING. Thus, a separate FI_SOURCE_AUTH_KEY capability is really not needed.
-
The current API does not define whether a provider supports auth_keys or not. The application simply passes in the auth_key values on domain/endpoint creation. Because the format and size of auth_keys are provider specific, the application needs to inherently know whether it needs to use auth_keys, either prior to calling libfabric or by determining it from the provider attributes (such as the provider name). Whether we're talking about 1 key or multiple, I don't see a difference in behavior here.
One of the first questions to answer is how an app determines if auth_key(s) are supported and/or required. Following the existing approach relies on some mechanism outside of libfabric. It's possible for the fi_getinfo() query to return a non-zero fi_domain_attr::auth_key_size to indicate support, though that behavior isn't defined by the man pages. It would also be possible to report an fi_domain_attr::auth_key_cnt to indicate if keys are supported and how many. auth_key_cnt could also be applied to the fi_ep_attr.
The use of fi_addr_t to convey auth_key data seems ideal. It works across the existing APIs, with minimal to no changes. As mentioned on the call, fi_addr_t could someday be cast to a pointer if we ever need to provide more than 64 bits of input.
A follow-up question is whether the use of auth_keys automatically adjusts the behavior of existing flags, which can make sense when including the auth_key as part of the fi_addr_t value. For example, FI_SOURCE could automatically return the auth_key as part of the fi_addr_t value. Similarly, FI_DIRECTED_RECV would include the auth_key as part of matching receive buffers with incoming traffic. IMO, this makes things simpler and more consistent for the user, versus cherry-picking which address-related fields may apply. With libfabric 2.0, I don't want to have apps specify that they should match on the tag, but not the source address, but do match on the auth_key, but not the comm_key. That gets us back to tag + ignore bits semantics.
As for the key ring object, as a brainstorming idea, it is possible to add that to the AV, but as a separate entity from the endpoint addresses. That is, the AV could have separate calls to insert/lookup auth_key values. This avoids needing to allocate a separate key ring object and bind it to the EP. A shared AV could also share the auth_key values between processes. I would go further and define the auth_key values as an index, similar to the fi_addr_t values. There's no need to define symmetric auth_key tables, as the app can manage whether auth_key entries are the same between different processes or not. Auth_keys could be pre-loaded into an AV, similar to pre-loading EP addresses.
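
As a rough illustration of the fi_addr_t idea above, the sketch below packs an auth_key index and an endpoint address index into one 64-bit value and compares both when matching a directed receive. The 16/48-bit split, the `sketch_addr_t` typedef, and every `sketch_` name are assumptions made purely for illustration; none of them come from libfabric or from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a 64-bit fi_addr_t value; hypothetical, not the libfabric typedef. */
typedef uint64_t sketch_addr_t;

/* Assumed layout: upper 16 bits = auth_key index, lower 48 bits = EP address index. */
#define SKETCH_ADDR_BITS 48
#define SKETCH_ADDR_MASK ((UINT64_C(1) << SKETCH_ADDR_BITS) - 1)

/* Combine an auth_key index and an EP address index into one value. */
static sketch_addr_t sketch_pack(uint16_t key_idx, uint64_t addr_idx)
{
    return ((uint64_t)key_idx << SKETCH_ADDR_BITS) | (addr_idx & SKETCH_ADDR_MASK);
}

/* Recover the auth_key index, e.g. what an FI_SOURCE-style report could expose. */
static uint16_t sketch_key_idx(sketch_addr_t a)
{
    return (uint16_t)(a >> SKETCH_ADDR_BITS);
}

/* Recover the EP address index. */
static uint64_t sketch_addr_idx(sketch_addr_t a)
{
    return a & SKETCH_ADDR_MASK;
}

/* Directed-receive style match: because the auth_key is part of the value,
 * comparing the packed values checks source address and auth_key together. */
static bool sketch_directed_match(sketch_addr_t posted, sketch_addr_t incoming)
{
    return posted == incoming;
}
```

With this layout, `sketch_directed_match(sketch_pack(3, 42), sketch_pack(4, 42))` is false even though the address index is the same, which is the "auth_key participates in matching" behavior described above.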
-
@soumagne @shefty : Thinking out loud.... do we need to consider managing TX and RX auth_keys independently? For example, I believe with DAOS, there are a few servers handling client RPC requests. These servers need to operate on all TX and RX auth_keys. If the RPC is then forwarded to other servers, then since these servers are not accepting client RPCs, they do not need to receive RDMA operations on the client auth_keys. They only need to be able to initiate operations on the client auth_keys. It may be advantageous to prevent these server endpoints from being targets of RDMA operations. Thoughts on this?
-
This is not the case with DAOS. An RPC (bulk handle) could be forwarded to other servers (and those servers will then pull data from the original client), but those other servers will still be accepting RPCs directly from clients as well. I don't believe we have a situation where a server will only respond to other servers but not service clients.
-
I think the bulk of this discussion has been around using fi_addr_t values to define both the EP address and the auth_key. As pointed out, taking this approach leads to a natural extension for FI_DIRECTED_RECV. Restricting receive buffers to specific auth_keys is a future use case we want to support. One initial limitation with this approach is how to support FI_ADDR_UNSPEC (as defined today) while restricting the buffer to a specific auth_key. Thoughts on this?
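
One way to picture the question: if the fi_addr_t value carries both an address part and an auth_key part, an "any source, but only this auth_key" receive needs a wildcard that covers the address part alone. The sketch below assumes a hypothetical 16/48-bit split and a hypothetical all-ones address field as that wildcard. The `wild_`/`WILD_` names are invented for illustration; this is not how FI_ADDR_UNSPEC is defined today, nor something proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a 64-bit fi_addr_t value; hypothetical. */
typedef uint64_t wild_addr_t;

/* Assumed layout: upper 16 bits = auth_key index, lower 48 bits = EP address index. */
#define WILD_ADDR_BITS 48
#define WILD_ADDR_MASK ((UINT64_C(1) << WILD_ADDR_BITS) - 1)

/* Hypothetical wildcards. An all-ones address part means "any source with this key".
 * An all-ones value means "any source, any key", so key index 0xFFFF is reserved. */
#define WILD_ADDR_ANY WILD_ADDR_MASK
#define WILD_UNSPEC   UINT64_MAX

static wild_addr_t wild_pack(uint16_t key_idx, uint64_t addr_idx)
{
    return ((uint64_t)key_idx << WILD_ADDR_BITS) | (addr_idx & WILD_ADDR_MASK);
}

/* Decide whether a posted receive accepts an incoming (fully specified) source. */
static bool wild_match(wild_addr_t posted, wild_addr_t incoming)
{
    if (posted == WILD_UNSPEC)
        return true;                       /* fully unrestricted buffer */
    if ((posted >> WILD_ADDR_BITS) != (incoming >> WILD_ADDR_BITS))
        return false;                      /* auth_key must always agree */
    uint64_t want = posted & WILD_ADDR_MASK;
    return want == WILD_ADDR_ANY || want == (incoming & WILD_ADDR_MASK);
}
```

Under these assumptions, a buffer posted with `wild_pack(7, WILD_ADDR_ANY)` accepts any sender using key index 7 and rejects senders using any other key.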
-
Strawman API PR: #9319
-
To close this hole completely and take into consideration this new API, we may need 4 new secondary capability flags.
FI_AUTH_KEY_DOM: Provider supports setting an auth key at the domain level. All subsequent EPs and MRs will inherit this auth key.
FI_AUTH_KEY_EP: Paired with FI_AUTH_KEY_DOM. Provider supports setting an auth key at the EP level. If the EP auth key is NULL, the domain auth key will be used. This capability is mutually exclusive with FI_AUTH_KEY_AV.
FI_AUTH_KEY_MR: Paired with FI_AUTH_KEY_DOM and optionally with FI_AUTH_KEY_AV. Provider supports setting an auth key at the MR level. If FI_MR_ENDPOINT is set, the EP must be configured to support the auth key the MR is being associated with. For example, if the EP type is RDM, FI_AUTH_KEY_MR must be paired with FI_AUTH_KEY_AV to enable EPs with multiple auth keys. If the MR auth key is NULL and FI_AUTH_KEY_AV is set, the MR will be enabled against all auth keys the EP supports. If the MR auth key is NULL and FI_AUTH_KEY_AV is not set, the MR will be enabled against the default EP auth key. If FI_MR_ENDPOINT is not set, the domain must support the auth key. Determining which auth keys the domain supports is outside the scope of this document. If the MR auth key is NULL, the MR will be enabled against the default domain auth key.
FI_AUTH_KEY_AV: Provider supports inserting auth keys into the AV. Associating an AV with auth keys enables an RDM EP to be bound to multiple auth keys, to select the outgoing data transfer auth key, and to restrict incoming data transfers (excluding RMA/AMO) to specific auth keys. FI_AUTH_KEY_AV can be combined with FI_AUTH_KEY_MR to restrict incoming RMA/AMO data transfers.
Initial thoughts?
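
To make the pairing rules above easier to check, here is a minimal sketch of how they might be validated, plus how a NULL MR auth key could be resolved. The bit values are placeholders (the proposal names the flags but assigns no values), the `SK_`/`sk_` names are hypothetical, and reading "paired with FI_AUTH_KEY_DOM" as "requires FI_AUTH_KEY_DOM" is my interpretation rather than something stated outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits standing in for the four proposed capability flags. */
#define SK_AUTH_KEY_DOM (UINT64_C(1) << 0)
#define SK_AUTH_KEY_EP  (UINT64_C(1) << 1)
#define SK_AUTH_KEY_MR  (UINT64_C(1) << 2)
#define SK_AUTH_KEY_AV  (UINT64_C(1) << 3)

/* Check a capability set against the pairing rules in the text. */
static bool sk_caps_valid(uint64_t caps)
{
    /* Assumption: EP and MR level keys both require domain level support. */
    if ((caps & (SK_AUTH_KEY_EP | SK_AUTH_KEY_MR)) && !(caps & SK_AUTH_KEY_DOM))
        return false;
    /* EP level keys are mutually exclusive with AV auth keys. */
    if ((caps & SK_AUTH_KEY_EP) && (caps & SK_AUTH_KEY_AV))
        return false;
    return true;
}

/* Which auth key(s) an MR ends up enabled against. */
enum sk_mr_scope {
    SK_MR_EXPLICIT_KEY,       /* caller supplied a non-NULL MR auth key */
    SK_MR_ALL_EP_KEYS,        /* NULL key, FI_MR_ENDPOINT set, AV auth keys in use */
    SK_MR_DEFAULT_EP_KEY,     /* NULL key, FI_MR_ENDPOINT set, no AV auth keys */
    SK_MR_DEFAULT_DOMAIN_KEY  /* NULL key, FI_MR_ENDPOINT not set */
};

/* Resolve the scope following the NULL-key fallbacks described above. */
static enum sk_mr_scope sk_resolve_mr_scope(bool mr_key_is_null, bool mr_endpoint,
                                            uint64_t caps)
{
    if (!mr_key_is_null)
        return SK_MR_EXPLICIT_KEY;
    if (!mr_endpoint)
        return SK_MR_DEFAULT_DOMAIN_KEY;
    return (caps & SK_AUTH_KEY_AV) ? SK_MR_ALL_EP_KEYS : SK_MR_DEFAULT_EP_KEY;
}
```

For example, `SK_AUTH_KEY_DOM | SK_AUTH_KEY_MR | SK_AUTH_KEY_AV` passes the check, while adding `SK_AUTH_KEY_EP` to that set fails it because of the EP/AV exclusion.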
-
just wanted to circle back on that. Should that translate now to simply adding a |
-
@shefty you stated previously:
@iziemba I see in your PR you described:
I think for garbage collection purposes it would make sense for us to be able to call AV remove on entries that are no longer being used, since client processes may come and go and may use different VNIs if they come from different jobs. In that case, however, the local server EP would remain the same. Am I missing something, or does your description mean that it would not be possible to garbage collect / remove past entries unless we call fi_close() on the server EP (which we do not do, as we always have one EP / domain per server thread for the lifetime of the server)?
-
Libfabric AV Auth Keys.pdf |
-
Libfabric Authorization Key Ring.pdf
Following up on the OFIWG meeting with this GitHub discussion to further discuss the Libfabric Authorization Key Ring proposal. The OFIWG meeting slide deck is attached.
Specific items to follow up on: