Libfabric Authorization Key Ring Proposal #9204

iziemba · 2023-08-08T18:51:15Z

iziemba
Aug 8, 2023
Collaborator

Following up on the OFIWG meeting, this GitHub discussion is for further discussion of the Libfabric Authorization Key Ring proposal. The OFIWG meeting slide deck is attached.

Specific items to follow up on:

Can the proposed new capabilities be simplified?
The API for using fi_auth_key_t values should favor a solution with the minimal set of API changes.

iziemba · 2023-08-08T19:05:30Z

iziemba
Aug 8, 2023
Collaborator Author

From the slide deck, below are the currently defined new capabilities.

FI_AUTH_KEY_RING
- A provider supports binding an authorization key ring to an endpoint
- If capability is set during endpoint creation, the authorization key in the endpoint fi_info will be ignored
- Requires providers to support changing authorization key per transmit-based RDMA operation
- Requires providers to support an endpoint receiving on one or more authorization keys
  - Authorization key ring would define the exact number an endpoint would need to support
FI_SOURCE_AUTH_KEY
- Paired with FI_AUTH_KEY_RING
- Requests that the endpoint return the source authorization key (fi_auth_key_t) data as part of its completion data
FI_NO_SOURCE_AUTH_KEY
- Paired with FI_SOURCE_AUTH_KEY
- Flag used per MR and recv/trecv to signal to provider that corresponding completion events do not need to return source authorization key data
- Optimization to avoid potential reverse lookup to retrieve fi_auth_key_t
FI_RECV_AUTH_KEY
- Paired with FI_AUTH_KEY_RING
- Provider supports restricting a recv/trecv to a specific fi_auth_key_t

At a bare minimum, I think FI_AUTH_KEY_RING and FI_RECV_AUTH_KEY are needed. Whether or not FI_SOURCE_AUTH_KEY is needed is, I think, up for debate. The point of FI_AUTH_KEY_RING is to have a single RDM endpoint support multiple authorization keys and report which authorization key was used for receive-based RDMA operations. This seems to require FI_SOURCE_AUTH_KEY functionality as a part of FI_AUTH_KEY_RING. Thus, FI_SOURCE_AUTH_KEY is really not needed as a separate capability.

0 replies

shefty · 2023-08-08T20:11:35Z

shefty
Aug 8, 2023
Maintainer

The current API does not define whether a provider supports auth_keys or not. The application simply passes in the auth_key values on domain/endpoint creation. Because the format and size of auth_keys are provider specific, the application needs to inherently know whether it needs to use auth_keys either prior to calling libfabric, or determine it based on the provider attributes (such as provider name). Whether we're talking about 1 key or multiple, I don't see a difference in behavior here.

One of the first questions to answer is how an app determines if auth_key(s) are supported and/or required. Following the existing approach relies on some mechanism outside of libfabric. It's possible for the fi_getinfo() query to return a non-zero fi_domain_attr::auth_key_size to indicate support, though that behavior isn't defined by the man pages. It would also be possible to report an fi_domain_attr::auth_key_cnt to indicate if keys are supported and how many. auth_key_cnt could also be applied to the fi_ep_attr.

The use of fi_addr_t to convey auth_key data seems ideal. It works across the existing APIs, with minimal to no changes. As mentioned on the call, fi_addr_t could someday be cast to a pointer if we ever need to provide more than 64-bits of data worth of input.

A follow-up question is whether the use of auth_keys automatically adjusts the behavior of existing flags, which can make sense when including the auth_key as part of the fi_addr_t value. For example, FI_SOURCE could automatically return the auth_key as part of the fi_addr_t value. Similarly, FI_DIRECTED_RECV would include the auth_key as part of matching receive buffers with incoming traffic. IMO, this makes things simpler and more consistent for the user, versus cherry-picking which address related fields may apply. With libfabric 2.0, I don't want to have apps specify that they should match on the tag, but not the source address, but do match on the auth_key, but not the comm_key. That gets us back to tag + ignore bits semantics.

As for the key ring object, as a brainstorming idea, it is possible to add that to the AV, but as a separate entity from the endpoint addresses. That is, the AV could have separate calls to insert/lookup auth_key values. This avoids needing to allocate a separate key ring object and bind it to the ep. A shared AV could also share the auth_key values between processes. I would go further and define the auth_key values as an index, similar to the fi_addr_t values. There's no need to define symmetric auth_keys tables, as the app can manage whether auth_key entries are the same between different processes or not. Auth_keys could be pre-loaded into an AV, similar to pre-loading ep addresses.
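To make the "auth_key as an index carried in fi_addr_t" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `toy_` names are hypothetical, `toy_fi_addr_t` is a local stand-in for the 64-bit fi_addr_t, and the 16/48 bit split is arbitrary. Nothing in this thread fixes an encoding, and a provider would be free to choose its own.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the 64-bit fi_addr_t so the sketch builds without
 * libfabric headers. */
typedef uint64_t toy_fi_addr_t;

/* Hypothetical layout: upper 16 bits hold the auth_key index, lower 48 bits
 * hold the endpoint address index. */
#define TOY_AUTH_BITS 16
#define TOY_ADDR_BITS (64 - TOY_AUTH_BITS)
#define TOY_ADDR_MASK ((UINT64_C(1) << TOY_ADDR_BITS) - 1)

/* Combine an auth_key index and an address index into one 64-bit value. */
static inline toy_fi_addr_t toy_pack(uint16_t auth_idx, uint64_t addr_idx)
{
    return ((toy_fi_addr_t) auth_idx << TOY_ADDR_BITS) |
           (addr_idx & TOY_ADDR_MASK);
}

/* Recover the auth_key index from a packed value. */
static inline uint16_t toy_auth_index(toy_fi_addr_t addr)
{
    return (uint16_t) (addr >> TOY_ADDR_BITS);
}

/* Recover the endpoint address index from a packed value. */
static inline uint64_t toy_addr_index(toy_fi_addr_t addr)
{
    return addr & TOY_ADDR_MASK;
}

/* Usage: the same peer reached under two different keys yields two
 * different packed values, so existing data transfer calls could select
 * the key per operation without new parameters. */
void demo_pack(void)
{
    toy_fi_addr_t a = toy_pack(3, 42);

    assert(toy_auth_index(a) == 3);
    assert(toy_addr_index(a) == 42);
    assert(toy_pack(4, 42) != a);
}
```

With a layout like this, FI_SOURCE and FI_DIRECTED_RECV style matching would see the key as just another part of the address value, which is the consistency argument made above.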

10 replies

shefty Sep 11, 2023
Maintainer

It's not prevented. The trade-offs are around API simplicity/usability, optimizing for the common/expected use case, and what can actually be implemented.

iziemba Sep 11, 2023
Collaborator Author

I'll open up a strawman API PR based on this feedback.

iziemba Sep 11, 2023
Collaborator Author

One thing we need this API to support is the ability for DAOS servers to pass the client auth_key among themselves. For example, a DAOS I/O forwarder may do initial client RPC processing and then shard the RPC to multiple servers. In the original proposal, the I/O forwarder would send the fi_auth_key_t to all servers. Then, all servers would have to use this fi_auth_key_t to move data from/to the client.

This approach only works if all I/O forwarders and servers have the same view of fi_auth_key_t (FI_AUTH_KEY_RING_SYMMETRIC). Assuming I followed this thread correctly, since auth_key will always be encoded in the fi_addr_t (resulting in fi_auth_key_t never existing), how can the I/O forwarders send the auth_key to the servers?

One option would be to do the following:

I/O forwarder: Call fi_av_lookup_auth_key() to extract the auth_key blob. Send this blob + EP address blob to the servers.
Servers: Insert auth_key and EP addr blob in AV to get fi_addr_t.
Servers: Use auth_key + EP addr fi_addr_t to issue RDMA operations.

Thoughts?
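The three steps above can be sketched with a toy, in-memory table standing in for the AV. This is only a model of the flow: the `toy_` functions are hypothetical and are not the proposed fi_av_lookup_auth_key() or any real libfabric call, and the key blobs are made-up bytes. The point it illustrates is that the blob, not the local index, is what travels between forwarder and server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_KEY_MAX  16 /* largest key blob this toy table accepts */
#define TOY_AV_SLOTS 8  /* number of key entries per toy AV */

struct toy_key {
    uint8_t blob[TOY_KEY_MAX];
    size_t len;
};

struct toy_av {
    struct toy_key keys[TOY_AV_SLOTS];
    size_t count;
};

/* Insert a key blob and return its local index, or -1 on error.
 * Inserting a blob that is already present returns the existing index. */
int toy_av_insert_auth_key(struct toy_av *av, const void *blob, size_t len)
{
    if (len == 0 || len > TOY_KEY_MAX)
        return -1;
    for (size_t i = 0; i < av->count; i++) {
        if (av->keys[i].len == len &&
            memcmp(av->keys[i].blob, blob, len) == 0)
            return (int) i;
    }
    if (av->count == TOY_AV_SLOTS)
        return -1;
    memcpy(av->keys[av->count].blob, blob, len);
    av->keys[av->count].len = len;
    return (int) av->count++;
}

/* Copy the blob for a local index into the caller's buffer. On entry *len
 * is the buffer size; on success it is set to the key size. */
int toy_av_lookup_auth_key(const struct toy_av *av, int idx,
                           void *blob, size_t *len)
{
    if (idx < 0 || (size_t) idx >= av->count || *len < av->keys[idx].len)
        return -1;
    memcpy(blob, av->keys[idx].blob, av->keys[idx].len);
    *len = av->keys[idx].len;
    return 0;
}

void demo_forward(void)
{
    struct toy_av forwarder = {0}, server = {0};
    const uint8_t job_a[] = {0xAA, 0x01}, job_b[] = {0xBB, 0x02};
    uint8_t wire[TOY_KEY_MAX];
    size_t wire_len = sizeof(wire);

    /* The server already holds another job's key, so indices diverge. */
    toy_av_insert_auth_key(&server, job_b, sizeof(job_b));
    int fwd_idx = toy_av_insert_auth_key(&forwarder, job_a, sizeof(job_a));

    /* Forwarder: extract the blob; "wire" models the message payload. */
    assert(toy_av_lookup_auth_key(&forwarder, fwd_idx, wire, &wire_len) == 0);

    /* Server: insert the received blob to obtain its own local index. */
    int srv_idx = toy_av_insert_auth_key(&server, wire, wire_len);
    assert(fwd_idx == 0 && srv_idx == 1);
}
```

In this model the forwarder and server end up with different indices for the same key, which is harmless because neither index ever leaves its process.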

shefty Sep 11, 2023
Maintainer

I assume the same semantics used for the client's fi_addr_t would apply. Each server has their own AV and must convey some sort of address between servers, so that the server can retrieve the correct fi_addr_t value to talk with each client. There's no need for the auth_key index values at different servers to match (though that makes it much easier), just like they don't have to have the same fi_addr_t value for the same client.

Having an auth_key lookup, similar to the AV address lookup, makes sense from a usability perspective. DAOS could use that or rely on some other mechanism, such as simply referencing the auth_key by index, assuming that all servers have the same set of auth_key values.

soumagne Sep 11, 2023

@iziemba that's a very good point, thanks for bringing that up. In those cases yes we also have to sometimes serialize addresses based on their native address format. So we'd need to be able to serialize the auth_key separately and embed it along with the native address. Your description and usage of fi_av_lookup_auth_key() seems good to me.
We never exchange any fi_addr_t directly, since right now we can't expect the indexes to be the same.

iziemba · 2023-09-11T17:23:37Z

iziemba
Sep 11, 2023
Collaborator Author

@soumagne @shefty : Thinking out loud.... do we need to consider managing TX and RX auth_keys independently? For example, I believe with DAOS, there are a few servers handling client RPC requests. These servers need to operate on all TX and RX auth_keys. If the RPC is then forwarded to other servers, since these servers are not accepting client RPCs, they do not need to receive RDMA operations on the client auth_key. They only need to be able to initiate operations on the client auth_keys. It may be advantageous to prevent these server endpoints from being targets of RDMA operations. Thoughts on this?

3 replies

shefty Sep 11, 2023
Maintainer

We could associate access flags with each auth_key (FI_SEND | FI_RECV | FI_WRITE | FI_REMOTE_WRITE | etc.). That doesn't complicate the API much, but would add the desired protection, which I agree could be useful.
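A minimal sketch of what per-key access flags might look like, assuming a simple bitmask stored next to each key. The `TOY_` bits and the struct are hypothetical stand-ins defined locally; an actual design would presumably reuse the existing FI_SEND / FI_RECV / FI_WRITE / FI_REMOTE_WRITE values rather than new ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-key access bits, local to this sketch. */
#define TOY_SEND         (UINT64_C(1) << 0)
#define TOY_RECV         (UINT64_C(1) << 1)
#define TOY_WRITE        (UINT64_C(1) << 2)
#define TOY_REMOTE_WRITE (UINT64_C(1) << 3)

struct toy_auth_key_entry {
    uint32_t key;    /* made-up key value */
    uint64_t access; /* operations permitted under this key */
};

/* True only if every bit requested in op is granted for this key. */
bool toy_key_allows(const struct toy_auth_key_entry *entry, uint64_t op)
{
    return (entry->access & op) == op;
}

/* Usage: an initiator-only key, as in the forwarded-RPC scenario. The
 * endpoint may send and issue RMA writes under the key, but cannot be the
 * target of receives or remote writes under it. */
void demo_access(void)
{
    struct toy_auth_key_entry initiator_only = {
        .key = 0x1234,
        .access = TOY_SEND | TOY_WRITE,
    };

    assert(toy_key_allows(&initiator_only, TOY_WRITE));
    assert(!toy_key_allows(&initiator_only, TOY_REMOTE_WRITE));
    assert(!toy_key_allows(&initiator_only, TOY_RECV));
}
```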

iziemba Sep 11, 2023
Collaborator Author

👍 I'll add a flags argument to fi_av_insert_auth_key() which could support this.

soumagne Sep 11, 2023

I would need to think more how we would use this in practice but I agree also that in theory it's a nice to have.

frostedcmos · 2023-09-11T23:24:20Z

frostedcmos
Sep 11, 2023

If the RPC is then forwarded to other servers, since these servers are not accepting client RPCs, they do not need to receive RDMA operations on the client auth_key

This is not the case with DAOS. An RPC (bulk handle) could be forwarded to other servers (and those servers will then pull data from the original client), but those other servers will still be accepting RPCs directly from clients as well. I don't believe we have a situation where a server will only respond to other servers but not service clients.

1 reply

iziemba Sep 12, 2023
Collaborator Author

Thanks for the clarification!

iziemba · 2023-09-12T02:24:17Z

iziemba
Sep 12, 2023
Collaborator Author

I think the bulk of this discussion has been around using fi_addr_t to define both the EP addr and the auth_key. As pointed out, taking this approach leads to a natural extension for FI_DIRECTED_RECV. Restricting receive buffers to specific auth_keys is a future use case we want to support. One initial limitation with this approach is how to support FI_ADDR_UNSPEC (as defined today) while restricting the buffer to a specific auth_key. Thoughts on this?

3 replies

shefty Sep 12, 2023
Maintainer

I can think of 2 options, both of which are 'meh'. We could have a macro that sets fi_addr_t to ADDR_UNSPEC & auth_key index. E.g.

#define fi_any_addr(auth_index) ((fi_addr_t) (FI_ADDR_UNSPEC - (uint64_t) (auth_index)))

Or we require inserting a wildcard address into the AV to generate any address.

The latter option might be useful as an optimization. If an EP is bound to an AV that does not have a wildcard address inserted into it, the EP could disable matching on any source. NCCL, for example, doesn't need that capability.
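As a worked example of the macro option, the sketch below evaluates it for a few indices. It uses local stand-ins (`toy_fi_addr_t`, `TOY_ADDR_UNSPEC` as an all-ones 64-bit value, and `toy_any_addr`) so that it builds without libfabric headers; the names are hypothetical and the all-ones value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the arithmetic can be checked in isolation. */
typedef uint64_t toy_fi_addr_t;
#define TOY_ADDR_UNSPEC ((toy_fi_addr_t) -1) /* all 64 bits set */

/* "Any source address, restricted to this auth_key index". */
#define toy_any_addr(auth_index) \
    ((toy_fi_addr_t) (TOY_ADDR_UNSPEC - (uint64_t) (auth_index)))

void demo_any_addr(void)
{
    /* Index 0 is indistinguishable from the plain wildcard. */
    assert(toy_any_addr(0) == TOY_ADDR_UNSPEC);

    /* Each further index steps down from the top of the 64-bit range. */
    assert(toy_any_addr(1) == UINT64_C(0xFFFFFFFFFFFFFFFE));
    assert(toy_any_addr(2) != toy_any_addr(1));

    /* The index can be recovered by subtracting from the wildcard. */
    assert(TOY_ADDR_UNSPEC - toy_any_addr(5) == 5);
}
```

Two properties stand out in this sketch: index 0 collides with the unrestricted wildcard, and the scheme reserves the top of the fi_addr_t range, so it only works if ordinary AV entries never reach those values. Those may be among the reasons this option is rated 'meh'.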

shefty Sep 12, 2023
Maintainer

Btw, the app doesn't necessarily need to insert the wildcard address. It could be specified at AV creation. There would still need to be some mechanism for the app to obtain an fi_addr_t value corresponding to the wildcard address & auth key. That could be through a separate AV function call, which would work better than the macro mentioned above.

iziemba Sep 12, 2023
Collaborator Author

I think the AV function call makes the most sense. In #9319, I propose that if you do a fi_av_insert_auth_key_addr() with the FI_AUTH_KEY_MATCH_ALL flag, the returned fi_addr_t will be for all EP addrs and the specific auth key.

iziemba · 2023-09-12T03:51:03Z

iziemba
Sep 12, 2023
Collaborator Author

Strawman API PR: #9319

0 replies

iziemba · 2023-09-12T19:26:05Z

iziemba
Sep 12, 2023
Collaborator Author

One of the first questions to answer is how an app determines if auth_key(s) are supported and/or required. Following the existing approach relies on some mechanism outside of libfabric. It's possible for the fi_getinfo() query to return a non-zero fi_domain_attr::auth_key_size to indicate support, though that behavior isn't defined by the man pages. It would also be possible to report an fi_domain_attr::auth_key_cnt to indicate if keys are supported and how many. auth_key_cnt could also be applied to the fi_ep_attr.

To close this hole completely and take into consideration this new API, we may need 4 new secondary capability flags.

FI_AUTH_KEY_DOM: Provider supports setting an auth key at the domain level. All subsequent EPs and MRs will inherit this auth key.

FI_AUTH_KEY_EP: Paired with FI_AUTH_KEY_DOM. Provider supports setting an auth key at the EP level. If EP auth key is NULL, domain auth key will be used. This capability is mutually exclusive with FI_AUTH_KEY_AV.

FI_AUTH_KEY_MR: Paired with FI_AUTH_KEY_DOM and optionally with FI_AUTH_KEY_AV. Provider supports setting an auth key at the MR level.

If FI_MR_ENDPOINT is set, the EP must be configured to support the auth key the MR is being associated with. For example, if the EP type is RDM, FI_AUTH_KEY_MR must be paired with FI_AUTH_KEY_AV to enable EPs with multiple auth keys. If the MR auth key is NULL and FI_AUTH_KEY_AV is set, the MR will be enabled against all auth keys the EP supports. If the MR auth key is NULL and FI_AUTH_KEY_AV is not set, the MR will be enabled against the default EP auth key.

If FI_MR_ENDPOINT is not set, the domain must support the auth key. Determining which auth keys the domain supports is outside the scope of this document. If the MR auth key is NULL, the MR will be enabled against the default domain auth key.

FI_AUTH_KEY_AV: Provider supports inserting auth keys into the AV. Associating an AV with auth keys enables an RDM EP to be bound to multiple auth keys, select outgoing data transfer auth key, and restrict incoming data transfers (excluding RMA/AMO) to specific auth keys. FI_AUTH_KEY_AV can be combined with FI_AUTH_KEY_MR to restrict incoming RMA/AMO data transfers.

Initial thoughts?

1 reply

shefty Sep 12, 2023
Maintainer

4 flags = 16 permutations, even if not all are valid. We need mere humans able to figure out how to use this, and we're letting ultimate flexibility and hardware implementation drive the API here, rather than focusing on the requirements.

We need job isolation. The current solution can provide that. The next requirement is for an application to communicate with multiple jobs. That either requires allocating separate endpoints or specifying the key per transfer. The latter is driving the changes to link the auth key with the fi_addr_t.

AFAICT, the proposed changes to the AV to add auth key support supersede previous solutions. Reporting how many auth keys may be associated with each endpoint seems sufficient. 0 = no support, 1 = job isolation only, >1 = can specify per transfer.
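The three-way reading of a reported key count can be written down as a small helper. This is only a sketch of the rule stated above: the enum and function names are hypothetical, and `auth_key_cnt` is the attribute suggested earlier in this thread, not an existing libfabric field as far as this discussion establishes.

```c
#include <assert.h>
#include <stddef.h>

/* What an application could conclude from the reported per-endpoint count. */
enum toy_auth_mode {
    TOY_AUTH_NONE,          /* 0: provider has no auth key support */
    TOY_AUTH_JOB_ISOLATION, /* 1: one key per endpoint, job isolation only */
    TOY_AUTH_PER_TRANSFER   /* >1: key can be selected per transfer */
};

enum toy_auth_mode toy_classify_auth_support(size_t auth_key_cnt)
{
    if (auth_key_cnt == 0)
        return TOY_AUTH_NONE;
    if (auth_key_cnt == 1)
        return TOY_AUTH_JOB_ISOLATION;
    return TOY_AUTH_PER_TRANSFER;
}

/* Usage: a service that must talk to several jobs from one endpoint would
 * require the last mode, and otherwise fall back to one endpoint per job. */
void demo_classify(void)
{
    assert(toy_classify_auth_support(0) == TOY_AUTH_NONE);
    assert(toy_classify_auth_support(1) == TOY_AUTH_JOB_ISOLATION);
    assert(toy_classify_auth_support(64) == TOY_AUTH_PER_TRANSFER);
}
```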

soumagne · 2023-09-12T21:26:53Z

soumagne
Sep 12, 2023

FI_NO_SOURCE_AUTH_KEY
Paired with FI_SOURCE_AUTH_KEY
Flag used per MR and recv/trecv to signal to provider that corresponding completion events do not need to return source authorization key data
Optimization to avoid potential reverse lookup to retrieve fi_auth_key_t

Just wanted to circle back on that. Should that translate now to simply adding an FI_NO_SOURCE flag? I think it would still be useful to have (maybe that's a separate discussion?) since when we call fi_cq_readfrom() we can't know in advance the type of operation we're getting and whether or not we want to gather the source+auth_key. (For instance, in our case, we want that info for RPC recv on the server but we want to ignore it for RPC response recv on the client.)

2 replies

iziemba Sep 19, 2023
Collaborator Author

Does the client need to operate with FI_SOURCE?

soumagne Sep 19, 2023

oh right that's a good point, we could completely turn off that capability on the client. That just leaves the case of servers that send RPCs to each other (so listening on any client/server source but also expecting msgs to be received from specific server sources).

soumagne · 2023-09-13T19:02:18Z

soumagne
Sep 13, 2023

@shefty you stated previously:

If the remote endpoint is closed and a new endpoint is opened with the same address, then IMO it is reasonable that the AV entry for that endpoint be refreshed if the auth key changes. I do think forming the tuple at AV insert is easier to use, even if it does mix a protection key with the address. But we're mostly stuck with that solution if we want simple. Plus v2 proposes other changes where the fi_addr_t value may no longer strictly be considered an address.

@iziemba I see in your PR you described:

Calling fi_av_remove() with this fi_addr_t will delete the authorization key. -FI_EBUSY will be returned from fi_av_remove() should this key still be used by an EP. In other words, all EPs using this authorization key need to be closed for fi_av_remove() to succeed.

I think for garbage collection purposes it would make sense for us to be able to call AV remove on entries that are no longer being used over time, since client processes may come and go and they may use different VNIs if they come from different jobs. However, in that case the local server EP would remain the same. Am I missing something, or does your description mean that it would not be possible to garbage collect / remove past entries unless we call fi_close() on the server EP (which we do not do, as we always have one EP / domain per server thread for the lifetime of the server)?

1 reply

shefty Sep 13, 2023
Maintainer

The garbage collection still works the same. You would add/remove remote endpoints from the AV as normal. The local endpoint doesn't change and remains open. The fi_av_remove() call that @iziemba is referencing is related to the authorization key only. You can't remove an authorization key from the AV while it's being used to communicate with a remote endpoint.
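One way to picture this rule is a per-key reference count of the remote endpoint entries that use the key. The sketch below is a toy model under that assumption, not the implementation in #9319: the `toy_` names are hypothetical, and `-EBUSY` from `<errno.h>` stands in for -FI_EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_auth_key {
    bool valid; /* key is currently present in the toy AV */
    int peers;  /* remote endpoint entries inserted under this key */
};

/* A remote endpoint is inserted under the key. */
int toy_peer_insert(struct toy_auth_key *key)
{
    if (!key->valid)
        return -EINVAL;
    key->peers++;
    return 0;
}

/* A remote endpoint that used the key is removed (garbage collected). */
int toy_peer_remove(struct toy_auth_key *key)
{
    if (!key->valid || key->peers == 0)
        return -EINVAL;
    key->peers--;
    return 0;
}

/* Removing the key itself fails while any remote endpoint still uses it. */
int toy_key_remove(struct toy_auth_key *key)
{
    if (!key->valid)
        return -EINVAL;
    if (key->peers > 0)
        return -EBUSY;
    key->valid = false;
    return 0;
}

/* Usage: clients of one job come and go; the local server endpoint is never
 * closed in this sequence. */
void demo_key_removal(void)
{
    struct toy_auth_key job_key = { .valid = true, .peers = 0 };

    assert(toy_peer_insert(&job_key) == 0);
    assert(toy_peer_insert(&job_key) == 0);
    assert(toy_key_remove(&job_key) == -EBUSY); /* two clients remain */

    assert(toy_peer_remove(&job_key) == 0);
    assert(toy_peer_remove(&job_key) == 0);
    assert(toy_key_remove(&job_key) == 0);      /* job is gone, key freed */
}
```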

iziemba · 2023-10-03T17:18:32Z

iziemba
Oct 3, 2023
Collaborator Author

Libfabric AV Auth Keys.pdf
Slides from today's OFIWG meeting.

0 replies
