-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/PROTO: Fix RNDV_SCHEME logic #10230
base: master
Are you sure you want to change the base?
UCP/PROTO: Fix RNDV_SCHEME logic #10230
Conversation
I think it worth covering this logic by tests. If we merge #9989 it would allow to cover it by the following way: UCS_TEST_P(test_ucp_proto_mock, rndv_thresh, "RNDV_THRESH=1024", "NET_DEVICES=mlx5_0:1")
{
setup_mock("ib").set_mock_iface_attr("rc_mlx5/mlx5_0:1",
[](uct_iface_attr_t &iface_attr) {
iface_attr.dev_num_paths = 1;
iface_attr.cap.am.max_short = 208;
iface_attr.bandwidth.shared = 10000000000;
iface_attr.latency.c = 0.000006;
iface_attr.latency.m = 0.000000001;
});
connect();
ucp_proto_select_key_t key = any_key();
key.param.op_id_flags = UCP_OP_ID_AM_SEND;
key.param.op_attr = 0;
check_ep_config(sender(), {
{"0", "200", "short", "rc_mlx5/mlx5_0:1"},
{"201", "1023", "copy-in", "rc_mlx5/mlx5_0:1"},
{"1024", "91894", "rendezvous fragmented copy-in copy-out",
"rc_mlx5/mlx5_0:1"},
{"91895", "inf", "rendezvous zero-copy read from remote",
"rc_mlx5/mlx5_0:1"},
}, key);
}
UCS_TEST_P(test_ucp_proto_mock, rndv_thresh_and_scheme, "RNDV_THRESH=1024", "RNDV_SCHEME=get_zcopy", "NET_DEVICES=mlx5_0:1")
{
setup_mock("ib").set_mock_iface_attr("rc_mlx5/mlx5_0:1",
[](uct_iface_attr_t &iface_attr) {
iface_attr.dev_num_paths = 1;
iface_attr.cap.am.max_short = 208;
iface_attr.bandwidth.shared = 10000000000;
iface_attr.latency.c = 0.000006;
iface_attr.latency.m = 0.000000001;
});
connect();
ucp_proto_select_key_t key = any_key();
key.param.op_id_flags = UCP_OP_ID_AM_SEND;
key.param.op_attr = 0;
check_ep_config(sender(), {
{"0", "200", "short", "rc_mlx5/mlx5_0:1"},
{"201", "1023", "copy-in", "rc_mlx5/mlx5_0:1"},
{"1024", "inf", "rendezvous zero-copy read from remote",
"rc_mlx5/mlx5_0:1"},
}, key);
}
|
src/ucp/rndv/proto_rndv.inl
Outdated
return UCS_MEMUNITS_INF; /* used only as last resort */ | ||
if ((context->config.ext.rndv_mode == UCP_RNDV_MODE_AUTO) || | ||
(rndv_modes & UCS_BIT(context->config.ext.rndv_mode))) { | ||
return UCS_MEMUNITS_AUTO; /* used only as last resort */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment is wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/rndv/proto_rndv.c
Outdated
params->super.cfg_priority, remote_proto->cfg_priority); | ||
if (remote_proto->cfg_thresh != UCS_MEMUNITS_AUTO) { | ||
/* If RNDV_SCHEME is set, all protocols except forced one reports INF */ | ||
ucs_assert(remote_proto->cfg_thresh == UCS_MEMUNITS_INF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/ucp/rndv/proto_rndv.c
Outdated
params->super.cfg_priority, remote_proto->cfg_priority); | ||
if (remote_proto->cfg_thresh != UCS_MEMUNITS_AUTO) { | ||
/* If RNDV_SCHEME is set, all protocols except forced one reports INF */ | ||
ucs_assert(remote_proto->cfg_thresh == UCS_MEMUNITS_INF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/azp run UCX PR |
Azure Pipelines successfully started running 1 pipeline(s). |
Drawback of the proposed logic is that if defined So if we set |
{ | ||
if (!sender().is_rndv_supported()) { | ||
UCS_TEST_MESSAGE << "RNDV is not supported"; | ||
m_check_recv_rndv_flags = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? what if this is AM-based rndv?
also if rndv is not supported this test suite should be skipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe message confused you. That case is created to handler AM-based RNDV case. AFAIR AM RNDV doesn't set recv_attr
flags.
I can change message here to "RMA is not supported" would it be better from your point of view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what flag do you mean? AM-based RNDV is using generic flow RTS->RTR->data chunks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, you are right. The thing here is that due to new logic if RNDV_SCHEME=get_zcopy
and rndv/get/zcopy is unavailable eager can be selected instead of RNDV.
So we cannot rely on fact that UCP_AM_RECV_ATTR_FLAG_RNDV
and UCP_AM_RECV_ATTR_FLAG_DATA
will be set in am_data_handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can skip these tests if RNDV not supported but I don't think that it is important
What
Change
RNDV_SCHEME
andRNDV_THRESH
logic to work properly.Why ?
Currently if user set certain
RNDV_SCHEME
without settingRNDV_THRESH
this scheme would be forced on all sizes sincecfg_thresh
for this protocol would be0
. This change fixes that.How ?
With this change remote variant can report only 2 values as
cfg_thresh
:AUTO
andINF
.AUTO
reported for:RNDV_SCHEME=auto
RNDV_SCHEME
INF
is reported for all protocols that are not set inRNDV_SCHEME
ifRNDV_SCHEME
is set to some certain protocol.Manual testing cases: