-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/PROTO/MULTI: Allow header be greater than max_frag #10263
base: master
Are you sure you want to change the base?
UCP/PROTO/MULTI: Allow header be greater than max_frag #10263
Conversation
5b5b1e3
to
ce3f490
Compare
/azp run |
/azp run UCX PR |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo, it seems 2 separate PRs are worth creating, because you are addressing two separate issues.
src/ucp/proto/proto_multi.c
Outdated
@@ -59,8 +73,7 @@ ucs_status_t ucp_proto_multi_init(const ucp_proto_multi_init_params_t *params, | |||
¶ms->super, params->first.lane_type, params->first.tl_cap_flags, | |||
1, 0, lanes); | |||
if (num_lanes == 0) { | |||
ucs_trace("no lanes for %s", | |||
ucp_proto_id_field(params->super.super.proto_id, name)); | |||
ucs_trace("no lanes for %s", perf_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not the same I'd keep proto_name here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/ucp/proto/proto_multi.c
Outdated
/* If type of the first lane is not the same as in the middle, then | ||
* exclude this lane from sorting */ | ||
lane = (params->first.lane_type == params->middle.lane_type) ? 0 : 1; | ||
/* Sort lanes array by lanes_perf bandwidth */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is already similar comment on line 101
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Ok, will split them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this for v1.18.x?
lpriv->max_frag, hdr_size); | ||
/* If header is greater than max_frag - that's ok, we allow it, but we send | ||
* only the header data and keep the payload empty */ | ||
if (lpriv->max_frag <= hdr_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe smth like
ssize_t max_frag;
...
max_frag = lpriv->max_frag - hdr_size;
if (max_frag < 0) {
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently I'm reverting this change, because signed type ssize_t
has only half width of size_t
and therefore cannot be reliably used to hold the difference between two size_t
variables. The failure is exposed when running test_ucp_rma.put_blocking
tests, where lpriv->max_frag
is almost equal to SIZE_MAX
and hdr_size=0
. In that case the difference = SIZE_MAX
, but it becomes negative when casted to ssize_t
that leads to test failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIR lpriv->max_frag
can be set to lane_perf->max_frag
which is a transport limitation, so may there be a situation when some lanes don't really support such big header size so we cannot send even header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see in ucp_worker_set_max_am_header
the worker->max_am_header
is set so that header can fit in any lane. If user sends something larger than that, then we reject this input:
ucp_am_send_nbx_check_header_length(ucp_worker_h worker, size_t header_length)
{
if (ENABLE_PARAMS_CHECK && (header_length > worker->max_am_header)) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, better not calculate max_frag = lpriv->max_frag - hdr_size
during variable declaration, but do it after checking (lpriv->max_frag <= hdr_size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@yosefe I'm not sure if we need to backport this to 1.18, because this issue happens only with MAX_EAGER_LANES > 1, which is not default config |
can you laso remove UD restrictions from |
imo, no need to backport |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, if no issues are found with re-enabled tests
src/ucp/proto/proto_multi.inl
Outdated
/* If header is greater than max_frag - that's ok, we allow it, but we send | ||
* only the header data and 1 byte of payload. | ||
* 1 byte is needed to distinguish between first and middle fragments that | ||
* we do by comparing req->send.state.dt_iter.offset with 0. | ||
* @see ucp_worker_set_max_am_header | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we decided that no need to backport this fix to v1.18.x, how hard it will be to fix according to
* TODO: fix generic AM based multi-fragment protocols, so that this
* trick is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/azp run UCX PR |
Azure Pipelines successfully started running 1 pipeline(s). |
max_uct_fragment = ucs_max(if_attr->cap.am.max_bcopy, | ||
max_ucp_header - 1) - | ||
max_ucp_header - 1; | ||
max_ucp_header) - | ||
max_ucp_header; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, seems just a little more readable for me:
max_uct_fragment = ucs_max(if_attr->cap.am.max_bcopy -
max_ucp_header, 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more readable, but not always correct. What if max_ucp_header > if_attr->cap.am.max_bcopy?
We have to be careful when deducting one size_t
from the other, because we can easily get a "negative" result which translates into some huge number. I guess it exists in this form for exactly this reason - to always guarantee the correctness
src/ucp/dt/datatype_iter.c
Outdated
@@ -265,11 +265,6 @@ size_t ucp_datatype_iter_iov_next_iov(const ucp_datatype_iter_t *dt_iter, | |||
ucs_assertv(length <= max_length, "length=%zu max_length=%zu", length, | |||
max_length); | |||
|
|||
/* Check that if not reached the end, packed data is not empty */ | |||
ucs_assertv((dt_iter->offset == dt_iter->length) || (length > 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that true that all middle fragments still should always send some payload? Maybe somehow save this check for middle segments only (e.g. check offset)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's true
To have this assert I would need to propagate request (or at least request flags/boolean) to this function and to all the functions calling it (ucp_datatype_iter_next_iov
refs in 7 other files). If it's really important to have this check, we can do that.
@yosefe - since you added this assertion, do we need to keep it even at the cost of adding one more param to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it is necessary to add request argument here. We can pass only information that is needed and not to this function but to ucp_datatype_iter_next_iov
. Also maybe we can do that even without passing request at all by relaxing this assertion (E.g. check that only if offset > 0) WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I add relaxed check, still maybe useful
lpriv->max_frag, hdr_size); | ||
/* If header is greater than max_frag - that's ok, we allow it, but we send | ||
* only the header data and keep the payload empty */ | ||
if (lpriv->max_frag <= hdr_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIR lpriv->max_frag
can be set to lane_perf->max_frag
which is a transport limitation, so may there be a situation when some lanes don't really support such big header size so we cannot send even header?
test/gtest/ucp/test_ucp_am.cc
Outdated
@@ -1584,9 +1584,8 @@ class test_ucp_am_nbx_dts : public test_ucp_am_nbx_reply { | |||
|
|||
/* Skip tests for ud_v and ud_x because of unstable reproducible failures during |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks
src/ucp/dt/datatype_iter.c
Outdated
@@ -266,7 +266,8 @@ size_t ucp_datatype_iter_iov_next_iov(const ucp_datatype_iter_t *dt_iter, | |||
max_length); | |||
|
|||
/* Check that if not reached the end, packed data is not empty */ | |||
ucs_assertv((dt_iter->offset == dt_iter->length) || (length > 0), | |||
ucs_assertv((dt_iter->offset == 0) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pls update the comment above and mention new check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why needed to update the assert?
maybe we should skip this function altogether if there is no room for data, or exit this function early if max_length==0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
We can't easily skip this function, because it updates the next iteration state
For now I just exit from the function earlier
test/gtest/ucp/test_ucp_request.cc
Outdated
@@ -452,7 +452,8 @@ class test_proto_reset : public ucp_test { | |||
{ | |||
const ucp_datatype_iter_t *dt_iter = &req->send.state.dt_iter; | |||
|
|||
return !ucp_datatype_iter_is_begin(dt_iter) && | |||
return ((req->flags & UCP_REQUEST_FLAG_MIDDLE_FRAGMENT) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ucp_proto_multi_is_middle_fragment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks logical, it was my initial impl, but then I removed it, because including "proto_multi.inl" into C++ translation unit requires quite some unrelated changes in that file: see my reverted commit which fixes some of them: e74c74a
So I thought it's not worth it.
Else we can move ucp_proto_multi_is_middle_fragment
function to proto_multi.h
src/ucp/am/eager_multi.c
Outdated
@@ -78,6 +78,7 @@ static size_t ucp_am_eager_multi_bcopy_pack_args_first(void *dest, void *arg) | |||
|
|||
ucs_assertv(req->send.state.dt_iter.offset == 0, "offset %zu", | |||
req->send.state.dt_iter.offset); | |||
ucs_assert(ucp_proto_multi_is_first_fragment(req)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need the assertion at line 79?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is complementary check, which is still eligible IMO
src/ucp/core/ucp_request.c
Outdated
@@ -44,6 +44,7 @@ static const char *ucp_request_flag_names[] = { | |||
[ucs_ilog2(UCP_REQUEST_FLAG_RECV_TAG)] = "rcv_tag", | |||
[ucs_ilog2(UCP_REQUEST_FLAG_RKEY_INUSE)] = "rk_use", | |||
[ucs_ilog2(UCP_REQUEST_FLAG_USER_HEADER_COPIED)] = "hdr_copy", | |||
[ucs_ilog2(UCP_REQUEST_FLAG_MIDDLE_FRAGMENT)] = "mid_frag", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO a global request flag is misleading because it's only for AM send protocol.
Maybe add something in req->send.msg_proto.am?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/ucp/am/eager_multi.c
Outdated
@@ -254,7 +254,7 @@ static UCS_F_ALWAYS_INLINE ucs_status_t ucp_am_eager_multi_zcopy_send_func( | |||
UCS_STATIC_ASSERT(sizeof(hdr.first) == sizeof(ucp_am_hdr_t)); | |||
UCS_STATIC_ASSERT(sizeof(hdr.middle) == sizeof(ucp_am_hdr_t)); | |||
|
|||
if (req->send.state.dt_iter.offset == 0) { | |||
if (ucp_proto_multi_is_first_fragment(req)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems this function should be active-message specific (as we see, it's not used in other mult-frag protocols such as tag/eager). so move it to proto_am.inl and rename to ucp_proto_am_is_first_fragment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/ucp/dt/datatype_iter.c
Outdated
@@ -266,7 +266,8 @@ size_t ucp_datatype_iter_iov_next_iov(const ucp_datatype_iter_t *dt_iter, | |||
max_length); | |||
|
|||
/* Check that if not reached the end, packed data is not empty */ | |||
ucs_assertv((dt_iter->offset == dt_iter->length) || (length > 0), | |||
ucs_assertv((dt_iter->offset == 0) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why needed to update the assert?
maybe we should skip this function altogether if there is no room for data, or exit this function early if max_length==0?
src/ucp/proto/proto_common.c
Outdated
@@ -806,7 +807,8 @@ void ucp_proto_request_restart(ucp_request_t *req) | |||
} | |||
|
|||
/* Select a protocol with resume request support */ | |||
if (!ucp_datatype_iter_is_begin(&req->send.state.dt_iter)) { | |||
if (ucp_proto_multi_is_middle_fragment(req) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is active-message specific, so maybe the protocol "reset" callback should provide the indication of whether a "RESUME" flags is going to be required or not, instead of just deciding it according to ucp_datatype_iter_is_begin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reduced the scope of the changes to AM eager multi, and I don't see test failures in this function
src/ucp/proto/proto_multi.inl
Outdated
ucp_proto_multi_is_middle_fragment(const ucp_request_t *req) | ||
{ | ||
return req->flags & UCP_REQUEST_FLAG_MIDDLE_FRAGMENT; | ||
} | ||
|
||
static UCS_F_ALWAYS_INLINE int | ||
ucp_proto_multi_is_first_fragment(const ucp_request_t *req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need both wrappers? just for the inverse check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, removing the middle one
lpriv->max_frag, hdr_size); | ||
/* If header is greater than max_frag - that's ok, we allow it, but we send | ||
* only the header data and keep the payload empty */ | ||
if (lpriv->max_frag <= hdr_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case, better not calculate max_frag = lpriv->max_frag - hdr_size
during variable declaration, but do it after checking (lpriv->max_frag <= hdr_size)
src/ucp/core/ucp_request.h
Outdated
/** | ||
* AM specific internal request flags | ||
*/ | ||
enum ucp_request_am_impl_flags { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe define it in proto_am.inl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
src/ucp/proto/proto_am.inl
Outdated
* AM specific internal request flags | ||
*/ | ||
enum ucp_request_am_impl_flags { | ||
UCP_REQUEST_AM_FLAG_MIDDLE_FRAGMENT = UCS_BIT(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor:
UCP_REQUEST_AM_FLAG_MIDDLE_FRAGMENT = UCS_BIT(0) | |
UCP_REQUEST_FLAG_AM_MIDDLE_FRAGMENT = UCS_BIT(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
What
This is a fix for RM#4134043, which is a regression introduced in #9920
This issue happens when MAX_EAGER_LANES > 1
Why ?
In multi-fragment protocols we set
max_frag
per lane proportionally to it's bandwidth. However there is a single AM lane that could getmax_frag
assigned less than a maximum header size. This provokes an assertionHow ?
max_frag
max_frag
- that's ok, we allow it, but we send only the header data and keep the payload empty