Reuse Rebuild IO handles #1755
tiagolobocastro
commented
Oct 11, 2024
When the rebuild has completed, waiting for it fails because the channels are no longer available. Instead, simply return the rebuild state, since this is what we want anyway. Signed-off-by: Tiago Castro <[email protected]>
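A minimal sketch of that behaviour, assuming a job that keeps its last known state next to an optional completion channel. `RebuildJob`, `RebuildState` and `wait` are hypothetical names for illustration, not the io-engine types, and a standard library channel stands in for whatever the real code uses:

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical rebuild states, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RebuildState {
    Running,
    Completed,
    Failed,
}

/// Hypothetical job: `done` is `None` once the rebuild has finished
/// and its channel has been torn down.
struct RebuildJob {
    state: RebuildState,
    done: Option<Receiver<RebuildState>>,
}

impl RebuildJob {
    /// Returns the final state without touching the channel if the
    /// rebuild is already over; otherwise blocks until it finishes.
    fn wait(&mut self) -> RebuildState {
        if self.state != RebuildState::Running {
            // Already finished: the channel may be gone, so just report the state.
            return self.state;
        }
        if let Some(rx) = self.done.take() {
            // A dropped sender is treated as a failure in this sketch.
            self.state = rx.recv().unwrap_or(RebuildState::Failed);
        }
        self.state
    }
}
```

Calling `wait` on a job that has already completed returns its stored state instead of failing on a missing channel.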
lgtm.
As I understand it, all 16 rebuild tasks will now use the same connection for reading and writing the rebuild segments. Is the rebuild descriptor destroyed today as soon as the rebuild finishes and the nexus channels are reconfigured to add the rebuilt bdev's handle? We wouldn't want a lingering handle in the rebuild descriptor once the rebuilt child bdev's handle has also been added to the main nexus channels.
Yes, each task has the immutable
Yes, that's right, the rebuild backend "runs away" when the rebuild is started. When the rebuild completes the backend terminates, and on the For now simply moving the handle to the descriptor is good enough for solving the connection issues.
Reuses the rebuild IO handles, rather than attempting to allocate them per rebuild task. The main issue with allocating handles on the fly is that the target may not have cleaned up a previous IO qpair connection, and so the connect may fail. We started seeing this more on CI because we forgot to cherry-pick a commit increasing the retry delay. However, after inspecting a number of user support bundles I see that we still have occasional connect errors. Rather than increasing the timeout, we attempt here to reuse the handles, thus avoiding the problem almost entirely. Signed-off-by: Tiago Castro <[email protected]>
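The idea can be sketched roughly as follows, assuming a descriptor that opens one handle up front and shares it with every task. `Target`, `IoHandle`, `RebuildDescriptor` and `rebuild_with_shared_handle` are hypothetical names, a single handle stands in for the read and write handles, and OS threads stand in for the rebuild tasks; this is not the actual io-engine code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical target that counts how many connections were opened to it.
#[derive(Default)]
struct Target {
    connects: AtomicUsize,
}

/// Hypothetical IO handle; the real one would own a qpair connection.
struct IoHandle {
    copied: AtomicUsize,
}

impl Target {
    fn connect(&self) -> IoHandle {
        self.connects.fetch_add(1, Ordering::SeqCst);
        IoHandle { copied: AtomicUsize::new(0) }
    }
}

impl IoHandle {
    /// Pretends to copy one segment and records that it did so.
    fn copy_segment(&self, _segment: usize) {
        self.copied.fetch_add(1, Ordering::SeqCst);
    }
}

/// Descriptor that owns the handle for the lifetime of the rebuild.
struct RebuildDescriptor {
    handle: IoHandle,
}

/// Runs `tasks` workers (must be non-zero) that all share the descriptor's
/// handle. Returns (connections opened, segments copied).
fn rebuild_with_shared_handle(target: &Target, tasks: usize, segments: usize) -> (usize, usize) {
    // Connect once, before any task starts.
    let desc = Arc::new(RebuildDescriptor { handle: target.connect() });
    let workers: Vec<_> = (0..tasks)
        .map(|task| {
            let desc = Arc::clone(&desc);
            thread::spawn(move || {
                // Each task copies every `tasks`-th segment, starting at its own index.
                for segment in (task..segments).step_by(tasks) {
                    desc.handle.copy_segment(segment);
                }
            })
        })
        .collect();
    for worker in workers {
        worker.join().expect("rebuild task panicked");
    }
    (
        target.connects.load(Ordering::SeqCst),
        desc.handle.copied.load(Ordering::SeqCst),
    )
}
```

With 16 tasks the target sees a single connect instead of one per task, so a stale qpair on the target can only affect the initial connection.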
Brings in latest fixes and improvements. Signed-off-by: Tiago Castro <[email protected]>
bba85fb to b59bc00
bors merge
Build succeeded