Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancellation fixes and tests. #445

Merged
merged 21 commits into from
Sep 15, 2023
Merged

Cancellation fixes and tests. #445

merged 21 commits into from
Sep 15, 2023

Conversation

cheatfate
Copy link
Collaborator

No description provided.

@cheatfate cheatfate marked this pull request as draft August 18, 2023 03:44
@cheatfate cheatfate changed the title Add callTick and stream cancellation tests. Cancellation fixes and tests. Aug 25, 2023
@cheatfate cheatfate marked this pull request as ready for review August 25, 2023 10:23
## done e.g. changes its state (become completed, failed or cancelled).
##
## If ``fut`` is already finished (completed, failed or cancelled) result
## Future[void] object will be returned complete.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void?

Copy link
Collaborator Author

@cheatfate cheatfate Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cancelAndWait() returns Future[void] which will become complete when future: FutureBase become finished.


proc checkedCancelAndWait*(fut: FutureBase): Future[bool] =
## Initiate cancellation process for Future ``fut`` and wait until ``fut`` is
## done e.g. changes its state (become completed, failed or cancelled).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## done e.g. changes its state (become completed, failed or cancelled).
## finished e.g. changes its state (become completed, failed or cancelled).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially it was supposed that cancel() should return boolean flag which indicates:
true if cancellation was successful - e.g. last Future in sequence was moved from Pending to Cancelled, all future callbacks got scheduled.
false if cancellation was unsuccessfull, e.g. we encounter any Future in sequence which are not Pending. It means that this sequence of Futures are moving towards finish and we can't cancel it.

But somehow this flag was discarded and not used, so i expect there present a lot of source code which do not respect boolean result and do not expect cancel(): boolean. So checkCancelAndWait it is same as cancelAndWait but it returns flag which indicates is future was actually cancelled or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the boolean mean though? It feels like it's too weak to be useful - an async operation is typically made up of many sub-operations and when we start cancelling things, some of these may have completed and some may be cancelled - I'm not sure a boolean alone can capture this complexity in a useful way and this might be why everyone discards it.

Basically, there are three options: "it was cancelled", "it finished" and "don't know/partial result" and the boolean doesn't represent this well enough to be useful practically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancellation could be compared to "message delivery", so this boolean flag indicates whether message was delivered to the bottom layer of target Future or it was not delivered.

what does the boolean mean though? It feels like it's too weak to be useful - an async operation is typically made up of many sub-operations and when we start cancelling things, some of these may have completed and some may be cancelled - I'm not sure a boolean alone can capture this complexity in a useful way and this might be why everyone discards it.

chronos tracks dependent Futures and cancellation using this tracking mechanism to deliver cancellation message to the bottom layer. So if bottom layer or any other Future before bottom layer will have status != Pending it will stop at this moment and returns false. But if the bottom layer Future has status == Pending, it will become Cancelled. And in this case cancel() will return true.

chronos/asyncfutures2.nim Outdated Show resolved Hide resolved
res = fut.checkedCancel()
retFuture

proc noCancelWait*[T](future: Future[T]): Future[T] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here - when should this be used? we have 4 cancellation-related functions now and the precise conditions where one should be used over the other is .. confusing (checkedCancelAndWait, cancelAndWait, cancel and noCancelWait)

if this function does what I think it does, await withoutCancellation(...) is probably a better name - this function does no "waiting" on its own

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This procedure will create a Future[T] object which will have OwnCancelSchedule and it acts like a proxy for any Future[T]. In this case cancellation mechanism unable to go through this proxy Future[T] (it will not see any children behind it) and because it only see proxy Future[T] it is unable to schedule cancellation callbacks because of OwnCancelSchedule flag.

This procedure should be used only for closing procedures which are usually used to free resources. This procedures should not be cancelled to avoid resource leaks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This procedure should be used only for closing procedures which are usually used to free resources. This procedures should not be cancelled to avoid resource leaks.

this needs to be documented clearly, something like "This function introduces a cancellation barrier that prevents cancellations from propagating to the given future thus ensuring that even during cancellation, the work of future is finished.

Disabling cancellation is useful during resource cleanup to avoid leaks, but may also lead to delays or hangs when future takes long to finish, thus breaking expectations of downstream code on cancellation being fast."

We'll have to be pretty careful about this - ie if a cancellation suddenly takes several second, for example to complete a "proper" ssl shutdown, this will certainly mess up any code that waits for the cancellation to finish - imagine a timeout on sending an attestation via rest: if we block everything until close is done on one of N connections, this can seriously mess up progress on the next attestation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancellation process is always fast as lightning, it just follows recursively through Future's linked list, searching for the bottom Future. So lets call cancel() - cancellation, because cancelAndWait is not a cancellation process. At the end of cancelAndWait() you could get Future at any state except Pending. Because some Future[T] code could catch CancelledError and convert this signal to either error or value.

When you talking about freezes or blocks you need to understand that OS could block you, or your closeWait procedure could wait for event/signal/mutex and become blocking. Nothing we can do at this moment.

What we trying to do with this primitive is to avoid cancellation which could come while we already have "result" but perform resources cleanup in defer or finally branch.

But of course people could exploit this procedure usage and start using it everywhere. In this case such loops, will never be interrupted.

  while true:
    await noCancelWait sleepAsync(1.seconds)
    echo "Slot start"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case such loops, will never be interrupted.

yes - this is what I mean when I say it might hang: the calling code wants to stop ASAP but the function itself blocks cancellation.

It applies to await closeWait in particular - if closeWait, like in ssl, sends some shutdown marker to close the stream gracefully, await closeWait might take long if connectivity is bad - if we force waiting for closeWait we get hangs and this is risky.

Perhaps another way to avoid the closeWait leak in particular is to allow cancellation to propagate but change its handling: for example, if closeWait receives a cancellation request, it would release the resource it's holding but do so "abortively" without waiting further - this would work well for ssl and the like at least - a database connection could do the same, ie optimistically send a "rollback" command and close the connection without waiting for a response. In this sense, closeWait means "try to close gracefully" and cancelling closeWait means "close forcefully" - this strategy also avoids a leak.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closeWait could catch CancelledError and perform any type of cancellation behavior it wants.

And about blocking calls in async, we do cooperative multitasking and we do not have thread pool execution so yes - we have a flaw and anybody could run blocking function and do something bad to every other actors in async loop.

I do not care and nobody actually should care about performance of cancelAndWait. If you need performance and do not care about result, you can use cancel(). If you want to avoid some re-circulation issues you can do cancelAndWait. But you should not expected that cancelAndWait will be fast or slow.


future.internalMustCancel = true
return true
if FutureFlag.OwnCancelSchedule notin future.internalFlags:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of a flag, could the cancelcb return a boolean?

The code is a bit confusing because both OwnCancelSchedule and an empty cancelcb is set for this to work - we could perhaps avoid the state and associated complexity by making this a local decision in the cb.

we can emulate the old behavior in the cancelcb= function by wrapping it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cancelcb could not return a flag, because it will introduce a lot of breaking changes for all projects which has Future[T] closure procedures with cancellation callbacks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

breaking changes

as mentioned, we can maybe work around this with a wrapper, ie

proc cancelCb(fut: auto, cb: OldCbType) =
  fut.internalcancelcb = proc(): bool = 
    cb()
    true

as is, it's .. difficult to write correct callbacks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would also go in the 4.0 branch so it's a good opportunity

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not like and i will not agree on any changes with callbacks, having multiple callback types just complicate problem solving. So now i prefer Future flags in here, because it

  1. Extensible, i can always add/remove flags at any moment without breaking old code. While introducing new callbacks requires deprecation path and so on. More dirty solution as for me.
  2. Almost not visible for end users of library.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be hard to explain why this result is needed and what it achieves. And probably we will see a lot of ugly results in the code.

Address review comments.
@cheatfate cheatfate merged commit 2e8551b into master Sep 15, 2023
12 checks passed
@cheatfate cheatfate deleted the calltick branch September 15, 2023 16:38
## The procedure then has a chance to clean up or even deny the request
## using `try/except/finally`.
## NOTE: This procedure does not guarantee that cancellation will actually
## happened.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants