-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The bug in folly/fbthrift interaction with libevent 2.*. #490
Comments
Hi! I just tried to reproduce it on our internal build. Specifically what I did was that I completely removed all the MemoryIdler code from IOThreadPoolExecutor (to make sure that there are no spurious EventBase wakeups triggered by it) and then ran all the Thrift unit tests - observing no failures. I suspect that there may be some problem in EventBase compatibility with your libevent version. Could you please try the same experiment with unit tests + disabled MemoryIdler and let me know if you are seeing deadlocks/failures? The internal/external events only matter wrt .loop() vs .loopForever(). loop() keeps looping the EventBase only while there are any registered external events, loopForever() keeps looping the EventBase until terminateLoopSoon() is called. IOThreadPoolExecutor is using loopForever() (https://github.com/facebook/folly/blob/a25c6a2cfc675c4e62621cfa5756ce429b38cd85/folly/executors/IOThreadPoolExecutor.cpp#L250) so internal/external shouldn't make any difference. |
Thanks for the quick reply!
Most of my description is a theoretic description of the problem with the links on the current
It is not easy. I think that with high probability
It can be true only for the queue managed by
Yeap, this part of a contract is not violated. But there are two different [interleaving] "looping" in the [e.g.] let me draw your attention to this part of my previous comment.
internality gives you not only earlier exit in the case of "no external events", but also make executed callback on INTERNAL event be ignored for exit via |
I tried running your repro (the only thing I changed was using a different service interface) and it still doesn't fail on our build. The key difference is that we use libevent 1.4.14b internally.
We have a number of unit tests that start a thrift server (via ScopedServerInterfaceThread), so I was suspecting that those would catch the problem.
Thanks for flagging this! I re-checked the EventBase logic and it indeed has special handling for the internal event mode of its own queue (https://github.com/facebook/folly/blob/main/folly/io/async/EventBase.cpp#L436-L438). That said, in this particular case IOThreadPoolExecutor is using loopForever(), so EventBase's own queue will be registered as an external event and so event_base_loop should not return early. |
This is very important note! In this version internal callbacks are not ignored for Did I get it right that P.S. libevent version is so important for this discussion, so I add the version to the title. |
Hello,
fbthrift
team!There is serious bug in using INTERNAL
libevent
flag infolly:EventBase
andfbthrift:IOWorkerContext
.Simple reproducer
Simplest "return constant version" server responds with delay (correlated with
folly::detail::MemoryIdler::defaultIdleTimeout
).Code:
Execution:
This server [while being idled] is responding slowly delayed with half-normal distribution near
MemoryIdler::defaultTimeout
.The reason of this behavior.
When server is waiting on event base loop, it can't wake up on
putMessageInReplyQueue
, because the reply queue was registered here in internal mode. Internal mode for this queue means that we can't go out from loop via EVLOOP_ONCE by event (and executed callback) on this queue, because n is zero. And the only event (non-internal), which can wake the event loop up, in my example, is timer event forMemoryIdler
.Thoughts
event_base_loopbreak
frommessageAvailable
was removed in commit. But the following removed comment is still true.startConsumingInternal
[kind of] violates the contract of the function described here. This queue is managed outside theEventBase
. For instance, it prevents switch INTERNAL mode for this queue inside EventBase here forkeepAlive
mode.folly
is fast exit path by "no event registered". But INTERNAL also affects another exit-path, whichfolly
seems to ignore now. I think that if second exit-path will work as if "there is no INTERNAL events" then this problem will gone (see "Possible solution").libevent
callbacks will be executed, butevent_base_loop
does not have to go out (and does not have to transfer control tofolly::EventBase
callbacks).Possible solution
Actually,
event_base_loopbreak
-ish from first of "Thoughts" seems needed here, but maybe with other implementation. Hard breaking withevent_base_loopbreak
go out fromlibevent
loop almost immediately, and don't give a chance to execute otherlibevent
callbacks (big amounts of switch (libevent
callbacks vsfolly::EventBase
callbacks) => worse performance).For example, this problem will gone if somebody guarantees that after every group of
libevent
callbacks on INTERNAL events in event loop iteration any [non-internal]libevent
callback will be register to call (it can be zero-timer (which register need-to-execute callback immediately)). This idea removes the problem of second exit-path, because every time when internal callback is executed (some "stoptoken callback" will be generated) and we exitevent_base_loop
at most on the next iteration.Specific implementation of it can be the following. Call new
EventBase::onInternalCallbackEnd
from the end of libeventCallback in the case of INTERNAL. In this newEventBase::onInternalCallbackEnd
register zero-timer event (do not register if it was already registered on this iteration of loop).The text was updated successfully, but these errors were encountered: