-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug fix: service will never be deregisterd if the status is critical … #101
base: main
Are you sure you want to change the base?
bug fix: service will never be deregisterd if the status is critical … #101
Conversation
…when esm startup.
failureCounter and successCounter should count when HealthCritical or HealthPassing occured continuously.
// Do nothing if update is idempotent | ||
if check.Status == status && check.Output == output { | ||
check.failureCounter = decrementCounter(check.failureCounter) | ||
check.successCounter = decrementCounter(check.successCounter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the only 2 uses of decrementCounter(). Should probably nuke it as well.
@@ -319,6 +314,10 @@ func (c *CheckRunner) UpdateCheck(checkID types.CheckID, status, output string) | |||
delete(c.checksCritical, checkID) | |||
} | |||
|
|||
// Do nothing if update is idempotent | |||
if check.Status == status && check.Output == output { | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure I'm understanding things... moving this return is the 'fix'. So it doesn't skip things between the old return and this one? Wanted to be sure as much of this just seems like a refactor/cleanup but it is fixing something and how this fixes it wasn't called out anywhere (or I missed it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1, the bug is , if check.Status is cirtical at first(which esm got from consul-agent), and esm checked it again and found it's cirtical, then this process will return. But in fact we need to do something with it.
2, I had checked that this bug is since 0350cc6. I just revert the code to before commit 0350cc6.
@varnson thanks for creating this PR. Would you be able to elaborate more on the steps you used to reproduce this bug? I was not able to reproduce it. Here is what I did:
Am I misunderstanding the issue or do you different steps/examples to reproduce this? |
@lornasong Sorry for My comment is not clear enough. At my case, I register a service, start esm, esm update the status and output to consul-agent, and I restart the esm. |
@varnson I was able to reproduce your issue by setting the I understand what you're saying: The status is critical but the latest status and output are the same as what is stored in Consul. Then the check cannot reach I noticed you changed |
@lornasong About anti-flapping feature, I hadn't read the consul source very carefully, but I found the source like this: |
@varnson, yes, you're correct that the consul-esm anti-flapping behavior is different from consul. This is actually intentional. You can read about the decision to implement a new and different behavior: #78 (comment). You can also read about the benefits of this different behavior: #78 (comment). The documentation of consul-esm's anti-flapping behavior https://github.com/hashicorp/consul-esm#threshold-for-updating-check-status should be accurate for the different implementation. If this is not the case, please feel free to open another issue for it! Let me know if you have any questions. Thanks. |
@lornasong I have understand what you means, but it seems difficult to fix it. |
Step::
1, A service is in critical status.
2,start up the esm.
3, the critical will never be deregistered.
This bug is since 0350cc6