Improve the wait-for-apiserver ready check #713
Merged
With the introduction of Karpenter in #585 we changed the order of steps the CLM does. Before it was:
With Karpenter we changed it to:
Step 2 regularly fails during cluster creation in e2e with the error:
I believe this is a symptom of the apiserver not being quite ready yet: because we run the apply step right after the apiserver first returns a non-500 response, the apply fails whenever the apiserver is not fully ready. We didn't see this before because, after checking apiserver availability, we waited another 5-10 minutes during worker node stack creation before doing the apply.
This PR aims to fix the issue by not just checking that the apiserver is available, but ensuring that it responds with 200 on the `/readyz` endpoint. The logic is that if `/readyz` returns 200, the apiserver must be fully ready, and the apply calls should not fail the way they sometimes do right now.

Since this only happens sometimes, it's hard to prove that this fixes the issue completely, but I have at least tested that it works as expected in terms of detecting when the apiserver is available.
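For illustration, the difference between "any non-500 response" and "200 on `/readyz`" can be sketched as a small polling loop. This is a minimal sketch and not the code in this PR: the function names `is_ready` and `wait_for_ready`, the timeout and interval values, and the plain-HTTP URL handling are all assumptions, and TLS and credentials for a real apiserver are left out.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url, timeout=5.0):
    """Return True only if GET <base_url>/readyz answers with HTTP 200.

    Hypothetical helper: TLS configuration and authentication are omitted.
    """
    url = base_url.rstrip("/") + "/readyz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError for 4xx/5xx responses; connection
        # errors and timeouts also land here. All mean "not ready yet".
        return False


def wait_for_ready(base_url, max_wait=600.0, interval=5.0,
                   probe=is_ready, sleep=time.sleep, clock=time.monotonic):
    """Poll until the probe succeeds or max_wait seconds have passed.

    Returns True once the probe reports ready, False on timeout.
    The probe, sleep and clock parameters are injectable for testing.
    """
    deadline = clock() + max_wait
    while True:
        if probe(base_url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a check of this shape, a 404 or 403 from a half-initialized apiserver counts as "not ready", whereas a check that only excludes 500 responses would already let the apply step proceed.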