Install/Uninstall ccruntime in a loop fails #340
I reproduced it again (under heavy load, on the 3rd iteration; the test phase took the usual time, though). I tried re-executing the
# kubectl delete -k .
Error from server (NotFound): error when deleting ".": ccruntimes.confidentialcontainers.org "ccruntime-sample" not found
While all pods were still there. I had extra debug outputs so the situation was:
ccruntime: Starting the cleanupu
NAME READY STATUS RESTARTS AGE
pod/cc-operator-controller-manager-ccbbcfdf7-v54gc 2/2 Running 0 42s
pod/cc-operator-daemon-install-xvqcq 1/1 Running 0 28s
pod/cc-operator-pre-install-daemon-c4m9d 1/1 Running 0 31s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cc-operator-controller-manager-metrics-service ClusterIP 10.107.199.43 <none> 8443/TCP 42s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/cc-operator-daemon-install 1 1 1 1 1 node.kubernetes.io/worker= 28s
daemonset.apps/cc-operator-daemon-uninstall 0 0 0 0 0 katacontainers.io/kata-runtime=cleanup 31s
daemonset.apps/cc-operator-pre-install-daemon 1 1 1 1 1 node.kubernetes.io/worker= 31s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cc-operator-controller-manager 1/1 1 1 42s
NAME DESIRED CURRENT READY AGE
replicaset.apps/cc-operator-controller-manager-ccbbcfdf7 1 1 1 42s
Then the
NAME READY STATUS RESTARTS AGE
pod/cc-operator-controller-manager-ccbbcfdf7-v54gc 2/2 Running 0 103s
pod/cc-operator-daemon-install-xvqcq 1/1 Terminating 0 89s
pod/cc-operator-pre-install-daemon-c4m9d 1/1 Running 0 92s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cc-operator-controller-manager-metrics-service ClusterIP 10.107.199.43 <none> 8443/TCP 103s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/cc-operator-pre-install-daemon 1 1 1 1 1 node.kubernetes.io/worker= 92s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cc-operator-controller-manager 1/1 1 1 103s
NAME DESIRED CURRENT READY AGE
replicaset.apps/cc-operator-controller-manager-ccbbcfdf7 1 1 1 103s
Which is odd as it should not have escaped from the loop. My assumption is the
To get unstuck I tried re-installing it:
kubectl apply -k .
ccruntime.confidentialcontainers.org/ccruntime-sample created
# kubectl -n confidential-containers-system get all
NAME READY STATUS RESTARTS AGE
pod/cc-operator-controller-manager-ccbbcfdf7-v54gc 2/2 Running 0 40m
pod/cc-operator-daemon-install-w9ppp 1/1 Running 0 102s
pod/cc-operator-pre-install-daemon-c4m9d 1/1 Running 0 40m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cc-operator-controller-manager-metrics-service ClusterIP 10.107.199.43 <none> 8443/TCP 40m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/cc-operator-daemon-install 1 1 1 1 1 node.kubernetes.io/worker= 102s
daemonset.apps/cc-operator-daemon-uninstall 0 0 0 0 0 katacontainers.io/kata-runtime=cleanup 102s
daemonset.apps/cc-operator-pre-install-daemon 1 1 1 1 1 node.kubernetes.io/worker= 40m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cc-operator-controller-manager 1/1 1 1 40m
NAME DESIRED CURRENT READY AGE
replicaset.apps/cc-operator-controller-manager-ccbbcfdf7 1 1 1 40m
Which, as you might see, succeeded. I tried to delete it again:
# kubectl delete -k .
ccruntime.confidentialcontainers.org "ccruntime-sample" deleted
# kubectl -n confidential-containers-system get all
NAME READY STATUS RESTARTS AGE
pod/cc-operator-controller-manager-ccbbcfdf7-v54gc 2/2 Running 0 44m
pod/cc-operator-daemon-install-w9ppp 1/1 Terminating 0 5m43s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cc-operator-controller-manager-metrics-service ClusterIP 10.107.199.43 <none> 8443/TCP 44m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cc-operator-controller-manager 1/1 1 1 44m
NAME DESIRED CURRENT READY AGE
replicaset.apps/cc-operator-controller-manager-ccbbcfdf7 1 1 1 44m
And after a few minutes I finally got the expected result.
So it looks like the |
@bpradipt would you have any idea why the |
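Since the pods eventually went away after a few minutes, a bounded wait may help tell "slow" apart from "stuck" in such a loop. Below is a minimal sketch, not the project's actual script: `wait_for_pods_gone`, its arguments and the default timeout are hypothetical names chosen for illustration; only the namespace and pod names come from the output above.

```shell
# Namespace taken from the outputs above; override via the environment if needed.
NAMESPACE="${NAMESPACE:-confidential-containers-system}"

# Hypothetical helper: poll until no pod whose name matches $1 is listed,
# giving up after $2 seconds (default 300), checking every $3 seconds (default 5).
wait_for_pods_gone() {
    local pattern="$1" timeout="${2:-300}" interval="${3:-5}"
    local waited=0
    # An empty or failing "get pods" counts as "gone" for this sketch.
    while kubectl -n "$NAMESPACE" get pods --no-headers 2>/dev/null | grep -q -- "$pattern"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out after ${waited}s waiting for '$pattern' pods to go away" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example, `wait_for_pods_gone cc-operator-daemon-install 600` would wait up to ten minutes for the Terminating pod seen above before reporting a failure.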
Describe the bug
Running a ccruntime/operator install/uninstall in a loop leads to left-behind pods
To Reproduce
Steps to reproduce the behavior:
Describe the results you expected
It should keep creating and deleting the operator with no left-behind resources
Describe the results you received:
After about 25 iterations the TEST phase took unusually long:
and the following cleanup (the ./operator.sh uninstall) failed with:
And the confidential-containers-system contained:
Basically the last steps were:
1. operator_tests.bats
2. ccruntime by kubectl delete -k .
3. cc-operator-daemon-install and cc-operator-pre-install-daemon pods to be gone
4. kata is not in runtime classes: ! kubectl get --no-headers runtimeclass 2>/dev/null | grep -q kata
5. "cc-preinstall/done":"true" there and interrupted the further execution
What is odd is that the step 4 as well as 5 was true when I was checking things after the failure, so it looks like the pods got deleted, perhaps even the runtime classes got deleted, but later they got re-created by the daemonset. The question is why the daemonset was not removed or whether there is another issue going on. Also, why did the test take 622s while it usually takes 210 - 420s?
Additional context
This issue is a reproducer of one real CI issue from #339
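The post-uninstall checks listed under the last steps (daemon pods gone, no kata runtime class, no "cc-preinstall/done" label) could be sketched as below. This is only an illustration under assumptions: the function names are hypothetical, and checking the label with a node label selector is my guess at an equivalent check, not necessarily what the test script does. Only the runtimeclass command is quoted from the issue.

```shell
NAMESPACE="${NAMESPACE:-confidential-containers-system}"

# Succeeds only if neither daemon pod is listed any more.
daemon_pods_gone() {
    ! kubectl -n "$NAMESPACE" get pods --no-headers 2>/dev/null \
        | grep -q -e cc-operator-daemon-install -e cc-operator-pre-install-daemon
}

# The runtime class check quoted in the issue.
kata_runtimeclass_gone() {
    ! kubectl get --no-headers runtimeclass 2>/dev/null | grep -q kata
}

# Assumed form: no node is still labelled cc-preinstall/done=true.
preinstall_label_gone() {
    [ -z "$(kubectl get nodes -l cc-preinstall/done=true --no-headers 2>/dev/null)" ]
}

# Run every check and report each result instead of stopping at the first failure.
verify_uninstall() {
    local rc=0 check
    for check in daemon_pods_gone kata_runtimeclass_gone preinstall_label_gone; do
        if "$check"; then
            echo "ok:   $check"
        else
            echo "FAIL: $check"
            rc=1
        fi
    done
    return "$rc"
}
```

Reporting all three results at once, rather than interrupting at the first failing check, would show directly whether the pods, the runtime class and the label came back together, which is the situation described above.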