Update demo #48

6 changes: 6 additions & 0 deletions demos/basic.demo
@@ -1,5 +1,7 @@
# Demo: how to install, configure and use Knavigator in a local k8s cluster, such as minikube or KinD

# Create a minikube or kind cluster

# Show the cluster information
kubectl cluster-info

@@ -15,6 +17,10 @@ kubectl apply -f charts/overrides/kwok/pod-complete.yml
kubectl apply -f https://github.com/${KWOK_REPO}/raw/main/kustomize/stage/pod/chaos/pod-init-container-running-failed.yaml
kubectl apply -f https://github.com/${KWOK_REPO}/raw/main/kustomize/stage/pod/chaos/pod-container-running-failed.yaml

# Set up virtual nodes
Collaborator

We don't need to deploy nodes separately if we are using the Configure task.

Collaborator

Could you try to rebase and rebuild? I'm not seeing this issue:

$ ./bin/knavigator -tasks ./resources/tests/k8s/test-job.yml
I0520 12:08:39.910353 1099652 k8s_config.go:42] "Using external kubeconfig"
I0520 12:08:39.915986 1099652 main.go:84] "Starting test" name="test-k8s-job"
I0520 12:08:39.916034 1099652 engine.go:111] "Creating task" name="RegisterObj" id="register"
I0520 12:08:39.916580 1099652 engine.go:247] "Starting task" id="RegisterObj/register"
I0520 12:08:39.916600 1099652 engine.go:253] "Task completed" id="RegisterObj/register" duration="3.535µs"
I0520 12:08:39.916612 1099652 engine.go:111] "Creating task" name="Configure" id="configure"
I0520 12:08:39.916795 1099652 engine.go:247] "Starting task" id="Configure/configure"
I0520 12:08:40.802256 1099652 engine.go:253] "Task completed" id="Configure/configure" duration="885.42569ms"
I0520 12:08:40.802304 1099652 engine.go:111] "Creating task" name="SubmitObj" id="job"
I0520 12:08:40.802636 1099652 engine.go:247] "Starting task" id="SubmitObj/job"
I0520 12:08:40.850344 1099652 engine.go:253] "Task completed" id="SubmitObj/job" duration="47.67867ms"
I0520 12:08:40.850383 1099652 engine.go:111] "Creating task" name="CheckPod" id="status"
I0520 12:08:40.850559 1099652 engine.go:247] "Starting task" id="CheckPod/status"
I0520 12:08:40.850576 1099652 check_pod_task.go:158] "Create pod informer" #pod=2 timeout="5s"
I0520 12:08:40.971440 1099652 check_pod_task.go:256] "Accounted for all pods"
I0520 12:08:40.971488 1099652 engine.go:253] "Task completed" id="CheckPod/status" duration="120.910655ms"
I0520 12:08:40.971503 1099652 engine.go:259] "Reset Engine"
$ k get no
NAME                    STATUS   ROLES           AGE     VERSION
test-control-plane      Ready    control-plane   11d     v1.29.2
virtual-dgxa100.80g-0   Ready    agent           9d      fake
virtual-dgxa100.80g-1   Ready    agent           3h58m   fake

Collaborator Author

@yuanchen8911, May 20, 2024

I didn't realize you've added a Configure task to the test, but I've noticed the following issues with the latest virtual node configuration. Can you take a look?

  1. Virtual nodes created in the task are NotReady.
$ k get nodes
NAME                    STATUS     ROLES           AGE     VERSION
minikube                Ready      control-plane   2m12s   v1.30.0
virtual-dgxa100.80g-0   NotReady   agent           118s    fake
virtual-dgxa100.80g-1   NotReady   agent           118s    fake
  2. The job shows Running while the pods are Pending.
$ k get job
NAME   STATUS    COMPLETIONS   DURATION   AGE
job1   Running   0/2           15s        15s

$ k get pods
NAME           READY   STATUS    RESTARTS   AGE
job1-0-254qd   0/1     Pending   0          18s
job1-1-vsgkl   0/1     Pending   0          18s
  3. Running the test with the Configure task removes the virtual nodes that were previously created by helm.
  4. Deleting the test job makes all virtual nodes (from both helm and the task) become NotReady.
  5. Uninstalling the virtual node helm chart removes all virtual nodes, including those that were configured in a task.
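
To make the node-readiness symptom easier to check repeatedly, here is a minimal shell sketch that filters `kubectl get nodes` output down to the nodes that are not Ready. The function name `not_ready_nodes` is hypothetical and is not part of Knavigator. It assumes the default column layout shown above, with STATUS in the second column.

```shell
# Hypothetical helper (not part of Knavigator): reads `kubectl get nodes`
# output on stdin and prints the names of nodes whose STATUS is not Ready.
# Assumes the default table layout, with STATUS in the second column.
not_ready_nodes() {
  # Skip the header row; treat "Ready" and "Ready,<extra>" as ready.
  awk 'NR > 1 && $2 !~ /^Ready(,|$)/ { print $1 }'
}
```

A typical use would be `kubectl get nodes | not_ready_nodes`, which prints nothing when every node reports Ready.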

Collaborator

Can you create a brand new kind cluster and try it?

Collaborator Author

@yuanchen8911, May 20, 2024

Same result on a brand new kind cluster. The nodes were Ready at first and then became NotReady. The pods are Pending, and the test failed. BTW, my earlier runs were on minikube.

$ k get nodes
NAME                    STATUS     ROLES           AGE    VERSION
test-control-plane      Ready      control-plane   106s   v1.29.2
virtual-dgxa100.80g-0   NotReady   agent           48s    fake
virtual-dgxa100.80g-1   NotReady   agent           48s    fake

$ k get pods
NAME           READY   STATUS    RESTARTS   AGE
job1-0-892gm   0/1     Pending   0          61s
job1-1-r8gcf   0/1     Pending   0          61s

$ k get jobs
NAME   COMPLETIONS   DURATION   AGE
job1   0/2           65s        65s
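
The two `k get job` outputs in this thread have different column layouts (one includes a STATUS column, the other does not), so a check on job progress is more robust if it finds the COMPLETIONS column by header name. The sketch below does that. The function name `incomplete_jobs` is hypothetical, and it assumes COMPLETIONS values of the simple `succeeded/desired` form shown here.

```shell
# Hypothetical helper (not part of Knavigator): reads `kubectl get jobs`
# output on stdin and prints "<job> <completions>" for every job whose
# succeeded count is below the desired count.
incomplete_jobs() {
  awk '
    # Header row: remember which column is named COMPLETIONS.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "COMPLETIONS") col = i; next }
    # Data rows: split "succeeded/desired" and compare numerically.
    col {
      split($col, c, "/")
      if (c[1] + 0 < c[2] + 0) print $1, $col
    }
  '
}
```

For example, `kubectl get jobs | incomplete_jobs` would list `job1 0/2` for either of the outputs quoted in this thread.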

helm upgrade --install virtual-nodes charts/virtual-nodes -f charts/virtual-nodes/values-example.yaml
kubectl get nodes

# Build Knavigator
make build

2 changes: 1 addition & 1 deletion demos/basic.svg