online upgrade with package installation and pending pods #961

Merged
merged 32 commits into from
Oct 30, 2024
Changes from 19 commits
Commits
82cf6ac
make sure all nodes are up for online upgrade
qindotguan Oct 14, 2024
2c41466
add checkNodesUp into the online upgrade reconciler
qindotguan Oct 15, 2024
a90a8c2
online upgrade with package install and a pending pod
qindotguan Oct 15, 2024
6111835
pending pods do not have currentReplicas
qindotguan Oct 15, 2024
a24e665
add step to let pod running
qindotguan Oct 15, 2024
03bd47d
restart the main cluster if not all nodes are up
qindotguan Oct 16, 2024
521a504
add hint message if package installation failed
qindotguan Oct 16, 2024
5e036bb
update the event message
qindotguan Oct 16, 2024
f8b3273
remove the event checking of SandboxSubclusterStart
qindotguan Oct 16, 2024
cc711b9
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 16, 2024
9e39dc9
update the expected sandbox and subcluster list
qindotguan Oct 17, 2024
6a5957f
reorder the subcluster list in the test assert files
qindotguan Oct 18, 2024
4771f75
requeue if found pods not running
qindotguan Oct 21, 2024
4efd782
use log but not event
qindotguan Oct 21, 2024
ba02231
get pods not in running status
qindotguan Oct 21, 2024
1324be0
log error message only if no package installed
qindotguan Oct 21, 2024
dbda6e9
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 21, 2024
1d62c4b
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 22, 2024
03ffeb9
move online upgrade fall back log before the event
qindotguan Oct 22, 2024
87b65e1
Add requeuePodsNotRunningMsg in onlineUpgradeStatusMsgs
qindotguan Oct 23, 2024
0d2c12e
remove duplicated checks in other tests
qindotguan Oct 23, 2024
fbacb19
rename online-upgrade-with-package-install to online-upgrade-pods-pen…
qindotguan Oct 23, 2024
8fad603
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 23, 2024
a668a0f
revert the change to the devcluster
qindotguan Oct 23, 2024
031998c
remove extra dot in finish package installation msg
qindotguan Oct 24, 2024
bb882b4
move requeue pending after postStartOnlineUpgradeMsg
qindotguan Oct 25, 2024
e1106a4
validate pod pending and requeue status
qindotguan Oct 25, 2024
bd814df
update pending
qindotguan Oct 25, 2024
ef2164c
test with initPolicy CreateSkipPackageInstall
qindotguan Oct 25, 2024
2a3f068
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 29, 2024
e6c8398
export anyPodsNotRunning as podfacts moved to a separate package
qindotguan Oct 29, 2024
60fb071
rename anyPodsNotRunning to AnyPodsNotRunning
qindotguan Oct 29, 2024
4 changes: 4 additions & 0 deletions pkg/controllers/vdb/installpackages_reconciler.go
@@ -129,6 +129,10 @@ func (i *InstallPackagesReconciler) runCmd(ctx context.Context, initiatorName ty
"skipped installation package list", categorizedStatus.skippedPackages,
)

	if len(categorizedStatus.succeededPackages) == 0 {
		i.Log.Info("No packages were installed. This may be due to a lack of memory resources or other internal errors.")
	}

	return err
}
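
The hint above fires only when the list of succeeded packages is empty. A minimal, self-contained sketch of that check follows; the `packageStatus` and `categorized` types and the `categorize` and `needsHint` helpers are hypothetical simplifications for illustration, not the operator's actual types.

```go
package main

import "fmt"

// packageStatus is a hypothetical, simplified per-package install result.
type packageStatus struct {
	Name      string
	Succeeded bool
	Skipped   bool
}

// categorized loosely mirrors the idea of categorizedStatus in the diff;
// the field names here are illustrative assumptions.
type categorized struct {
	succeeded []string
	failed    []string
	skipped   []string
}

// categorize sorts each package result into exactly one bucket.
func categorize(statuses []packageStatus) categorized {
	var c categorized
	for _, s := range statuses {
		switch {
		case s.Skipped:
			c.skipped = append(c.skipped, s.Name)
		case s.Succeeded:
			c.succeeded = append(c.succeeded, s.Name)
		default:
			c.failed = append(c.failed, s.Name)
		}
	}
	return c
}

// needsHint reports whether the "no packages were installed" hint applies.
func needsHint(c categorized) bool {
	return len(c.succeeded) == 0
}

func main() {
	c := categorize([]packageStatus{
		{Name: "pkgA", Skipped: true},
		{Name: "pkgB"}, // neither skipped nor succeeded, so it counts as failed
	})
	if needsHint(c) {
		fmt.Println("No packages were installed. This may be due to a lack of memory resources or other internal errors.")
	}
}
```

Note that under this condition the hint is also logged when every package was skipped, since a skipped package is not counted as succeeded.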

27 changes: 27 additions & 0 deletions pkg/controllers/vdb/onlineupgrade_reconciler.go
@@ -165,6 +165,8 @@ func (r *OnlineUpgradeReconciler) Reconcile(ctx context.Context, _ *ctrl.Request

	// Functions to perform when the image changes. Order matters.
	funcs := []func(context.Context) (ctrl.Result, error){
		// Requeue if not all nodes are running
		r.requeuePodsNotRunning,
Collaborator

Add a status msg in onlineUpgradeStatusMsgs for this step.

Collaborator Author

Done.

Collaborator

Let's put them after postStartOnlineUpgradeMsg

		// Initiate an upgrade by setting condition and event recording
		r.startUpgrade,
		r.logEventIfThisUpgradeWasNotChosen,
@@ -290,6 +292,31 @@ func (r *OnlineUpgradeReconciler) loadUpgradeState(ctx context.Context) (ctrl.Re
return ctrl.Result{}, nil
}

// requeuePodsNotRunning will requeue the upgrade process if not all pods are running.
func (r *OnlineUpgradeReconciler) requeuePodsNotRunning(ctx context.Context) (ctrl.Result, error) {
	// We skip this if we have already added the new subclusters
	if vmeta.GetOnlineUpgradeStepInx(r.VDB.Annotations) > addSubclustersInx {
		return ctrl.Result{}, nil
	}

	// If pods are pending due to a lack of resources, we requeue rather than
	// restart them, and wait for the user to intervene.
	mainPFacts := r.PFacts[vapi.MainCluster]
	found, _ := mainPFacts.anyPodsNotRunning()
	if found {
		r.Log.Info("Not all pods are running, requeuing.")
		return ctrl.Result{Requeue: true}, nil
	}

	// Restart the main cluster if any down pods are found
	res, err := r.restartMainCluster(ctx)
	if verrors.IsReconcileAborted(res, err) {
		return res, err
	}

	return ctrl.Result{}, nil
}
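
To illustrate why placing this check early in the ordered function list holds back the rest of the upgrade, here is a minimal sketch of a step runner that stops at the first step asking for a requeue. It assumes nothing about the operator beyond what the diff shows: `result` is a hypothetical stand-in for `ctrl.Result`, and `step`, `runSteps`, and `gate` are names invented for this example.

```go
package main

import "fmt"

// result is a hypothetical stand-in for controller-runtime's ctrl.Result.
type result struct{ Requeue bool }

// step is one function in the ordered upgrade list.
type step func() (result, error)

// runSteps executes steps in order and stops at the first one that returns
// an error or asks for a requeue. It also returns the index where it stopped.
func runSteps(steps []step) (result, int, error) {
	for i, s := range steps {
		res, err := s()
		if err != nil || res.Requeue {
			return res, i, err
		}
	}
	return result{}, len(steps), nil
}

// gate returns a step that requeues while any pod is reported as not running.
func gate(podRunning map[string]bool) step {
	return func() (result, error) {
		for _, running := range podRunning {
			if !running {
				return result{Requeue: true}, nil
			}
		}
		return result{}, nil
	}
}

func main() {
	pods := map[string]bool{"pri1-0": true, "sec2-0": false}
	started := false
	steps := []step{
		gate(pods),
		func() (result, error) { started = true; return result{}, nil },
	}

	// While sec2-0 is pending, the gate blocks every later step.
	res, stoppedAt, _ := runSteps(steps)
	fmt.Println(res.Requeue, stoppedAt, started)

	// Once the pod runs, the next pass proceeds through all steps.
	pods["sec2-0"] = true
	res, stoppedAt, _ = runSteps(steps)
	fmt.Println(res.Requeue, stoppedAt, started)
}
```

This is also the behaviour the reviewer asks the e2e test to verify: the upgrade stays in the first function while a pod is pending and moves on once the pod is running.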

// assignSubclustersToReplicaGroupA will go through all of the subclusters involved
// in the upgrade and assign them to the first replica group. The assignment is
// saved in the status.upgradeState.replicaGroups field.
11 changes: 11 additions & 0 deletions pkg/controllers/vdb/podfacts.go
@@ -1127,6 +1127,17 @@ func genPodNames(pods []*PodFact) string {
return strings.Join(podNames, ", ")
}

// anyPodsNotRunning returns true if any pod isn't running. It could be still pending scheduling due to
// lack of resources. It will return the name of the first pod that isn't running.
func (p *PodFacts) anyPodsNotRunning() (bool, types.NamespacedName) {
	for _, v := range p.Detail {
		if !v.isPodRunning {
			return true, v.name
		}
	}
	return false, types.NamespacedName{}
}
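
A self-contained sketch of the same scan is shown below, under the assumption that the pod details are held in a map keyed by pod name; the diff does not show the type of `Detail`, and `podFact` here is a hypothetical trimmed-down struct. If the collection is a map, Go does not guarantee iteration order, so "first" is only well defined when the keys are visited in a fixed order. This variant sorts the names to make the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// podFact is a hypothetical, trimmed-down version of a per-pod fact record.
type podFact struct {
	isPodRunning bool
}

// anyPodsNotRunning returns true and the name of a pod that isn't running.
// Names are visited in sorted order so the returned name is deterministic.
func anyPodsNotRunning(detail map[string]*podFact) (bool, string) {
	names := make([]string, 0, len(detail))
	for n := range detail {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if !detail[n].isPodRunning {
			return true, n
		}
	}
	return false, ""
}

func main() {
	detail := map[string]*podFact{
		"v-base-upgrade-pri1-0": {isPodRunning: true},
		"v-base-upgrade-sec2-0": {isPodRunning: false},
	}
	found, name := anyPodsNotRunning(detail)
	fmt.Println(found, name)
}
```

The caller in the reconciler discards the name, so the ordering only matters if the name is later used in a log or status message.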

// anyInstalledPodsNotRunning returns true if any installed pod isn't running. It will
// return the name of the first pod that isn't running.
func (p *PodFacts) anyInstalledPodsNotRunning() (bool, types.NamespacedName) {
9 changes: 9 additions & 0 deletions pkg/controllers/vdb/upgrade.go
@@ -845,6 +845,15 @@ func (i *UpgradeManager) routeClientTraffic(ctx context.Context, pfacts *PodFact
func (i *UpgradeManager) logEventIfRequestedUpgradeIsDifferent(actualUpgrade vapi.UpgradePolicyType) {
	if !i.ContinuingUpgrade && i.Vdb.Spec.UpgradePolicy != actualUpgrade && i.Vdb.Spec.UpgradePolicy != vapi.AutoUpgrade {
		actualUpgradeAsText := strings.ToLower(string(actualUpgrade))

		if i.Vdb.Spec.UpgradePolicy == "Online" {
Collaborator

Let's move this before the event recorder.

			i.Log.Info("Not all online upgrade prerequisites met. Please make sure: " +
				"1. Vertica server version is 24.3.0-2 or higher. " +
				"2. Cluster was deployed using `vclusterops`. " +
				"3. A license file was applied to allow double the DB nodes. " +
				"4. No sandbox defined.")
		}

		i.Rec.Eventf(i.Vdb, corev1.EventTypeNormal, events.IncompatibleUpgradeRequested,
			"Requested upgrade is incompatible with the Vertica deployment. Falling back to %s upgrade.", actualUpgradeAsText)
	}
2 changes: 1 addition & 1 deletion pkg/vadmin/install_packages_vc.go
@@ -42,7 +42,7 @@ func (v *VClusterOps) InstallPackages(_ context.Context, opts ...installpackages
	}
	if err != nil {
		_, err = v.logFailure("VInstallPackages", events.InstallPackagesFailed, err)
-		v.Log.Error(err, "failed to finish package installation", "installPackageStatus", *status)
+		v.Log.Error(err, "failed to finish package installation.", "installPackageStatus", *status)
		return status, err
	}

22 changes: 22 additions & 0 deletions tests/e2e-leg-9/online-upgrade-with-package-install/02-assert.yaml
Collaborator

This new test does too many things that other tests already validate. I think what you want to check is that we do not get past the first function in online upgrade while some pods are pending. So: make a pod pending, start the online upgrade, wait a little, and check the upgradeStatus to verify that we are still stuck in that function. Then make the pending pod run, wait until we create the new subclusters, and end the test there.

Collaborator Author

Good suggestion. I removed the duplicated checks and renamed the test to online-upgrade-pods-pending.

@@ -0,0 +1,22 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: integration-test-role
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: integration-test-rb
18 changes: 18 additions & 0 deletions tests/e2e-leg-9/online-upgrade-with-package-install/02-rbac.yaml
@@ -0,0 +1,18 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - script: kubectl apply --namespace $NAMESPACE -f ../../manifests/rbac/base/rbac.yaml
    ignoreFailure: true
@@ -0,0 +1,19 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - script: kustomize build ../../manifests/communal-creds/overlay | kubectl apply -f - --namespace $NAMESPACE
  - script: kustomize build ../../manifests/priv-container-creds/overlay | kubectl apply -f - --namespace $NAMESPACE
  - script: kustomize build ../../manifests/vertica-license/overlay | kubectl apply -f - --namespace $NAMESPACE
21 changes: 21 additions & 0 deletions tests/e2e-leg-9/online-upgrade-with-package-install/10-assert.yaml
@@ -0,0 +1,21 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Pod
metadata:
  namespace: verticadb-operator
  labels:
    control-plane: verticadb-operator
status:
  phase: Running
@@ -0,0 +1,12 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
45 changes: 45 additions & 0 deletions tests/e2e-leg-9/online-upgrade-with-package-install/15-assert.yaml
@@ -0,0 +1,45 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri1
status:
  currentReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri-2
status:
  currentReplicas: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec1
status:
  currentReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec2
status:
  currentReplicas: 1
---
apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  name: v-base-upgrade
@@ -0,0 +1,17 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - command: bash -c "kustomize build setup-vdb/overlay | kubectl -n $NAMESPACE apply -f - "
58 changes: 58 additions & 0 deletions tests/e2e-leg-9/online-upgrade-with-package-install/20-assert.yaml
@@ -0,0 +1,58 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri1
status:
  currentReplicas: 2
  readyReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri-2
status:
  currentReplicas: 1
  readyReplicas: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec1
status:
  currentReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec2
status:
  currentReplicas: 1
  readyReplicas: 1
---
apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  name: v-base-upgrade
status:
  subclusters:
    - addedToDBCount: 2
      upNodeCount: 2
    - addedToDBCount: 1
      upNodeCount: 1
    - addedToDBCount: 2
      upNodeCount: 2
    - addedToDBCount: 1
      upNodeCount: 1
@@ -0,0 +1 @@
# Intentionally empty to give this step a name in kuttl
@@ -0,0 +1,17 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - command: bash -c "../../../scripts/wait-for-verticadb-steady-state.sh -n verticadb-operator -t 360 $NAMESPACE"
20 changes: 20 additions & 0 deletions tests/e2e-leg-9/online-upgrade-with-package-install/26-assert.yaml
@@ -0,0 +1,20 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec2
status:
  availableReplicas: 0
  replicas: 1
@@ -0,0 +1,33 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  name: v-base-upgrade
spec:
  subclusters:
    - name: sec1
      size: 2
      type: secondary
    - name: sec2
      size: 1
      resources:
        requests:
          memory: 1Ti
      type: secondary
    - name: pri1
      size: 2
    - name: pri_2
      size: 1
      type: primary