online upgrade with package installation and pending pods #961

Merged 32 commits on Oct 30, 2024
Changes from 25 commits
Commits
82cf6ac
make sure all nodes are up for online upgrade
qindotguan Oct 14, 2024
2c41466
add checkNodesUp into the online upgrade reconciler
qindotguan Oct 15, 2024
a90a8c2
online upgrade with package install and a pending pod
qindotguan Oct 15, 2024
6111835
pending pods donnot have currentReplicas
qindotguan Oct 15, 2024
a24e665
add step to let pod running
qindotguan Oct 15, 2024
03bd47d
restart the main cluster if not all nodes are up
qindotguan Oct 16, 2024
521a504
add hint message if package installation failed
qindotguan Oct 16, 2024
5e036bb
update the event message
qindotguan Oct 16, 2024
f8b3273
remove the event checking of SandboxSubclusterStart
qindotguan Oct 16, 2024
cc711b9
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 16, 2024
9e39dc9
update the expected sandbox and subcluster list
qindotguan Oct 17, 2024
6a5957f
reorder the subcluster list in the test assert files
qindotguan Oct 18, 2024
4771f75
requeue if found pods not running
qindotguan Oct 21, 2024
4efd782
use log but not event
qindotguan Oct 21, 2024
ba02231
get pods not in running status
qindotguan Oct 21, 2024
1324be0
log error message only if no pacakge installed
qindotguan Oct 21, 2024
dbda6e9
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 21, 2024
1d62c4b
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 22, 2024
03ffeb9
move online upgrade fall back log before the event
qindotguan Oct 22, 2024
87b65e1
Add requeuePodsNotRunningMsg in onlineUpgradeStatusMsgs
qindotguan Oct 23, 2024
0d2c12e
remove duplicated checks in other tests
qindotguan Oct 23, 2024
fbacb19
rename online-upgrade-with-package-install to online-upgrade-pods-pen…
qindotguan Oct 23, 2024
8fad603
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 23, 2024
a668a0f
revert the change to the devcluster
qindotguan Oct 23, 2024
031998c
remove extra dot in finish package installation msg
qindotguan Oct 24, 2024
bb882b4
move requeue pending after postStartOnlineUpgradeMsg
qindotguan Oct 25, 2024
e1106a4
validate pod pending and requeue status
qindotguan Oct 25, 2024
bd814df
update pending
qindotguan Oct 25, 2024
ef2164c
test with initPolicy CreateSkipPackageInstall
qindotguan Oct 25, 2024
2a3f068
Merge branch 'main' into qguan/online-upgrade-packages
qindotguan Oct 29, 2024
e6c8398
export anyPodsNotRunning as podfacts moved to a separated pacakge
qindotguan Oct 29, 2024
60fb071
rename anyPodsNotRunning to AnyPodsNotRunning
qindotguan Oct 29, 2024
4 changes: 4 additions & 0 deletions pkg/controllers/vdb/installpackages_reconciler.go
@@ -129,6 +129,10 @@ func (i *InstallPackagesReconciler) runCmd(ctx context.Context, initiatorName ty
		"skipped installation package list", categorizedStatus.skippedPackages,
	)

	if len(categorizedStatus.succeededPackages) == 0 {
		i.Log.Info("No packages were installed. This may be due to a lack of memory resources or other internal errors.")
	}

	return err
}

36 changes: 36 additions & 0 deletions pkg/controllers/vdb/onlineupgrade_reconciler.go
@@ -65,6 +65,7 @@ const (
// be sure to add a *StatusMsgInx const below.
var onlineUpgradeStatusMsgs = []string{
	"Starting online upgrade",
	"Requeue as pods not running",
	"Create new subclusters to mimic subclusters in the main cluster",
	fmt.Sprintf("Querying the original value of config parameter %q", ConfigParamDisableNonReplicatableQueries),
	fmt.Sprintf("Disable non-replicatable queries by setting config parameter %q", ConfigParamDisableNonReplicatableQueries),
@@ -85,6 +86,7 @@ var onlineUpgradeStatusMsgs = []string{
// Constants for each entry in onlineUpgradeStatusMsgs
const (
	startOnlineUpgradeStatusMsgInx = iota
	requeuePodsNotRunningMsgInx
	createNewSubclustersStatusMsgInx
	queryOriginalConfigParamDisableNonReplicatableQueriesMsgInx
	disableNonReplicatableQueriesMsgInx
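
The message slice and the iota block have to stay in lockstep: adding a message in the middle shifts every later index, so the matching constant must be added at the same position. The following is a minimal sketch of that invariant, not the operator's code. The names statusMsgs, startMsgInx, requeuePodsNotRunningInx, createNewSubclustersInx, numStatusMsgs and msgFor are hypothetical, and the sentinel constant is an assumption added for illustration.

```go
package main

import "fmt"

// statusMsgs is a hypothetical, shortened stand-in for onlineUpgradeStatusMsgs.
var statusMsgs = []string{
	"Starting online upgrade",
	"Requeue as pods not running",
	"Create new subclusters to mimic subclusters in the main cluster",
}

// One constant per entry in statusMsgs, declared in the same order.
const (
	startMsgInx = iota
	requeuePodsNotRunningInx
	createNewSubclustersInx
	numStatusMsgs // sentinel (assumption): must equal len(statusMsgs)
)

// msgFor returns the status message for a step index, or an error when the
// index does not refer to an entry in statusMsgs.
func msgFor(inx int) (string, error) {
	if inx < 0 || inx >= len(statusMsgs) {
		return "", fmt.Errorf("status message index %d out of range", inx)
	}
	return statusMsgs[inx], nil
}

func main() {
	// Fail fast if a message was added without its constant, or vice versa.
	if numStatusMsgs != len(statusMsgs) {
		panic("statusMsgs and its index constants are out of sync")
	}
	msg, err := msgFor(requeuePodsNotRunningInx)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

A sentinel like this only catches a length mismatch; it would not detect two entries swapped in one list but not the other.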
@@ -165,6 +167,9 @@ func (r *OnlineUpgradeReconciler) Reconcile(ctx context.Context, _ *ctrl.Request

	// Functions to perform when the image changes. Order matters.
	funcs := []func(context.Context) (ctrl.Result, error){
		// Requeue if not all nodes are running
		r.postRequeuePodsNotRunningMsg,
		r.requeuePodsNotRunning,
Collaborator
Add a status msg in onlineUpgradeStatusMsgs for this step.

Collaborator Author
Done.

Collaborator
Let's put them after postStartOnlineUpgradeMsg

		// Initiate an upgrade by setting condition and event recording
		r.startUpgrade,
		r.logEventIfThisUpgradeWasNotChosen,
@@ -290,6 +295,37 @@ func (r *OnlineUpgradeReconciler) loadUpgradeState(ctx context.Context) (ctrl.Re
	return ctrl.Result{}, nil
}

// postRequeuePodsNotRunningMsg will update the status message to indicate that
// we are requeuing online upgrade if not all pods are running.
func (r *OnlineUpgradeReconciler) postRequeuePodsNotRunningMsg(ctx context.Context) (ctrl.Result, error) {
	return r.postNextStatusMsg(ctx, requeuePodsNotRunningMsgInx)
}

// requeuePodsNotRunning will requeue the upgrade process if not all pods are running.
func (r *OnlineUpgradeReconciler) requeuePodsNotRunning(ctx context.Context) (ctrl.Result, error) {
	// We skip this if we have already added the new subclusters
	if vmeta.GetOnlineUpgradeStepInx(r.VDB.Annotations) > addSubclustersInx {
		return ctrl.Result{}, nil
	}

	// For pods that are pending due to a lack of resources, we requeue rather
	// than restart them, and wait for the user to intervene.
	mainPFacts := r.PFacts[vapi.MainCluster]
	found, _ := mainPFacts.anyPodsNotRunning()
	if found {
		r.Log.Info("Not all pods are running, requeuing.")
		return ctrl.Result{Requeue: true}, nil
	}

	// Restart the main cluster if any down pods are found
	res, err := r.restartMainCluster(ctx)
	if verrors.IsReconcileAborted(res, err) {
		return res, err
	}

	return ctrl.Result{}, nil
}
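
The branch order in requeuePodsNotRunning can be summarized as a three-way decision. Below is a minimal, self-contained sketch of that flow. The action type, its constants and the decide function are hypothetical names introduced here; stepInx and addSubclustersInx are plain integers standing in for the annotation value and the constant used in the diff, and the per-pod flags replace the real pod facts.

```go
package main

import "fmt"

// action is a hypothetical enum naming the three outcomes of the step.
type action int

const (
	skipStep           action = iota // new subclusters already added; nothing to check
	requeue                          // at least one pod is not running; retry later
	restartMainCluster               // every pod is running; go on to the restart call
)

// decide mirrors the branch order of requeuePodsNotRunning in the diff.
// podsRunning holds one flag per pod in the main cluster.
func decide(stepInx, addSubclustersInx int, podsRunning []bool) action {
	if stepInx > addSubclustersInx {
		return skipStep
	}
	for _, running := range podsRunning {
		if !running {
			return requeue
		}
	}
	return restartMainCluster
}

func main() {
	names := map[action]string{
		skipStep:           "skip",
		requeue:            "requeue",
		restartMainCluster: "restart main cluster",
	}
	// One pending pod is enough to requeue the whole step.
	fmt.Println(names[decide(0, 2, []bool{true, false, true})])
	// Once the upgrade is past the add-subclusters step, the check is skipped.
	fmt.Println(names[decide(3, 2, []bool{false})])
}
```

Checking the step index first keeps the gate from blocking an upgrade that has already moved past it, even if a pod later falls out of the running state.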

// assignSubclustersToReplicaGroupA will go through all of the subclusters involved
// in the upgrade and assign them to the first replica group. The assignment is
// saved in the status.upgradeState.replicaGroups field.
11 changes: 11 additions & 0 deletions pkg/controllers/vdb/podfacts.go
@@ -1127,6 +1127,17 @@ func genPodNames(pods []*PodFact) string {
	return strings.Join(podNames, ", ")
}

// anyPodsNotRunning returns true if any pod isn't running. It could still be pending scheduling due to a
// lack of resources. It will return the name of the first pod that isn't running.
func (p *PodFacts) anyPodsNotRunning() (bool, types.NamespacedName) {
	for _, v := range p.Detail {
		if !v.isPodRunning {
			return true, v.name
		}
	}
	return false, types.NamespacedName{}
}
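
One detail worth keeping in mind when reading "the first pod that isn't running": if the collection being ranged over is a Go map (an assumption here, since the diff does not show the type of Detail), iteration order is unspecified, so the name returned can differ between calls. The sketch below uses hypothetical simplified types (podFact, firstPodNotRunning, string names) and sorts the keys to make the result deterministic. It is an illustration of the idea, not a proposed change to the operator.

```go
package main

import (
	"fmt"
	"sort"
)

// podFact is a hypothetical, trimmed-down stand-in for the operator's PodFact.
type podFact struct {
	isPodRunning bool
}

// firstPodNotRunning reports whether any pod is not running and, if so,
// returns the alphabetically first such pod name. Sorting the keys avoids
// depending on Go's unspecified map iteration order.
func firstPodNotRunning(detail map[string]*podFact) (bool, string) {
	names := make([]string, 0, len(detail))
	for name := range detail {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if !detail[name].isPodRunning {
			return true, name
		}
	}
	return false, ""
}

func main() {
	detail := map[string]*podFact{
		"v-base-upgrade-pri1-0": {isPodRunning: true},
		"v-base-upgrade-pri1-1": {isPodRunning: false},
		"v-base-upgrade-sec2-0": {isPodRunning: false},
	}
	found, name := firstPodNotRunning(detail)
	fmt.Println(found, name)
}
```

For a yes/no gate like the requeue step, which discards the returned name, the ordering does not matter; it only becomes relevant when the name is logged or surfaced to the user.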

// anyInstalledPodsNotRunning returns true if any installed pod isn't running. It will
// return the name of the first pod that isn't running.
func (p *PodFacts) anyInstalledPodsNotRunning() (bool, types.NamespacedName) {
9 changes: 9 additions & 0 deletions pkg/controllers/vdb/upgrade.go
@@ -845,6 +845,15 @@ func (i *UpgradeManager) routeClientTraffic(ctx context.Context, pfacts *PodFact
func (i *UpgradeManager) logEventIfRequestedUpgradeIsDifferent(actualUpgrade vapi.UpgradePolicyType) {
	if !i.ContinuingUpgrade && i.Vdb.Spec.UpgradePolicy != actualUpgrade && i.Vdb.Spec.UpgradePolicy != vapi.AutoUpgrade {
		actualUpgradeAsText := strings.ToLower(string(actualUpgrade))

		if i.Vdb.Spec.UpgradePolicy == "Online" {
Collaborator
Let's move this before the event recorder.

			i.Log.Info("Not all online upgrade prerequisites met. Please make sure: " +
				"1. Vertica server version is 24.3.0-2 or higher. " +
				"2. Cluster was deployed using `vclusterops`. " +
				"3. A license file was applied to allow double the DB nodes. " +
				"4. No sandbox defined.")
		}

		i.Rec.Eventf(i.Vdb, corev1.EventTypeNormal, events.IncompatibleUpgradeRequested,
			"Requested upgrade is incompatible with the Vertica deployment. Falling back to %s upgrade.", actualUpgradeAsText)
	}
22 changes: 22 additions & 0 deletions tests/e2e-leg-9/online-upgrade-pods-pending/02-assert.yaml
@@ -0,0 +1,22 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: integration-test-role
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: integration-test-rb
18 changes: 18 additions & 0 deletions tests/e2e-leg-9/online-upgrade-pods-pending/02-rbac.yaml
@@ -0,0 +1,18 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - script: kubectl apply --namespace $NAMESPACE -f ../../manifests/rbac/base/rbac.yaml
    ignoreFailure: true
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - script: kustomize build ../../manifests/communal-creds/overlay | kubectl apply -f - --namespace $NAMESPACE
  - script: kustomize build ../../manifests/priv-container-creds/overlay | kubectl apply -f - --namespace $NAMESPACE
  - script: kustomize build ../../manifests/vertica-license/overlay | kubectl apply -f - --namespace $NAMESPACE
21 changes: 21 additions & 0 deletions tests/e2e-leg-9/online-upgrade-pods-pending/10-assert.yaml
@@ -0,0 +1,21 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Pod
metadata:
  namespace: verticadb-operator
  labels:
    control-plane: verticadb-operator
status:
  phase: Running
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
45 changes: 45 additions & 0 deletions tests/e2e-leg-9/online-upgrade-pods-pending/15-assert.yaml
@@ -0,0 +1,45 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri1
status:
  currentReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri-2
status:
  currentReplicas: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec1
status:
  currentReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec2
status:
  currentReplicas: 1
---
apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  name: v-base-upgrade
17 changes: 17 additions & 0 deletions tests/e2e-leg-9/online-upgrade-pods-pending/15-setup-vdb.yaml
@@ -0,0 +1,17 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - command: bash -c "kustomize build setup-vdb/overlay | kubectl -n $NAMESPACE apply -f - "
58 changes: 58 additions & 0 deletions tests/e2e-leg-9/online-upgrade-pods-pending/20-assert.yaml
@@ -0,0 +1,58 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri1
status:
  currentReplicas: 2
  readyReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-pri-2
status:
  currentReplicas: 1
  readyReplicas: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec1
status:
  currentReplicas: 2
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec2
status:
  currentReplicas: 1
  readyReplicas: 1
---
apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  name: v-base-upgrade
status:
  subclusters:
    - addedToDBCount: 2
      upNodeCount: 2
    - addedToDBCount: 1
      upNodeCount: 1
    - addedToDBCount: 2
      upNodeCount: 2
    - addedToDBCount: 1
      upNodeCount: 1
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Intentionally empty to give this step a name in kuttl
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
  - command: bash -c "../../../scripts/wait-for-verticadb-steady-state.sh -n verticadb-operator -t 360 $NAMESPACE"
20 changes: 20 additions & 0 deletions tests/e2e-leg-9/online-upgrade-pods-pending/26-assert.yaml
@@ -0,0 +1,20 @@
# (c) Copyright [2021-2024] Open Text.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: v-base-upgrade-sec2
status:
  availableReplicas: 0
  replicas: 1
Collaborator
It does not necessarily mean the pods are pending. You need to validate the pod itself:

Suggested change
-apiVersion: apps/v1
-kind: StatefulSet
-metadata:
-  name: v-base-upgrade-sec2
-status:
-  availableReplicas: 0
-  replicas: 1
+apiVersion: v1
+kind: Pod
+metadata:
+  name: v-base-upgrade-sec2-0
+status:
+  phase: Pending

Collaborator Author
Thanks. This is helpful.
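
The reviewer's point is that a StatefulSet reporting zero available replicas does not tell you why the pod is unavailable. A pod whose readiness probe is failing is in the Running phase yet still counts as unavailable, so only the pod's own phase confirms it is waiting to be scheduled or started. A minimal sketch of the two checks follows. The stsStatus and pod types and both functions are hypothetical stand-ins, not the Kubernetes API types; the phase strings are the standard, case-sensitive pod phase values.

```go
package main

import "fmt"

// stsStatus and pod are hypothetical, trimmed-down stand-ins for the
// Kubernetes StatefulSet status and Pod objects.
type stsStatus struct {
	replicas          int
	availableReplicas int
}

type pod struct {
	name  string
	phase string // "Pending", "Running", "Succeeded", "Failed" or "Unknown"
}

// looksUnavailable is the weaker check: it only says that fewer pods are
// available than requested, not why.
func looksUnavailable(s stsStatus) bool {
	return s.availableReplicas < s.replicas
}

// isPending is the stronger check: the pod itself reports the Pending phase.
func isPending(p pod) bool {
	return p.phase == "Pending"
}

func main() {
	sts := stsStatus{replicas: 1, availableReplicas: 0}

	// Same StatefulSet status, two different pod states.
	notReady := pod{name: "v-base-upgrade-sec2-0", phase: "Running"}
	unscheduled := pod{name: "v-base-upgrade-sec2-0", phase: "Pending"}

	fmt.Println(looksUnavailable(sts), isPending(notReady))
	fmt.Println(looksUnavailable(sts), isPending(unscheduled))
}
```

Because the phase value is case-sensitive, an assert written against a lowercase value would never match a real pod.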