
Add reconciler to shutdown a subcluster #777

Open · wants to merge 61 commits into base: main

Conversation

roypaulin (Collaborator)

This adds a new reconciler that will be used to shut down a subcluster. It will call the stopSubcluster vclusterops API.
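For orientation, here is a minimal sketch of the loop such a stop-subcluster reconciler might follow: for each subcluster flagged for shutdown, pick an up pod's IP as the initiator and invoke the stop API, requeuing when no up pod is available. All type and method names here are illustrative stand-ins, not the operator's actual API.

```go
package main

import "fmt"

// Dispatcher abstracts the vclusterops call; the real operator wraps the
// library behind a similar interface. This is a stand-in for illustration.
type Dispatcher interface {
	StopSubcluster(initiatorIP, scName string) error
}

type printDispatcher struct{}

func (printDispatcher) StopSubcluster(ip, sc string) error {
	fmt.Printf("stop subcluster %s via initiator %s\n", sc, ip)
	return nil
}

// StopSubclusterReconciler: for each subcluster flagged for shutdown, pick
// the IP of an up pod as the initiator and invoke the stop API; requeue if
// no up pod is available or the call fails.
type StopSubclusterReconciler struct {
	Dispatcher Dispatcher
	UpPodIP    map[string]string // subcluster -> IP of a running pod, if any
}

func (r *StopSubclusterReconciler) StopSubclusters(scs []string) (needRequeue bool) {
	for _, sc := range scs {
		ip, ok := r.UpPodIP[sc]
		if !ok {
			needRequeue = true // no up pod yet; try again next iteration
			continue
		}
		if err := r.Dispatcher.StopSubcluster(ip, sc); err != nil {
			needRequeue = true
		}
	}
	return needRequeue
}

func main() {
	r := &StopSubclusterReconciler{
		Dispatcher: printDispatcher{},
		UpPodIP:    map[string]string{"sc1": "10.0.0.1"},
	}
	fmt.Println("requeue:", r.StopSubclusters([]string{"sc1", "sc2"}))
}
```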

spilchen and others added 28 commits March 21, 2024 14:37
This pulls in the latest vclusterops library changes. Many of the
changes that we are pulling in were changes related to improving the
vcluster CLI.
This pulls in the new signature for VStartDatabase. A new value is
returned, which we can safely ignore for the operator's usage.
During a replicated upgrade, we must categorize the subclusters into one
of two replica groups. Initially, all subclusters participating in the
upgrade will be placed in replica group A. Subsequently, new subclusters
will be generated throughout the process and assigned to the second
group, named replica group B. This change manages the assignment of
subclusters at the beginning of the upgrade process. To facilitate this
change, new status fields will be introduced to the VerticaDB, enabling
the tracking of this state across reconcile iterations.
This adds a new controller for sandbox. It is going to watch a ConfigMap
that has labels on it to indicate it contains state for a sandbox. That
ConfigMap also contains the vdb name whose subclusters are involved in
sandbox/unsandbox. For now it simply reads the VerticaDB found in the
ConfigMap and does nothing else.
Some changes are also made to the VerticaDB API to add new fields
related to sandbox.
This PR eliminates labels from the `VerticaReplicator` sample config.
The intention is to enhance the user experience on OpenShift when
generating this CR from the webUI. OpenShift utilizes this sample to
prepopulate fields in the webUI interface.
This adds a few webhook rules for replicated upgrade. It's probably not
a complete list, but we can always add more later when we find rules to
check. This commit adds the following:
- ensures the subclusters involved in the upgrade are stable. They
cannot be removed and the sizes can't change.
- We do allow new subclusters to be added to one of the replica groups.
But they must be secondary subclusters. No new primaries.
- ensures that subclusters listed in the
`.status.updateState.replicaGroups` aren't repeated.
We now collect the sandbox name in podfacts. We updated the vsql query
to return the extra state. We built an interface to get the information
as we intend to use a REST endpoint in the future but still need support
via vsql for older releases.
This adds a new leg, e2e-leg-9, to the CI. This is going to be used to
test changes done in 24.3.0. One new change with this is that it will
use a vertica license so that we can test scaling past 3 nodes. This
will be required in order to test out the new replicated upgrade work.
#764)

`vrep` reconciler enforces the minimum source and target db versions to
be at least 24.3.0

---------

Co-authored-by: Roy Paulin <[email protected]>
Co-authored-by: spilchen <[email protected]>
This PR adds the following webhook rules for sandboxes in the CRD:
- cannot scale (up or down) any subcluster that is in a sandbox
- cannot remove a subcluster that is sandboxed. It must be unsandboxed
first.
- cannot have multiple sandboxes with the same name
- cannot have the image of a sandbox be different than the main cluster
before the sandbox has been setup
- cannot be used on versions older than 24.3.0
- cannot be used before the database has been initialized
- cannot have duplicate subclusters defined in a sandbox
- cannot have a subcluster defined in multiple sandboxes
- cannot have a non-existing subcluster defined in a sandbox

We could add more rules later for sandboxes when we have needs.
We want to use the restart reconciler in both the VerticaDB controller
and the sandbox controller. This makes some changes to the restart
reconciler in order to achieve that. The sandbox name (empty string if
VerticaDB controller) is passed to the reconciler, which uses it to
target only pods belonging to that sandbox (or only pods belonging to
the main cluster if the sandbox name is empty).
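As a rough sketch (the types and names here are hypothetical simplifications of the operator's pod facts), the pod filtering described above amounts to:

```go
package main

import "fmt"

// pod is a simplified stand-in for the operator's pod facts entry.
type pod struct {
	Name    string
	Sandbox string // empty string means the pod belongs to the main cluster
}

// filterForSandbox returns only the pods the restart reconciler should
// target: pods of the given sandbox, or main-cluster pods when the
// sandbox name is empty.
func filterForSandbox(pods []pod, sandbox string) []pod {
	var out []pod
	for _, p := range pods {
		if p.Sandbox == sandbox {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []pod{{"p0", ""}, {"p1", "sb1"}, {"p2", ""}}
	fmt.Println(len(filterForSandbox(pods, "")), len(filterForSandbox(pods, "sb1")))
}
```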
This change adds stubs to the replicated upgrade reconciler. It attempts
to have the functions present that are needed to do the upgrade. Due to
dependencies, such as sandboxing and replication, comprehensive testing
is currently limited to unit tests. While this logic might evolve during
integration, it gives a framework for how the upgrade should function.

Additionally, I identified new requirements for state management. Two
new annotations have been introduced to the VerticaDB:
- vertica.com/replicated-upgrade-sandbox: to record the name of the
sandbox created for the upgrade
- vertica.com/replicated-upgrade-replicator-name: to track the name of
the VerticaReplicator CR utilized for replicating data to the sandbox.
This PR adds a new webhook rule for sandboxes in CRD: cannot have a
primary subcluster defined in a sandbox.
This adds a new fetcher in podfacts to fetch node details from the
running database inside the pod. This new fetcher will call VClusterOps
API which will send an HTTP request to the database. The new fetcher
will be enabled when VclusterOps annotation is set, and Vertica server
version is not older than v24.3.0. The new fetcher should be faster and
more reliable than the old fetcher which will execute vsql within the
pod.
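The selection rule described above (HTTP-based fetcher only when vclusterops deployment is enabled and the server is at least v24.3.0, vsql otherwise) could be sketched as follows; the version comparison and function names are simplified illustrations, not the operator's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether dotted version a is >= b (e.g. "24.3.0").
// A simplified comparison for illustration only.
func atLeast(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		ai, _ := strconv.Atoi(as[i])
		bi, _ := strconv.Atoi(bs[i])
		if ai != bi {
			return ai > bi
		}
	}
	return len(as) >= len(bs)
}

// chooseFetcher picks the node-details fetcher: the HTTP/vclusterops-based
// one when allowed, otherwise the legacy vsql-based one.
func chooseFetcher(vclusterOpsEnabled bool, serverVersion string) string {
	if vclusterOpsEnabled && atLeast(serverVersion, "24.3.0") {
		return "http"
	}
	return "vsql"
}

func main() {
	fmt.Println(chooseFetcher(true, "24.3.0"))
	fmt.Println(chooseFetcher(true, "24.2.0"))
	fmt.Println(chooseFetcher(false, "24.3.0"))
}
```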
In the replicated upgrade process, we must pause and redirect client
connections from subclusters in replica group A to those in replica
group B. This is the initial stage of the change. Currently,
pause/redirect semantics are not supported, as they require new server
modifications that have not yet been implemented. Therefore, we will
perform an old-style drain to pause the process and maintain service
labels correctly to point to subclusters in replica group B for
redirection.

---------

Co-authored-by: Roy Paulin <[email protected]>
This change will modify the upgrade reconcilers, both offline and
online, to function in either the main cluster or a sandbox. Our
immediate plan is to use the offline upgrade reconciler within the
sandbox controller, although this will be done in a follow-on task.
This pull request modifies the Vertica replicator controller to invoke
the replication command synchronously. Additionally, it introduces new
status conditions within VerticaReplicator to monitor the controller's
various states.

---------

Co-authored-by: spilchen <[email protected]>
@roypaulin self-assigned this Apr 23, 2024
spilchen and others added 22 commits May 3, 2024 19:58
During a replicated upgrade we will promote a sandbox to the main
cluster, which causes it to drop its sandbox state, and we need to
rename subclusters to match the original names. Prior to this change,
the pods deployed by the operator had labels for the sandbox and
subcluster name, and those labels were set in the statefulset pod
template. To change them, the k8s controller would actually do a
rollout, meaning it needs to restart each of the pods. But we require
sandbox promotion and subcluster rename to happen without a pod restart.
So, the change here is to use different labeling to ensure we don't need
to restart the pods. In some cases, given a pod, we have to refer back
to the statefulset in order to figure out certain metadata.

Since we are changing the statefulset selector values, statefulsets
created by the old operator need to be recreated. We had done this in
the past, so I repurposed upgradeoperator120_reconciler.go for this purpose.

A summary of all of the labels that are set is as follows.

- All objects will have these labels set:
```
app.kubernetes.io/instance: <CR name>
app.kubernetes.io/managed-by: verticadb-operator
app.kubernetes.io/component: database
app.kubernetes.io/name: vertica
app.kubernetes.io/version: 2.2.0
vertica.com/database: <vertica db name>
```
- StatefulSets and Service objects will have the following labels:
```
vertica.com/subcluster-name: <subcluster-name>
vertica.com/subcluster-type: <primary|secondary>
vertica.com/sandbox: <sandbox name if applicable>
```
- Further Service objects will have this additional label:
```
vertica.com/svc-type: <external|headless>
```
- Pods will have the following labels:
```
vertica.com/subcluster-svc: <name of service object>
vertica.com/client-routing: true # Only set if not doing a drain or pending removal
vertica.com/subcluster-selector-name: <full name of the pod's statefulset>
```
- ConfigMaps used by the sandbox controller will have this label:
```
vertica.com/watched-by-sandbox-controller: true
```
- When the statefulset sets up the pod selector, it will use these
labels:
```
app.kubernetes.io/instance
vertica.com/subcluster-selector-name
```
- When the service object sets up its pod selector, it will use these
labels:
```
app.kubernetes.io/instance
vertica.com/subcluster-selector-name
vertica.com/client-routing
```
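A sketch of how those selector maps might be assembled. The label keys come from the list above; the helper names are hypothetical:

```go
package main

import "fmt"

// stsSelector builds the statefulset pod selector from the labels listed above.
func stsSelector(crName, stsName string) map[string]string {
	return map[string]string{
		"app.kubernetes.io/instance":           crName,
		"vertica.com/subcluster-selector-name": stsName,
	}
}

// svcSelector builds the service pod selector, which additionally requires
// the client-routing label so draining or pending-removal pods drop out of
// the service endpoints without a pod restart.
func svcSelector(crName, stsName string) map[string]string {
	sel := stsSelector(crName, stsName)
	sel["vertica.com/client-routing"] = "true"
	return sel
}

func main() {
	fmt.Println(svcSelector("vdb", "vdb-sc1"))
}
```

Keeping the subcluster name out of the selectors is what allows a rename without a rollout: only the immutable `subcluster-selector-name` (the statefulset's full name) and the CR instance label participate in pod selection.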
This is a quick fix. When I merged my branch
(#792) into vnext, I
didn't pull the latest changes from Roy's PR
(#785). This PR
resolves the e2e tests issue.
This will create/update a sandbox config map whenever we create a new
sandbox in sandbox reconciler. The config map is watched by sandbox
controller for restarting/upgrading the nodes in a sandbox. The config
map will be deleted if vdb is deleted or the sandbox is unsandboxed.
Test the following scenarios:
- replicate from main cluster of source db with default username
(`dbadmin`) to a sandbox cluster of target db with custom username and
no password
- replicate from main cluster of source db with default username
(`dbadmin`) to a sandbox cluster of target db with custom username and
password
- replicate from main cluster of source db with default username
(`dbadmin`) to a sandbox cluster of target db with default username
(`dbadmin`) and source TLS config
A nondeterministic failure occurs when a label is updated based on the
sandbox spec. We need to ensure that the label is updated only after the
status has been set by the sandbox subcluster.
This PR incorporates the latest changes from the vCluster API to update
the sandbox option in vrep
The ConfigMap should have the name of the VerticaDB, not the database
name. The sandbox controller uses this to figure out the correct
VerticaDB to fetch.
This adds support for multiple subclusters in a single sandbox. In order
to denote which subclusters are primary in the sandbox, a new subcluster
type is added called 'sandboxprimary'. New webhook rules have been added
to account for the new subcluster type.

The new type will get added to the first subcluster in a sandbox. We now
add subclusters to a sandbox in the order they are defined in the vdb
spec. This ensures that we pick a deterministic subcluster as the first
one in a sandbox.

When adding a subcluster to an existing sandbox, we need to pass in a
host from the main cluster and one from the sandbox too. The reason this
is needed is that we need to replicate the new sandbox state in both the
main cluster and in the sandbox, so the vclusterOps API will update
both clusters.
This was mistakenly removed in the last VCluster API refresh (#741).
Without this, if there is an error during revive, then it will complain
that local catalog/data/depot dirs are not empty.
The status reconciler in the vdb controller was removing a subcluster's
status when the subcluster is in a sandbox. Now it leaves unchanged
the status of subclusters that do not belong to the passed-in sandbox.
To do that, it finds all the subclusters regardless of their cluster.
This incorporates the latest changes from the vclusterops library. Many
of these changes affect e2e tests. We are pulling the latest updates to
incorporate these changes.
* Limit version check for down nodes in list_allnode to only CLI
* Revert pointer adapter pools because it causes multiple operator
threads in the operator
* Put vdb.hosts in prepare in HTTPSUpdateNodeStateOp since it will be
empty, and the operator can't find hosts to execute this operation
* Add username to HTTPS operation for re_ip, since the operator can't
verify username-password authentication for running httpscheckrunningDB.
This builds out the replicated upgrade logic more. With the sandbox
controller and the replicator controller both implemented, the upgrade
logic can now depend on them. A few changes were made while verifying
the new controllers. We can now get to the part where we are waiting for
promotion of the sandbox to the main cluster. All traffic is routed to
the sandbox, which has been upgraded to the new server version.
When setting a sandbox, if the image is omitted, the operator will
internally assume spec.image as the sandbox's image. The issue is that
changing spec.image could trigger sandbox upgrade too. Now if the image
is omitted, there is a new reconciler that will explicitly set the
sandbox image in vdb to spec.image value.

---------

Signed-off-by: roypaulin <[email protected]>
This PR added two unsandbox reconcilers: one in vdb controller and the
other in sandbox controller. The one in vdb controller will update
sandbox config map for triggering sandbox controller. The one in sandbox
controller will call vclusterOps to unsandbox the subclusters, update
vdb sandbox status, and delete the config map. After that, the restart
reconciler in vdb controller should restart the nodes that have been
unsandboxed.
After unsandboxing, the catalog dir is deleted. The issue is that we
check its existence to set dbExists in the pod facts. With this change,
dbExists is true if the catalog dir exists or if the addedToDB field in
the subcluster status detail for that specific pod is true.
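The revised check reduces to a simple disjunction; a sketch with simplified field names:

```go
package main

import "fmt"

// dbExists mirrors the rule above: the database is considered present on a
// pod if its catalog directory exists, or if the subcluster status detail
// records the pod as already added to the DB (which covers the
// post-unsandbox case where the catalog dir has been deleted).
func dbExists(catalogDirExists, addedToDB bool) bool {
	return catalogDirExists || addedToDB
}

func main() {
	// Unsandboxed pod: catalog dir gone but still recorded in the DB.
	fmt.Println(dbExists(false, true))
}
```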
This update addresses an issue that occurs when multiple subclusters are
added to a new sandbox during the same reconciliation iteration. The
problem arose after sandboxing the first subcluster; the second
subcluster failed to sandbox.

The failure was due to the need to provide both a host from the main
cluster and a host from the sandbox to the vcluster API. To obtain the
sandbox host, the system relies on the `vdb.status.sandbox` being
updated. Since this update wasn't happening, the process of finding the
sandbox host by checking pod facts didn't locate any sandboxed pods.

The solution is to update `vdb.status.sandboxes` as each subcluster is
sandboxed. This update allows subsequent sandboxes to find the correct
pod facts, ensuring the process completes successfully for all
subclusters.

---------

Co-authored-by: Roy Paulin <[email protected]>
…operator (#813)

When restarting a pod in a database with a sandbox, it could fail when
the IP of the pod changes. This is a bug in old vclusterOps code. We
were trying to read node IPs from the catalog, but sandbox node IP
change cannot be reflected in the main cluster catalog, and vice versa.
This PR applied the new vclusterOps version and passed a new option to
start_db to fix this issue. With this option, we will read main cluster
node IPs from main cluster catalog and read sandbox node IPs from
sandbox catalog.
In k8s, the node state polling timeout is controlled by the annotation
"vertica.com/restart-timeout". If it's not set, the polling timeout is
set to 0 which will fail vclusterOps start_db and start_node. Then we
will have many cluster/node restart failure events. This PR simply uses
the vclusterOps default polling timeout if the annotation
"vertica.com/restart-timeout" is not set.
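The fix amounts to something like the following. The annotation key is from the PR; the helper name and the default value are illustrative, not the library's actual default:

```go
package main

import (
	"fmt"
	"strconv"
)

const restartTimeoutAnnotation = "vertica.com/restart-timeout"

// pollingTimeout returns the node-state polling timeout in seconds: the
// annotation's value when it is a positive integer, otherwise a default,
// so that 0 is never passed down to start_db/start_node.
func pollingTimeout(annotations map[string]string, defaultSeconds int) int {
	if v, ok := annotations[restartTimeoutAnnotation]; ok {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultSeconds
}

func main() {
	fmt.Println(pollingTimeout(map[string]string{}, 300))
	fmt.Println(pollingTimeout(map[string]string{restartTimeoutAnnotation: "120"}, 300))
}
```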
@cchen-vertica (Collaborator) left a comment:

The stop_subcluster reconciler runs before unsandboxing, but we don't remove start_up.json until unsandboxing, so the container could auto-restart Vertica. We'd better move the code that deletes start_up.json from the unsandbox reconciler into this reconciler.

@@ -156,6 +156,7 @@ func (r *SandboxConfigMapReconciler) constructActors(vdb *v1.VerticaDB, log logr
return []controllers.ReconcileActor{
// Ensure we support sandboxing and vclusterops
MakeVerifyDeploymentReconciler(r, vdb, log),
vdbcontroller.MakeStopSubclusterReconciler(r, log, vdb, pfacts, dispatcher),

Add a comment above for what this reconciler does.

needRequeue = true
}

if err := s.stopSubclusters(ctx, scsToStop); err != nil {

Stopping subclusters should be done after the requeue check (line 77). If one sc is not up before stopping subclusters, you still stop the other scs; then we come back here, needRequeue is still true, and we do another round of reconciliation.

// then we skip the current subcluster and move to next one
ip, ok := s.getFirstUpSCPodIP(sc)
if !ok {
continue

You verified the running pods in filterSubclustersWithRunningPods(). If we reach here and the sc has any pod stopped, we'd better requeue the reconciler, since needRequeue is no longer accurate at this point: you calculated needRequeue based on the result of filterSubclustersWithRunningPods().

opts.IPv6 = net.IsIPv6(s.InitiatorIP)

opts.SCName = s.SCName
// For now we shutdown the subcluster right away

Do we have a follow-up Jira for the draining timeout?

@cchen-vertica (Collaborator)

This PR has lower priority. You could address the comments later.


CLAassistant commented Jul 25, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ jizhuoyu
✅ roypaulin
✅ cchen-vertica
✅ chinhtranvan
❌ spilchen


spilchen seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Base automatically changed from vnext to main July 26, 2024 18:57
6 participants