
Partitioned NodePool Multi-node Consolidation #853

Closed
jashandeep-sohi opened this issue Dec 8, 2023 · 9 comments · May be fixed by #1547
Labels

  • consolidation
  • deprovisioning: Issues related to node deprovisioning
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • performance: Issues relating to performance (memory usage, cpu usage, timing)
  • v1.x: Issues prioritized for post-1.0

Comments

@jashandeep-sohi

Description

Observed Behavior:

I have a few NodePools that I'm using in a "partitioned" manner. Basically, each NodePool is made independent using user-defined requirements & taints, and Pods in different namespaces use different NodePools.

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: one
spec:
  template:
    spec:
      taints:
        - key: example.com/partition-key
          value: one
          effect: NoSchedule
      requirements:
        - key: example.com/partition-key
          operator: In
          values: ["one"]
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: two
spec:
  template:
    spec:
      taints:
        - key: example.com/partition-key
          value: two
          effect: NoSchedule
      requirements:
        - key: example.com/partition-key
          operator: In
          values: ["two"]
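For context, a Pod targeting partition "one" in this setup needs a matching toleration and node selector (illustrative manifest, not from the issue; the namespace and image are made up):

apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: team-one
spec:
  nodeSelector:
    example.com/partition-key: "one"
  tolerations:
    - key: example.com/partition-key
      operator: Equal
      value: "one"
      effect: NoSchedule
  containers:
    - name: app
      image: nginx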

This works fine for the most part, but I'm observing issues with multi-node consolidation.

As far as I can tell, multi-node consolidation looks at all deprovisionable Nodes together:
https://github.com/kubernetes-sigs/karpenter/blob/cc54b340f630b46a26d19a3cbd49d90c8b3a6d45/pkg/controllers/disruption/multinodeconsolidation.go#L44C42-L44C42

I think this means no meaningful multi-node consolidation is happening across partitions (or it's sub-optimal at best). Shouldn't this be done independently on groups of compatible NodePools?
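One possible shape for this (a minimal sketch with hypothetical types; Karpenter's actual disruption candidate struct is richer) is to partition the deprovisionable candidates by NodePool before running the multi-node pass, so each simulation only mixes nodes that could plausibly be replaced together:

```go
package main

import "fmt"

// Candidate is a simplified stand-in for a deprovisionable node;
// only the fields needed for grouping are modeled here.
type Candidate struct {
	Node     string
	NodePool string
}

// groupByNodePool partitions candidates so multi-node consolidation
// can be simulated per NodePool instead of across the whole cluster.
func groupByNodePool(candidates []Candidate) map[string][]Candidate {
	groups := make(map[string][]Candidate)
	for _, c := range candidates {
		groups[c.NodePool] = append(groups[c.NodePool], c)
	}
	return groups
}

func main() {
	candidates := []Candidate{
		{Node: "node-a", NodePool: "one"},
		{Node: "node-b", NodePool: "two"},
		{Node: "node-c", NodePool: "one"},
	}
	// Each group would then get its own multi-node consolidation pass.
	for pool, group := range groupByNodePool(candidates) {
		fmt.Printf("%s: %d candidates\n", pool, len(group))
	}
}
```

This matches the "n independent consolidations for n NodePools" idea discussed below, at the cost of missing cross-pool consolidations when pools genuinely overlap.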

Another place where I think this is a problem is when simulating the scheduling you look at all Pending Pods:
https://github.com/kubernetes-sigs/karpenter/blob/cc54b340f630b46a26d19a3cbd49d90c8b3a6d45/pkg/controllers/disruption/helpers.go#L97C35-L97C35

But if one of those Pending Pods is not compatible with the first N candidates chosen from all Nodes, the simulation will always complain about unschedulable Pods (increasingly likely as the number of nodes/partitions grows).

Expected Behavior:

NodePools should be consolidated in groups computed based on their requirements or based on some configurable partition key.

Reproduction Steps (Please include YAML): See above

Versions:

  • Chart Version: v0.33.0
  • Kubernetes Version (kubectl version): 1.27
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@jashandeep-sohi jashandeep-sohi added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 8, 2023
@jonathan-innis
Member

This works fine for the most part, but I'm observing issues with multi-node consolidation

We've talked about this a little bit for deprovisioning improvements for the exact reason that you called out: If we order without considering a NodePool boundary, we have some potential to get stuck if those boundaries are independent of each other. There's potential to perhaps try another form of multi-node consolidation where we perform n different multi-node consolidations where n is the number of NodePools on the cluster.

Also, I'm curious: are you still seeing single-node consolidation? I would expect that if we were failing to multi-node consolidate, we would still attempt to single-node consolidate if there are nodes available.

@jonathan-innis jonathan-innis added performance Issues relating to performance (memory usage, cpu usage, timing) kind/feature v1.x Issues prioritized for post-1.0 and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Dec 9, 2023
@jashandeep-sohi
Author

Also, I'm curious: are you still seeing single-node consolidation? I would expect that if we were failing to multi-node consolidate, we would still attempt to single-node consolidate if there are nodes available.

I remember this being an issue on an older version, but I'm not seeing it getting blocked on single-node consolidation anymore. I think before it would just short-circuit if there were any Pods in pending state.

There's potential to perhaps try another form of multi-node consolidation where we perform n different multi-node consolidations where n is the number of NodePools on the cluster.

But what if there are 2+ NodePools in each "partition"? For example, with a different NodePool.spec.weight to achieve some kind of priority within a partition. Would doing consolidation on them independently still work? I would think it has to be done on groups of compatible NodePools.

Has introducing the concept of NodePool groups as an explicit API ever been discussed? Something like having a NodePool.spec.group key might make it easier to figure out those boundaries. Figuring it out based on NodePool.spec.template.spec.requirements is probably computationally expensive.
Also, it would be nice if, when group is non-nil, the controller auto-magically injected taints, labels & requirements like karpenter.sh/nodepool-group={group}. I guess what I'm asking for is first-class support for partitioning.
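A hypothetical shape for such an API (the group field and the injected karpenter.sh/nodepool-group key are illustrative, not an existing Karpenter feature), which would also cover the weighted multi-pool-per-partition case:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: one-spot
spec:
  group: one   # hypothetical field: controller would inject
               # karpenter.sh/nodepool-group=one taints, labels & requirements
  weight: 10   # priority within the partition
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]

All NodePools sharing spec.group would then be consolidated together, independently of other groups.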

@njtran njtran added consolidation deprovisioning Issues related to node deprovisioning labels Dec 11, 2023
@jonathan-innis jonathan-innis added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/feature labels Dec 26, 2023
@GnatorX

GnatorX commented Mar 7, 2024

I think #488 is a similar thought and perf improvement.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 5, 2024
@cnmcavoy

I've been looking at multi-node consolidation and I also discovered these findings independently. The consolidation occurs across all nodepools, which is not ideal if you have very different nodepool configurations in your cluster. If the nodepools are similar, I could see this being a positive, but that is not how our clusters are designed.

There is one other factor we found that was equally significant, if not more so: multi-node consolidation also does not take node architecture into account. So an amd64 node may be consolidated with an arm64 node. This could in theory succeed if the workloads were all multi-arch compatible, but that is not the case in our workload clusters, so this consolidation also always fails.

So the combination of nodepool mixing + architecture mixing means multi-node consolidation effectively never finds a successful simulation in our clusters.
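Extending the grouping idea from earlier in the thread, a sketch of architecture-aware grouping (hypothetical types; the real candidates would carry the kubernetes.io/arch node label) could key candidates on the (NodePool, arch) pair so amd64 and arm64 nodes are never simulated together:

```go
package main

import "fmt"

// key combines the NodePool and CPU architecture of a node so consolidation
// candidates are only simulated together when a shared replacement is plausible.
type key struct {
	NodePool string
	Arch     string // value of the kubernetes.io/arch node label
}

// groupCandidates buckets node names by (NodePool, arch).
func groupCandidates(nodes map[string]key) map[key][]string {
	groups := make(map[key][]string)
	for name, k := range nodes {
		groups[k] = append(groups[k], name)
	}
	return groups
}

func main() {
	nodes := map[string]key{
		"node-a": {NodePool: "one", Arch: "amd64"},
		"node-b": {NodePool: "one", Arch: "arm64"},
	}
	// Same NodePool but different arch: two separate groups.
	fmt.Println(len(groupCandidates(nodes))) // prints 2
}
```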

@cnmcavoy

/remove-lifecycle rotten

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 11, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
