Upgrade Nodegroup with custom AMI fails #1747

Open
btuffreau opened this issue Sep 27, 2024 · 5 comments
Labels
awaiting-upstream: The issue cannot be resolved without action in another repository (may be owned by Pulumi).
kind/bug: Some behavior is incorrect or out of spec.

Comments


btuffreau commented Sep 27, 2024

What happened?

I'm using a launch template with a custom AMI. The initial provisioning works fine, but whenever I pass a new AMI ID to the launch template, the nodegroup update fails like so:

error: operation UPDATE failed with "InvalidRequest": You cannot specify the field kubernetesVersion when using custom AMIs. (Service: Eks, Status Code: 400, Request ID: 271bf7a2-653e-4a86-8646-7c2b279cbd64)

Example

# Imports assumed for this stripped-down snippet (pulumi_aws_native provider).
from pulumi_aws_native.ec2 import LaunchTemplate, LaunchTemplateArgs, LaunchTemplateDataArgs
from pulumi_aws_native.eks import (Nodegroup, NodegroupArgs,
                                   NodegroupLaunchTemplateSpecificationArgs,
                                   NodegroupScalingConfigArgs)

eks_nodes_launch_template = LaunchTemplate(resource_name="eks-launch-template",
                                           args=LaunchTemplateArgs(
                                               launch_template_name="node-template",
                                               launch_template_data=LaunchTemplateDataArgs(
                                                   instance_type="m5.2xlarge",
                                                   # image_id=ami.id,
                                                   image_id="ami-05a6b4bea29fc7541",  # bottlerocket
                                                   # image_id="ami-0660a492772131f50",
                                                   security_group_ids=[eks.cluster_security_group_id],
                                               )))

Nodegroup(resource_name=f"{args.cluster_name}-managed-nodes",
          args=NodegroupArgs(cluster_name=eks.name,
                             capacity_type="ON_DEMAND",
                             release_version=None,
                             version=None,
                             nodegroup_name=f"{args.cluster_name}-managed-nodes",
                             node_role=eks_nodes_role.arn,
                             launch_template=NodegroupLaunchTemplateSpecificationArgs(
                                 name=eks_nodes_launch_template.launch_template_name,
                                 version=eks_nodes_launch_template.latest_version_number
                             ),
                             subnets=[subnet.subnet_id for subnet in private_subnets],
                             scaling_config=NodegroupScalingConfigArgs(desired_size=2,
                                                                       min_size=2,
                                                                       max_size=3)), )

I stripped my original code, but I hope it's obvious enough: just switching the image_id in the LaunchTemplate triggers this issue.

Output of pulumi about

CLI          
Version      3.134.0
Go Version   go1.23.1
Go Compiler  gc

Plugins
KIND      NAME        VERSION
resource  aws         6.49.1
resource  aws-native  0.119.0
resource  kubernetes  4.17.1
language  python      unknown

Host     
OS       darwin
Version  14.5
Arch     arm64
Dependencies:
NAME               VERSION
aws                0.2.5
mypy               1.11.1
pip                24.0
pulumi_aws         6.49.1
pulumi_aws_native  0.119.0
pulumi_kubernetes  4.17.1

Additional context

I have a feeling this happens because the responses from AWS are being kept as outputs, such as releaseVersion.

pulumi stack export | jq '.deployment.resources[] | select(.type == "aws-native:eks:Nodegroup") | .outputs | .releaseVersion,.version'
"ami-05a6b4bea29fc7541"
"1.29"

The provider then reuses those outputs, but it should not, because they are not valid values when the nodegroup uses a launch template with a custom AMI.

Another thing: the error refers to kubernetesVersion, which is not actually a property used on the Pulumi side; it is probably mapped to version.

Doing the update with the CLI is very simple and does not require recreating anything:

aws eks update-nodegroup-version --cluster-name=$cluster_name --nodegroup-name=$nodegroup_name --launch-template name=$nodegroup_name,version=2
aws eks update-nodegroup-version --cluster-name=$cluster_name --nodegroup-name=$nodegroup_name --launch-template name=$nodegroup_name,version=1

As a workaround, you can always replace the whole Nodegroup whenever the launch template changes, but that's not very clean (and much slower); a rough sketch of that approach is below.
I also tried using the transforms API but did not succeed.
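A minimal, untested sketch of that replace-on-change workaround using Pulumi's replace_on_changes resource option. The property path "launchTemplate.version" is an assumption about how the aws-native Nodegroup input is named; the other names come from the snippet above.

import pulumi

# Sketch only: force the Nodegroup to be replaced whenever the launch template
# version changes (assumes the input path is "launchTemplate.version").
Nodegroup(resource_name=f"{args.cluster_name}-managed-nodes",
          args=NodegroupArgs(cluster_name=eks.name,
                             node_role=eks_nodes_role.arn,
                             launch_template=NodegroupLaunchTemplateSpecificationArgs(
                                 name=eks_nodes_launch_template.launch_template_name,
                                 version=eks_nodes_launch_template.latest_version_number),
                             subnets=[subnet.subnet_id for subnet in private_subnets],
                             scaling_config=NodegroupScalingConfigArgs(desired_size=2,
                                                                       min_size=2,
                                                                       max_size=3)),
          opts=pulumi.ResourceOptions(replace_on_changes=["launchTemplate.version"]))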

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

btuffreau added the kind/bug (Some behavior is incorrect or out of spec) and needs-triage (Needs attention from the triage team) labels on Sep 27, 2024
t0yv0 (Member) commented Sep 27, 2024

Indeed, looks like this error comes from EKS: https://repost.aws/knowledge-center/eks-managed-node-group-update

Your analysis is very interesting. It would really help us maintainers if you could include a small self-contained repro of the problem. I spent some time trying to build a repro, but it's evading me. I suspect the way the eks.Cluster implied by your snippet is set up is highly relevant.

t0yv0 added the needs-repro (Needs repro steps before it can be triaged or fixed) label and removed the needs-triage (Needs attention from the triage team) label on Sep 27, 2024
btuffreau (Author) commented

@t0yv0 here is a repro: https://github.com/btuffreau/pulumi-aws-native-1747

t0yv0 added the needs-triage (Needs attention from the triage team) label and removed the needs-repro (Needs repro steps before it can be triaged or fixed) label on Oct 1, 2024
flostadler (Contributor) commented

Thanks for the repro @btuffreau, I'll take a look!

flostadler (Contributor) commented

Hey @btuffreau, thanks a lot for the great repro. I was able to reproduce the issue right away!

The bug stems from AWS Cloud Control incorrectly including amiType, releaseVersion, and version in the UpdateNodegroupVersion API call to EKS. Those parameters should be omitted when the launch template specifies a custom AMI (see the AWS docs).
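To illustrate the difference (not an exact capture of the Cloud Control request; the parameter names are boto3's, and the values are placeholders based on the report above):

import boto3

eks_client = boto3.client("eks")
cluster_name = "my-cluster"            # placeholder
nodegroup_name = "my-managed-nodes"    # placeholder

# Roughly what gets sent today (rejected for custom-AMI launch templates):
# eks_client.update_nodegroup_version(
#     clusterName=cluster_name,
#     nodegroupName=nodegroup_name,
#     version="1.29",                          # stale output, rejected with a custom AMI
#     releaseVersion="ami-05a6b4bea29fc7541",  # stale output, rejected with a custom AMI
#     launchTemplate={"name": "node-template", "version": "2"},
# )

# What should be sent when the launch template pins a custom AMI:
eks_client.update_nodegroup_version(
    clusterName=cluster_name,
    nodegroupName=nodegroup_name,
    launchTemplate={"name": "node-template", "version": "2"},
)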

I've opened a bug with AWS: aws-cloudformation/cloudformation-coverage-roadmap#2151.

Until that's fixed, you could try using the eks.NodeGroup resource of the AWS classic provider as a workaround. Let me know if that works for you!
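For illustration, a rough, untested sketch of that classic-provider workaround, reusing the names from the original snippet (the cluster, role, subnets, and launch template references are assumptions carried over from above):

import pulumi_aws as aws

# Sketch: classic-provider managed node group driven by the same launch template.
managed_nodes = aws.eks.NodeGroup(
    f"{args.cluster_name}-managed-nodes",
    cluster_name=eks.name,
    node_role_arn=eks_nodes_role.arn,
    subnet_ids=[subnet.subnet_id for subnet in private_subnets],
    capacity_type="ON_DEMAND",
    scaling_config=aws.eks.NodeGroupScalingConfigArgs(desired_size=2, min_size=2, max_size=3),
    launch_template=aws.eks.NodeGroupLaunchTemplateArgs(
        name=eks_nodes_launch_template.launch_template_name,
        # cast in case the aws-native output is numeric; the classic input expects a string
        version=eks_nodes_launch_template.latest_version_number.apply(str),
    ),
)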

flostadler added the awaiting-upstream (The issue cannot be resolved without action in another repository; may be owned by Pulumi) label and removed the needs-triage (Needs attention from the triage team) label on Oct 2, 2024
btuffreau (Author) commented

Thanks! I'm good with the suggested workaround for now; I'll readjust if it becomes problematic.
