Upgrade Nodegroup with custom AMI fails #1747

Open
btuffreau opened this issue Sep 27, 2024 · 5 comments
Labels
awaiting-upstream: The issue cannot be resolved without action in another repository (may be owned by Pulumi).
kind/bug: Some behavior is incorrect or out of spec.

Comments


btuffreau commented Sep 27, 2024

What happened?

I'm using a launch template with a custom AMI. The initial provisioning works fine, but whenever I pass a new AMI ID to the launch template, the nodegroup update fails like so:

error: operation UPDATE failed with "InvalidRequest": You cannot specify the field kubernetesVersion when using custom AMIs. (Service: Eks, Status Code: 400, Request ID: 271bf7a2-653e-4a86-8646-7c2b279cbd64)

Example

# Imports assumed for this stripped-down snippet (pulumi_aws_native provider).
from pulumi_aws_native.ec2 import LaunchTemplate, LaunchTemplateArgs, LaunchTemplateDataArgs
from pulumi_aws_native.eks import (Nodegroup, NodegroupArgs,
                                   NodegroupLaunchTemplateSpecificationArgs,
                                   NodegroupScalingConfigArgs)

eks_nodes_launch_template = LaunchTemplate(resource_name="eks-launch-template",
                                           args=LaunchTemplateArgs(
                                               launch_template_name="node-template",
                                               launch_template_data=LaunchTemplateDataArgs(
                                                   instance_type="m5.2xlarge",
                                                   # image_id=ami.id,
                                                   image_id="ami-05a6b4bea29fc7541",  # bottlerocket
                                                   # image_id="ami-0660a492772131f50",
                                                   security_group_ids=[eks.cluster_security_group_id],
                                               )))

Nodegroup(resource_name=f"{args.cluster_name}-managed-nodes",
          args=NodegroupArgs(cluster_name=eks.name,
                             capacity_type="ON_DEMAND",
                             release_version=None,
                             version=None,
                             nodegroup_name=f"{args.cluster_name}-managed-nodes",
                             node_role=eks_nodes_role.arn,
                             launch_template=NodegroupLaunchTemplateSpecificationArgs(
                                 name=eks_nodes_launch_template.launch_template_name,
                                 version=eks_nodes_launch_template.latest_version_number
                             ),
                             subnets=[subnet.subnet_id for subnet in private_subnets],
                             scaling_config=NodegroupScalingConfigArgs(desired_size=2,
                                                                       min_size=2,
                                                                       max_size=3)), )

I stripped my original code, but I hope it's obvious enough: just switching the image_id in the LaunchTemplate triggers this issue.

Output of pulumi about

CLI          
Version      3.134.0
Go Version   go1.23.1
Go Compiler  gc

Plugins
KIND      NAME        VERSION
resource  aws         6.49.1
resource  aws-native  0.119.0
resource  kubernetes  4.17.1
language  python      unknown

Host     
OS       darwin
Version  14.5
Arch     arm64
Dependencies:
NAME               VERSION
aws                0.2.5
mypy               1.11.1
pip                24.0
pulumi_aws         6.49.1
pulumi_aws_native  0.119.0
pulumi_kubernetes  4.17.1

Additional context

I have a feeling this happens because the responses from AWS are being kept as outputs, such as releaseVersion.

pulumi stack export | jq '.deployment.resources[] | select(.type == "aws-native:eks:Nodegroup") | .outputs | .releaseVersion,.version'
"ami-05a6b4bea29fc7541"
"1.29"

The provider then reuses those outputs, but it should not, because they are not valid values when the nodegroup uses a launch template with a custom AMI.

Another thing: the error refers to kubernetesVersion, which is not actually a property used on the Pulumi side; it is probably mapped to version.

Doing the update with the CLI is very simple and does not require recreating anything:

aws eks update-nodegroup-version --cluster-name=$cluster_name --nodegroup-name=$nodegroup_name --launch-template name=$nodegroup_name,version=2
aws eks update-nodegroup-version --cluster-name=$cluster_name --nodegroup-name=$nodegroup_name --launch-template name=$nodegroup_name,version=1

As a workaround, you can always replace the whole Nodegroup whenever the launch template changes, but that's not very clean (and much slower); a rough sketch of that approach is below.
I also tried using the transforms API but did not succeed.
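A minimal, untested sketch of that replace-on-change workaround using Pulumi's replace_on_changes resource option. The property path "launchTemplate.version" is an assumption about how the aws-native Nodegroup input is named; the other names come from the snippet above.

import pulumi

# Sketch only: force the Nodegroup to be replaced whenever the launch template
# version changes (assumes the input path is "launchTemplate.version").
Nodegroup(resource_name=f"{args.cluster_name}-managed-nodes",
          args=NodegroupArgs(cluster_name=eks.name,
                             node_role=eks_nodes_role.arn,
                             launch_template=NodegroupLaunchTemplateSpecificationArgs(
                                 name=eks_nodes_launch_template.launch_template_name,
                                 version=eks_nodes_launch_template.latest_version_number),
                             subnets=[subnet.subnet_id for subnet in private_subnets],
                             scaling_config=NodegroupScalingConfigArgs(desired_size=2,
                                                                       min_size=2,
                                                                       max_size=3)),
          opts=pulumi.ResourceOptions(replace_on_changes=["launchTemplate.version"]))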

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

btuffreau added the kind/bug (Some behavior is incorrect or out of spec) and needs-triage (Needs attention from the triage team) labels on Sep 27, 2024
t0yv0 (Member) commented Sep 27, 2024

Indeed, looks like this error comes from EKS: https://repost.aws/knowledge-center/eks-managed-node-group-update

Your analysis is very interesting. It would really help us maintainers if you could include a small self-contained repro of the problem. I spent some time trying to build a repro, but it's evading me. I suspect the way the eks.Cluster implied by your snippet is set up is highly relevant.

t0yv0 added the needs-repro (Needs repro steps before it can be triaged or fixed) label and removed the needs-triage (Needs attention from the triage team) label on Sep 27, 2024
btuffreau (Author) commented

@t0yv0 here is a repro: https://github.com/btuffreau/pulumi-aws-native-1747

t0yv0 added the needs-triage (Needs attention from the triage team) label and removed the needs-repro (Needs repro steps before it can be triaged or fixed) label on Oct 1, 2024
flostadler (Contributor) commented

Thanks for the repro @btuffreau, I'll take a look!

flostadler (Contributor) commented

Hey @btuffreau, thanks a lot for the great repro. I was able to reproduce the issue right away!

The bug stems from AWS Cloud Control incorrectly including amiType, releaseVersion, and version in the UpdateNodegroupVersion API call to EKS. Those parameters should be omitted when the launch template specifies a custom AMI (see the AWS docs).
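To illustrate the difference (not an exact capture of the Cloud Control request; the parameter names are boto3's, and the values are placeholders based on the report above):

import boto3

eks_client = boto3.client("eks")
cluster_name = "my-cluster"            # placeholder
nodegroup_name = "my-managed-nodes"    # placeholder

# Roughly what gets sent today (rejected for custom-AMI launch templates):
# eks_client.update_nodegroup_version(
#     clusterName=cluster_name,
#     nodegroupName=nodegroup_name,
#     version="1.29",                          # stale output, rejected with a custom AMI
#     releaseVersion="ami-05a6b4bea29fc7541",  # stale output, rejected with a custom AMI
#     launchTemplate={"name": "node-template", "version": "2"},
# )

# What should be sent when the launch template pins a custom AMI:
eks_client.update_nodegroup_version(
    clusterName=cluster_name,
    nodegroupName=nodegroup_name,
    launchTemplate={"name": "node-template", "version": "2"},
)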

I've opened a bug with AWS: aws-cloudformation/cloudformation-coverage-roadmap#2151.

Until that's fixed, you could try using the eks.NodeGroup resource of the AWS classic provider as a workaround. Let me know if that works for you!
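For illustration, a rough, untested sketch of that classic-provider workaround, reusing the names from the original snippet (the cluster, role, subnets, and launch template references are assumptions carried over from above):

import pulumi_aws as aws

# Sketch: classic-provider managed node group driven by the same launch template.
managed_nodes = aws.eks.NodeGroup(
    f"{args.cluster_name}-managed-nodes",
    cluster_name=eks.name,
    node_role_arn=eks_nodes_role.arn,
    subnet_ids=[subnet.subnet_id for subnet in private_subnets],
    capacity_type="ON_DEMAND",
    scaling_config=aws.eks.NodeGroupScalingConfigArgs(desired_size=2, min_size=2, max_size=3),
    launch_template=aws.eks.NodeGroupLaunchTemplateArgs(
        name=eks_nodes_launch_template.launch_template_name,
        # cast in case the aws-native output is numeric; the classic input expects a string
        version=eks_nodes_launch_template.latest_version_number.apply(str),
    ),
)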

flostadler added the awaiting-upstream (The issue cannot be resolved without action in another repository; may be owned by Pulumi) label and removed the needs-triage (Needs attention from the triage team) label on Oct 2, 2024
btuffreau (Author) commented

Thanks! I'm good with the suggested workaround for now; I'll readjust if it becomes problematic.
