Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nova 3 cell DT #401

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jamepark4
Copy link
Contributor

@jamepark4 jamepark4 commented Sep 17, 2024

Standard deployments currently deploy cell0 and cell1. Create a DT that deploys cell0, cell1, and cell2.

Copy link

openshift-ci bot commented Sep 17, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link

openshift-ci bot commented Sep 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jamepark4
Once this PR has been reviewed and has the lgtm label, please assign leifmadsen for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jamepark4 jamepark4 force-pushed the add_multicell_dt branch 9 times, most recently from 364ebe9 to 38d7c4b Compare September 24, 2024 18:34
@jamepark4 jamepark4 force-pushed the add_multicell_dt branch 6 times, most recently from 19005c6 to 0f33d8c Compare October 2, 2024 13:44
config.kubernetes.io/local-config: "true"

data:
ssh_keys:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of hacing ssh_keys set in two values files in this PR. The nodeset in the library will create the same secret dataplane-ansible-ssh-private-key-secret for both nodesets. If a user looks at this they may be led to believe they can use different keys for the (I assume pre-provisioned nodes?) when using this DT - but that will not work since applying the second nodeset will modify the same secret instead of creating a new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack I can removed these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure you can remove them, if you cannot - then this change should not be blocked by this comment from me.

I was trying to point out an issue with the library not supporting multiple nodesets in a good way. Essentilly you have two nodesets referencing lib/dataplane/nodeset/kustomization.yaml#L6-L8 - so you end up applying two separate files defining the same secret resource.

@openshift-ci openshift-ci bot requested a review from raukadah November 12, 2024 20:24
@jamepark4 jamepark4 changed the title Add multi-cell DT Add nova 3 cell DT Nov 12, 2024
Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have couple of question / comments inline.

cellMessageBusInstance: rabbitmq-cell2
conductorServiceTemplate:
replicas: 3
hasAPIAccess: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note for the future. We can spice this setup up by defining cell2 without api DB access (i.e. no upcall support).

- ovn
- neutron-metadata
- libvirt
- nova
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering how this connects the nodes to cell2 if this still refers to the default nova service.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh indeed. replacing via kustomize works for me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, yeah kustomize is OK to me too.

| Role | Machine Type | Count |
| ----------------- | ------------ | ----- |
| Compact OpenShift | vm | 3 |
| OpenStack Compute | vm | 4 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this DT really deploys 4 EDPM nodes as computes? I only see edpm-compute-0 and edpm-compute-1 referenced in the NodeSets.

Copy link
Contributor Author

@jamepark4 jamepark4 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is completely virtualized its unfortunately not very clear based on purely this DT. This DT works in conjunction with templates in CIFMW [1] that splits the total computes between each nodeset. It also needs a scenario file similar to [2] to define the number of VMs needed (the one I'm using for testing this is defined downstream). Virtualized setups now have unique hostnames that are recorded during setup and that information is what [1] uses to allocate how the hosts are split. With baremetal setups where I can define the hostnames in the jobs, I can specifically define how the EDPM Hosts are allocated per nodeset within the context of the DT.

[1] https://github.com/openstack-k8s-operators/ci-framework/pull/2423/files
[2] https://github.com/openstack-k8s-operators/ci-framework/blob/main/scenarios/reproducers/va-nfv-ovs-sriov.yml

Comment on lines +8 to +9
(cell1 and cell2) in addition to cell0. Computes will be allocated to both
cells
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to describe the mapping. Which compute will be connected to which cell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

containerImageFields:
- NovaComputeImage
- EdpmIscsidImage
dataSources:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we should add that nova-cell<N>-metadata-neutron-config, if running metadata service in a cell instead of top scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack I'll try for per cell deployment.

- ovn
- neutron-metadata
- libvirt
- nova

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, use nova-<cell> in <cell> specific nodesets

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps this is also replaced via kustomize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only updating cell2 and keeping the default for cell1 so this isn't replaced, but if it improves overall clarity I can update this and kustomize to have nova-cell-1 and nova-cell-2 respectively instead.

@abays
Copy link
Contributor

abays commented Nov 18, 2024

@jamepark4 Would you mind addressing the errors reported in CI when you have a chance?

Standard deployments currently deploy cell0 and cell1. Create a DT that deploys cell0, cell1, and cell2.
Copy link
Contributor

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/dfe0fa56e83c475db9c67b866bab6145

✔️ noop SUCCESS in 0s
rhoso-architecture-validate-nova-three-cells FAILURE in 3m 30s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants