Updating requiredResources in Application Management API #280

Merged
merged 12 commits into from
Nov 4, 2024
279 changes: 256 additions & 23 deletions code/API_definitions/Edge-Application-Management.yaml
@@ -897,13 +897,19 @@ components:
and the value of the Edge Cloud Provider
object. This value is used to identify an Edge Cloud zone
between Edge Clouds from different Edge Cloud Providers.
required:
- edgeCloudZoneId
- edgeCloudZoneName
Collaborator

How is uniqueness defined here? Is edgeCloudZoneId unique, is edgeCloudZoneName unique, or only their combination?

Collaborator

SimpleEdgeDiscovery has:

  • edgeCloudZoneId is a UUID for the Edge Cloud Zone.
  • edgeCloudZoneName is the common name of the closest Edge Cloud Zone to
    the user device.
  • edgeCloudProvider is the name of the operator or cloud provider of
    the Edge Cloud Zone.

So, edgeCloudZoneId is expected to be unique, e.g. a namespaced URN

Collaborator

I would like to point out that if edgeCloudZoneName (or edgeCloudZoneName + edgeCloudProvider) cannot uniquely identify the zone, i.e. only the UUID can uniquely identify the zone, then we won't be able to support a declarative API. That's probably ok if the API is mainly going to be accessed via a GUI by a human, but if we want to support automation and infra-as-code via yaml files, it would be much nicer to be able to have a declarative API.

Contributor Author

Would the edgeCloudZoneId be the only required parameter for the edgeCloudZone?
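To make the declarative-API concern concrete, here is a minimal Python sketch, assuming zone records shaped like the EdgeCloudZone schema (edgeCloudZoneId, edgeCloudZoneName, edgeCloudProvider). A manifest that names a zone by provider and zone name can only be resolved to an edgeCloudZoneId if that pair is unique, so the hypothetical helper `resolve_zone_id` refuses to guess when more than one zone matches. The provider name, zone names, and IDs are made up for illustration.

```python
from typing import Dict, List

Zone = Dict[str, str]  # subset of the EdgeCloudZone schema


def resolve_zone_id(zones: List[Zone], provider: str, name: str) -> str:
    """Resolve an (edgeCloudProvider, edgeCloudZoneName) pair to edgeCloudZoneId.

    Raises LookupError when no zone or more than one zone matches, since a
    declarative manifest cannot safely pick between ambiguous zones.
    """
    matches = [
        zone["edgeCloudZoneId"]
        for zone in zones
        if zone["edgeCloudProvider"] == provider and zone["edgeCloudZoneName"] == name
    ]
    if not matches:
        raise LookupError(f"no zone {name!r} for provider {provider!r}")
    if len(matches) > 1:
        raise LookupError(f"zone name {name!r} is ambiguous for provider {provider!r}")
    return matches[0]


# Hypothetical zones returned by the API.
zones = [
    {"edgeCloudZoneId": "00000000-0000-0000-0000-000000000001",
     "edgeCloudZoneName": "north-1", "edgeCloudProvider": "example-operator"},
    {"edgeCloudZoneId": "00000000-0000-0000-0000-000000000002",
     "edgeCloudZoneName": "south-1", "edgeCloudProvider": "example-operator"},
]
print(resolve_zone_id(zones, "example-operator", "north-1"))
```

If only the UUID is guaranteed unique, a manifest would have to embed that UUID, which is exactly what makes infra-as-code files hard to write by hand.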

- edgeCloudProvider
properties:
edgeCloudZoneId:
$ref: '#/components/schemas/EdgeCloudZoneId'
edgeCloudZoneName:
$ref: '#/components/schemas/EdgeCloudZoneName'
edgeCloudZoneStatus:
$ref: '#/components/schemas/EdgeCloudZoneStatus'
edgeCloudZoneFlavors:
$ref: '#/components/schemas/EdgeCloudZoneFlavors'
edgeCloudProvider:
$ref: '#/components/schemas/EdgeCloudProvider'
edgeCloudRegion:
@@ -925,6 +931,14 @@ components:
- unknown
default: unknown

EdgeCloudZoneFlavors:
Collaborator

There is another parameter, "Flavor", further down in the YAML. How does it correlate with EdgeCloudZoneFlavors?

description: List of unique Name IDs of Infrastructure Flavors
Collaborator

How should a flavor be visualized from a developer's point of view? Does a flavor represent a virtual machine (VM) or a server node with a given set of resources mapped to it?

Collaborator

If, say, it represents a single node, then the gpuMemory attribute alone may not be sufficient to allocate such a resource. Attributes like gpuCount, gpuFamily, etc. may also need to be considered to meet the requirements of workloads that need a GPU.

Contributor Author

Hi @gunjald, given that flavors introduce complexity to the API, I'll implement a change that removes them. This will provide greater flexibility for operators to allocate workloads of any size.

type: array
items:
type: string
description: Flavor ID
example: A1.2C2M.GPU8G

Collaborator

@nicolacdnll, Jul 9, 2024

If we decide to use flavors, I would suggest that each flavor have the list and spec of all resources they provide.
My take is that I don't think we can rely on the flavor id to be sufficient, or even relevant, for some infra provider.

Contributor Author

I agree, Nicola, good point. I'll convert the string to an object with the spec information. Thanks!
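A rough sketch of what a flavor object carrying its own spec might look like, under assumptions: the field names (`num_cpu`, `memory_gb`, `gpu_memory_gb`, `storage_gb`), the flavor IDs, and the `smallest_fit` helper are all hypothetical and not part of the schema. The point is that once each flavor lists its resources, a client can pick one by capability instead of decoding the ID string.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FlavorSpec:
    """Hypothetical flavor object: an ID plus the resources it provides."""
    flavor_id: str
    num_cpu: int
    memory_gb: int
    gpu_memory_gb: int = 0
    storage_gb: int = 0

    def satisfies(self, cpu: int, memory_gb: int, gpu_memory_gb: int = 0) -> bool:
        return (
            self.num_cpu >= cpu
            and self.memory_gb >= memory_gb
            and self.gpu_memory_gb >= gpu_memory_gb
        )


def smallest_fit(flavors: List[FlavorSpec], cpu: int, memory_gb: int,
                 gpu_memory_gb: int = 0) -> Optional[FlavorSpec]:
    """Return the fitting flavor with the fewest CPUs, then the least memory/GPU memory."""
    fits = [f for f in flavors if f.satisfies(cpu, memory_gb, gpu_memory_gb)]
    return min(fits, key=lambda f: (f.num_cpu, f.memory_gb, f.gpu_memory_gb), default=None)


# Hypothetical catalog; values are illustrative, not taken from any provider.
catalog = [
    FlavorSpec("flavor-a", num_cpu=2, memory_gb=4),
    FlavorSpec("flavor-b", num_cpu=2, memory_gb=4, gpu_memory_gb=8),
    FlavorSpec("flavor-c", num_cpu=4, memory_gb=8),
]
print(smallest_fit(catalog, cpu=2, memory_gb=4, gpu_memory_gb=8))
```

This also touches the gpuCount/gpuFamily question above: extra GPU attributes would simply become more fields on the flavor object.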

ErrorInfo:
type: object
description: Information about the error
@@ -961,6 +975,235 @@ components:
type: integer
description: Number of GPUs

Flavor:
type: string
description: |
Preset configuration for compute, memory, GPU,
and storage capacity (e.g. A1.2C4M.GPU8G, A1.2C4M.GPU16G, A1.4C8M, ...).
example: A1.2C2M.GPU8G
Collaborator

We should add a GET API that lists the flavors, so that users know which flavor names they can use.

Collaborator

Agree with @gainsley. We decided in #220 not to use flavours, but if implementing them is the better solution, GET /edge-cloud-zones should return information about the available flavours for each edge cloud zone of interest.

Collaborator

I also tend to agree :-) Maybe GET /edge-cloud-zones could take query parameters to retrieve the list of flavors, and in the future we could extend it to other resources via query parameters as well. Just a suggestion, though.

Contributor Author

Agree, I'll add an entry in GET /edge-cloud-zones to report the flavors.
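Assuming GET /edge-cloud-zones returns a JSON array of zone objects, and that each zone carries edgeCloudZoneFlavors as the list of flavor ID strings currently defined in this diff, a client could filter zones before submitting an application. This Python sketch only parses a response body (no HTTP call is made); `zones_offering` and the zone IDs are hypothetical.

```python
import json
from typing import List


def zones_offering(response_body: str, flavor_id: str) -> List[str]:
    """Return the edgeCloudZoneIds whose edgeCloudZoneFlavors include flavor_id.

    Zones that omit edgeCloudZoneFlavors are treated as offering no flavors.
    """
    zones = json.loads(response_body)
    return [
        zone["edgeCloudZoneId"]
        for zone in zones
        if flavor_id in zone.get("edgeCloudZoneFlavors", [])
    ]


# Hypothetical, trimmed response body.
sample = json.dumps([
    {"edgeCloudZoneId": "zone-1", "edgeCloudZoneFlavors": ["A1.2C2M.GPU8G"]},
    {"edgeCloudZoneId": "zone-2", "edgeCloudZoneFlavors": ["A1.4C8M"]},
    {"edgeCloudZoneId": "zone-3"},
])
print(zones_offering(sample, "A1.2C2M.GPU8G"))  # ['zone-1']
```

If flavors become objects with a spec, as agreed earlier in the review, the membership test would compare against each object's ID field instead.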


NodePool:
description: |
Collaborator

In general, an issue I see with this approach is that it offers the Application Developer too many choices when requesting the compute it needs. Developers can specify a huge number of combinations through the API, and the platform then has to find where it can serve such diverse combinations of resources or clusters. Also, as a developer I may need to run multiple applications on the same cluster; how can I express that here?
So maybe compute resource creation could have a separate API that returns an identifier or handle for the resource, which could then be used in this proposal to indicate where the app should be deployed.

Contributor Author

The approach is to adopt a one-application-to-one-infrastructure-resource approach (VM, Kubernetes cluster, container, Docker Compose). This means we avoid managing infrastructure independently of the application.

For running multiple applications on the same Kubernetes cluster, Helm packages provide a way to bundle them together. A Helm package can contain multiple application charts, such as a database and a web application chart, effectively treating them as a single application for deployment.

This approach aligns well with node pools. Developers can leverage node pools to create clusters with a mix of nodes, such as having one with a GPU and others without, optimizing resource allocation. The application to node pool mapping is done through labels, allowing developers to reference them in Helm chart values for node affinity.
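The label-based mapping of applications to node pools can be expressed with standard Kubernetes node affinity. A minimal sketch, assuming the platform labels every node of a pool with the pool name under a hypothetical key `example.com/nodepool` (the schema does not define any label key) and that the chart exposes an `affinity` value, as many charts do. The nested structure itself is the standard core/v1 NodeAffinity.

```python
import json
from typing import Any, Dict


def nodepool_affinity(pool_name: str, label_key: str = "example.com/nodepool") -> Dict[str, Any]:
    """Build a Kubernetes affinity block that pins pods to one node pool.

    label_key is a hypothetical label the platform would set on each node of
    the pool; only the nesting below follows the real NodeAffinity API.
    """
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": label_key, "operator": "In", "values": [pool_name]}
                        ]
                    }
                ]
            }
        }
    }


# Helm values for a chart whose pods must run on the (hypothetical) GPU pool.
values = {"affinity": nodepool_affinity("gpu-pool")}
print(json.dumps(values, indent=2))
```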

Collaborator

This looks like a very resource-heavy approach, tying one app to one piece of infrastructure such as a k8s cluster, unless we provide some way to deploy multiple applications on one cluster. Also, what happens if cluster creation fails? Application onboarding would fail too, since both are now one atomic package. Another issue: once an app has been accepted along with its infrastructure, I cannot change that infrastructure, e.g. reduce or increase its resources if needed.

So I still think this will be hard to implement, especially for cluster-type infrastructure, since it could mean creating a cluster dynamically, which can be very time-consuming. If we decouple infrastructure creation, other options open up: the platform could create clusters offline and provide an API to retrieve the cluster details or ID, or even provide an infrastructure creation API to manage infrastructure for applications, with the App LCM API using that information to link the two together.

So there may be ways around it, but as it stands the approach tightly couples infrastructure and applications and may reduce reusability. More input would help here.

Contributor Author

Hi @gunjald,

Sounds good. I think it would be interesting to discuss the creation of an API to manage the infrastructure lifecycle (Create, Update, Delete). Enabling the Kubernetes cluster reference within the Application Management API would be easy.

For now, I think it's safe to keep things this way, allowing developers to use a Kubernetes cluster and define the minimum configuration details required by their application. We can then open a discussion about how to design a more comprehensive API for infrastructure management resources.

Collaborator

> The approach is to adopt a one-application-to-one-infrastructure-resource approach (VM, Kubernetes cluster, container, Docker Compose). This means we avoid managing infrastructure independently of the application.

I think this should be discussed further, as it changes how some of us see the problem we are trying to solve.

While it makes sense for VM and containers, I'm not sure for k8s clusters. It's my understanding that operators want to use the same infra for multiple app providers/app types. In this case, packaging multiple apps in the same Helm Chart, as suggested above, cannot be done.

Set of worker nodes in a Kubernetes cluster.
type: object
required:
- flavor
- numNodes
properties:
name:
type: string
example: nodepool1
description: |
Nodepool Name (Autogenerated if not provided in the request)
flavor:
$ref: '#/components/schemas/Flavor'
numNodes:
Collaborator

Shouldn't it be something like numFlavors, for better correlation?

type: integer
example: 1
description: Number of workers that compose the node pool.
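Since NodePool requires flavor and numNodes and autogenerates name when it is absent, a server might normalize incoming pools roughly as below. This is a sketch under assumptions: the `nodepool<N>` naming scheme is modeled on the example value `nodepool1`, the helper name is hypothetical, and rejecting `numNodes` below 1 is an added rule the schema does not state (it sets no minimum).

```python
from typing import Any, Dict, List


def normalize_node_pools(pools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Check required NodePool fields and fill in missing names.

    Generated names follow a hypothetical "nodepool<N>" scheme that skips
    names the caller already used.
    """
    taken = {pool["name"] for pool in pools if "name" in pool}
    counter = 1
    result = []
    for pool in pools:
        for field in ("flavor", "numNodes"):
            if field not in pool:
                raise ValueError(f"node pool is missing required field {field!r}")
        # Assumption: the schema sets no minimum, but an empty pool is rejected here.
        if not isinstance(pool["numNodes"], int) or pool["numNodes"] < 1:
            raise ValueError("numNodes must be a positive integer")
        named = dict(pool)  # copy so the caller's dicts are not mutated
        if "name" not in named:
            while f"nodepool{counter}" in taken:
                counter += 1
            named["name"] = f"nodepool{counter}"
            taken.add(named["name"])
        result.append(named)
    return result


pools = [
    {"flavor": "A1.2C2M.GPU8G", "numNodes": 1},
    {"name": "nodepool1", "flavor": "A1.4C8M", "numNodes": 3},
]
for pool in normalize_node_pools(pools):
    print(pool["name"], pool["flavor"], pool["numNodes"])
```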

K8sAddons:
description: |
Addons for the Kubernetes cluster.
Additional addons should be defined in the application Helm chart
(Service Mesh, Serverless, AI).
type: object
properties:
monitoring:
type: boolean
example: true
default: false
description: Enable monitoring for Kubernetes cluster.
ingress:
type: boolean
example: true
default: false
description: Enable ingress for Kubernetes cluster.

K8sNetworking:
description: |
Kubernetes networking definition
type: object
properties:
primaryNetwork:
description: Definition of Kubernetes primary Network
type: object
properties:
provider:
description: CNI provider name
type: string
example: cilium
version:
description: CNI provider version
type: string
example: "1.13"
additionalNetworks:
description: Additional Networks for the Kubernetes cluster.
type: array
items:
type: object
description: Additional network interface definition
properties:
name:
description: Additional Network Name
type: string
example: net1
interfaceType:
description: |
Type of additional Interface:
netdevice: (SR-IOV) A regular kernel network device in the
Network Namespace (netns) of the container
vfio-pci: (SR-IOV) A PCI network interface directly mounted
in the container
interface: Additional interface to be used by cni plugins
such as macvlan, ipvlan
Note: The use of SR-IOV interfaces automatically
configures the required kernel parameters for the nodes.
type: string
example: vfio-pci
enum:
- netdevice
- vfio-pci
- interface
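The interfaceType enum splits additional networks into two SR-IOV kinds (netdevice, vfio-pci) and a plain interface for CNI plugins such as macvlan or ipvlan. A small Python sketch that validates the enum and lists the SR-IOV networks, i.e. the ones that, per the description, trigger kernel-parameter configuration on the nodes. The function name is hypothetical, and treating a missing interfaceType as an error is an assumption, since the schema marks no field of an additional network as required.

```python
from typing import Any, Dict, List

INTERFACE_TYPES = {"netdevice", "vfio-pci", "interface"}  # enum from the schema
SRIOV_TYPES = {"netdevice", "vfio-pci"}  # the two SR-IOV variants


def sriov_networks(networking: Dict[str, Any]) -> List[str]:
    """Validate additionalNetworks and return the names of the SR-IOV ones."""
    names = []
    for net in networking.get("additionalNetworks", []):
        kind = net.get("interfaceType")
        # Assumption: a missing interfaceType is treated as invalid.
        if kind not in INTERFACE_TYPES:
            raise ValueError(f"unsupported interfaceType {kind!r} for {net.get('name')!r}")
        if kind in SRIOV_TYPES:
            names.append(net.get("name", "<unnamed>"))
    return names


networking = {
    "primaryNetwork": {"provider": "cilium", "version": "1.13"},
    "additionalNetworks": [
        {"name": "net1", "interfaceType": "vfio-pci"},
        {"name": "net2", "interfaceType": "interface"},
    ],
}
print(sriov_networks(networking))  # ['net1']
```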

AdditionalStorage:
description: Additional storage for the application.
type: array
items:
type: object
required:
- storageSize
- mountPoint
properties:
name:
type: string
description: Name of additional storage resource.
example: logs
storageSize:
type: string
description: Size of the additional persistent volume for the application.
example: 80GB
pattern: ^\d+(GB|MB)$
mountPoint:
type: string
description: Location of additional storage resource.
example: /logs
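storageSize uses the pattern `^\d+(GB|MB)$`. A sketch that validates it and normalizes sizes to megabytes, assuming decimal units (1 GB = 1000 MB), which the schema does not specify. Python's `re.fullmatch` is used instead of `$` because in Python `$` also matches just before a trailing newline, whereas JSON Schema patterns follow ECMA-262 semantics. The helper names are hypothetical.

```python
import re
from typing import Any, Dict, List

# Same expression as the storageSize pattern; fullmatch() reproduces the
# ECMA-262 anchoring that JSON Schema uses.
STORAGE_SIZE = re.compile(r"\d+(GB|MB)")


def storage_size_mb(value: str) -> int:
    """Convert a storageSize string such as "80GB" to megabytes.

    Assumes decimal units (1 GB = 1000 MB).
    """
    match = STORAGE_SIZE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid storageSize {value!r}")
    number = int(value[:-2])
    return number * 1000 if match.group(1) == "GB" else number


def total_storage_mb(storages: List[Dict[str, Any]]) -> int:
    """Sum storageSize over an AdditionalStorage array."""
    return sum(storage_size_mb(item["storageSize"]) for item in storages)


storages = [
    {"name": "logs", "storageSize": "80GB", "mountPoint": "/logs"},
    {"storageSize": "512MB", "mountPoint": "/cache"},
]
print(total_storage_mb(storages))  # 80512
```

Note that the pattern rejects fractional sizes such as "1.5GB" and sizes with a space before the unit.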

Vcpu:
type: string
pattern: ^\d+((\.\d{1,3})|(m))?$
description: |
Number of vCPUs in whole (e.g. 1), decimal (e.g. 0.500, up to
millivCPU precision), or millivCPU (e.g. 500m) format.
example: "500m"
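The Vcpu pattern accepts whole numbers, values with up to three decimals, or an `m` suffix for millivCPUs, much like Kubernetes CPU quantities. A sketch that normalizes all three forms to an integer number of millivCPUs; `vcpu_to_millicores` is a hypothetical helper, and `Decimal` is used to avoid binary floating-point rounding.

```python
import re
from decimal import Decimal

# Vcpu pattern from the schema, applied with fullmatch() for ECMA-262-style anchoring.
VCPU = re.compile(r"\d+((\.\d{1,3})|(m))?")


def vcpu_to_millicores(value: str) -> int:
    """Normalize a Vcpu string ("1", "0.500", "500m") to millivCPUs."""
    if VCPU.fullmatch(value) is None:
        raise ValueError(f"invalid Vcpu value {value!r}")
    if value.endswith("m"):
        return int(value[:-1])
    # At most three decimals are allowed, so this product is a whole number.
    return int(Decimal(value) * 1000)


for sample in ("1", "0.500", "500m", "2.25"):
    print(sample, "->", vcpu_to_millicores(sample))
```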

KubernetesResources:
description: Definition of Kubernetes Cluster Infrastructure.
type: object
required:
- nodePools
- infraKind
properties:
infraKind:
description: Type of infrastructure for the application.
type: string
example: kubernetes
enum:
- kubernetes
Collaborator

infraKind is part of the top-level KubernetesResources object, and its only allowed value, "kubernetes", looks redundant, since the name KubernetesResources itself indicates that it is a Kubernetes resource.
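One argument for keeping infraKind: RequiredResources is a oneOf, and VmResources and DockerComposeResources have the same properties (flavor, additionalStorages) apart from the infraKind enum. The simplified Python sketch below checks only required fields (no types, and the schemas do not forbid extra properties) and counts how many branches a payload satisfies. Without infraKind, a VM-shaped payload matches two branches, which oneOf rejects because exactly one branch must match. `matching_branches` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Required fields of each RequiredResources branch, leaving out infraKind.
BRANCH_REQUIRED = {
    "kubernetes": {"nodePools"},
    "virtualMachine": {"flavor"},
    "dockerCompose": {"flavor"},
    "container": {"numCPU", "memory", "storage"},
}


def matching_branches(payload: Dict[str, Any], use_infra_kind: bool) -> List[str]:
    """List the oneOf branches a payload satisfies (required-field check only).

    With use_infra_kind=True each branch also requires infraKind to equal its
    own enum value, as the schemas in this PR do.
    """
    matches = []
    for kind, required in BRANCH_REQUIRED.items():
        if not required.issubset(payload):
            continue
        if use_infra_kind and payload.get("infraKind") != kind:
            continue
        matches.append(kind)
    return matches


print(matching_branches({"flavor": "A1.4C8M"}, use_infra_kind=False))
# ['virtualMachine', 'dockerCompose']: ambiguous, so oneOf fails
print(matching_branches({"infraKind": "virtualMachine", "flavor": "A1.4C8M"}, use_infra_kind=True))
# ['virtualMachine']
```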

version:
type: string
description: Minimum Kubernetes Version.
example: "1.29"
controlPlaneHa:
type: boolean
description: |
True: Enable high availability of the Kubernetes
control plane (3 nodes)
False: Disable high availability of the Kubernetes
control plane (1 node)
default: false
nodePools:
type: array
description: |
Description of the worker node sets in a Kubernetes cluster.
items:
$ref: '#/components/schemas/NodePool'
additionalStorage:
type: string
description: |
Amount of persistent storage allocated to the Kubernetes PVC.
example: 80GB
pattern: ^\d+(GB|MB)$
networking:
$ref: '#/components/schemas/K8sNetworking'
addons:
$ref: '#/components/schemas/K8sAddons'
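Per the controlPlaneHa description, the control plane has 3 nodes when the flag is true and 1 node when it is false (the default), while the worker count is the sum of numNodes over nodePools. A sketch computing the node count implied by a KubernetesResources payload; the helper name and the request values are illustrative only.

```python
from typing import Any, Dict


def cluster_node_count(resources: Dict[str, Any]) -> Dict[str, int]:
    """Count control-plane and worker nodes implied by KubernetesResources."""
    if resources.get("infraKind") != "kubernetes":
        raise ValueError("not a KubernetesResources payload")
    # controlPlaneHa defaults to false: 3 control-plane nodes if true, else 1.
    control_plane = 3 if resources.get("controlPlaneHa", False) else 1
    workers = sum(pool["numNodes"] for pool in resources["nodePools"])
    return {"controlPlane": control_plane, "workers": workers,
            "total": control_plane + workers}


request = {
    "infraKind": "kubernetes",
    "version": "1.29",
    "controlPlaneHa": True,
    "nodePools": [
        {"name": "gpu-pool", "flavor": "A1.2C2M.GPU8G", "numNodes": 1},
        {"name": "cpu-pool", "flavor": "A1.4C8M", "numNodes": 2},
    ],
    "additionalStorage": "80GB",
    "addons": {"monitoring": True, "ingress": False},
}
print(cluster_node_count(request))  # {'controlPlane': 3, 'workers': 3, 'total': 6}
```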

VmResources:
description: Definition of Virtual Machine Infrastructure
type: object
required:
- flavor
- infraKind
properties:
infraKind:
description: Type of infrastructure for the application.
type: string
example: virtualMachine
enum:
- virtualMachine
flavor:
$ref: '#/components/schemas/Flavor'
additionalStorages:
$ref: '#/components/schemas/AdditionalStorage'

DockerComposeResources:
description: Definition of Docker Compose Infrastructure
type: object
required:
- flavor
- infraKind
properties:
infraKind:
description: Type of infrastructure for the application.
type: string
example: dockerCompose
enum:
- dockerCompose
flavor:
$ref: '#/components/schemas/Flavor'
additionalStorages:
$ref: '#/components/schemas/AdditionalStorage'

ContainerResources:
description: Container Infrastructure Definition
type: object
required:
- numCPU
- memory
- storage
- infraKind
properties:
infraKind:
description: Type of infrastructure for the application.
type: string
example: container
enum:
- container
numCPU:
$ref: '#/components/schemas/Vcpu'
memory:
type: integer
example: 10
description: Memory in gigabytes
storage:
$ref: '#/components/schemas/AdditionalStorage'
gpu:
type: array
description: Number of GPUs
items:
$ref: '#/components/schemas/GpuInfo'

Ipv4Addr:
type: string
format: ipv4
@@ -1024,33 +1267,23 @@ components:
type: integer
description: Port to establish the connection
minimum: 0

RequiredResources:
description: |
Fundamental hardware requirements to be provisioned by the
Application Provider.
type: object
required:
- numCPU
- memory
- storage
properties:
numCPU:
type: integer
description: Number of virtual CPUs
example: 1
memory:
type: integer
example: 10
description: Memory in gigabytes
storage:
type: integer
example: 60
description: Storage in giga bytes
gpu:
type: array
description: Number of GPUs
items:
$ref: '#/components/schemas/GpuInfo'
oneOf:
- $ref: "#/components/schemas/KubernetesResources"
- $ref: "#/components/schemas/VmResources"
- $ref: "#/components/schemas/ContainerResources"
- $ref: "#/components/schemas/DockerComposeResources"
discriminator:
propertyName: infraKind
mapping:
kubernetes: "#/components/schemas/KubernetesResources"
virtualMachine: "#/components/schemas/VmResources"
container: "#/components/schemas/ContainerResources"
dockerCompose: "#/components/schemas/DockerComposeResources"
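With the discriminator in place, a server can select the branch from infraKind and then check only that branch's required fields, instead of evaluating all four oneOf branches. A minimal sketch: the table mirrors discriminator.mapping and each target schema's required list, while `resolve_required_resources` is a hypothetical helper that does not check types or patterns.

```python
from typing import Any, Dict, Tuple

# infraKind -> (target schema, its "required" list), mirroring
# RequiredResources.discriminator.mapping in this PR.
BRANCHES: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "kubernetes": ("KubernetesResources", ("nodePools", "infraKind")),
    "virtualMachine": ("VmResources", ("flavor", "infraKind")),
    "container": ("ContainerResources", ("numCPU", "memory", "storage", "infraKind")),
    "dockerCompose": ("DockerComposeResources", ("flavor", "infraKind")),
}


def resolve_required_resources(payload: Dict[str, Any]) -> str:
    """Return the schema name selected by infraKind, checking required fields."""
    kind = payload.get("infraKind")
    if kind not in BRANCHES:
        raise ValueError(f"unknown infraKind {kind!r}")
    schema, required = BRANCHES[kind]
    missing = [field for field in required if field not in payload]
    if missing:
        raise ValueError(f"{schema} is missing: {', '.join(missing)}")
    return schema


container_request = {
    "infraKind": "container",
    "numCPU": "500m",
    "memory": 10,
    "storage": [{"name": "logs", "storageSize": "60GB", "mountPoint": "/logs"}],
}
print(resolve_required_resources(container_request))  # ContainerResources
```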

SubmittedApp:
description: Information about the submitted app