standardizing IPv4 networking in SCS #522

Draft
wants to merge 3 commits into
base: main
79 changes: 79 additions & 0 deletions Standards/scs-xxxx-v1-ipv4-networking-standard.md
@@ -0,0 +1,79 @@
---
title: IPv4 Networking Standard
type: Standard
status: Proposal
track: IaaS
---

## Introduction

This document outlines the standardized approach for the management and allocation of
public IPv4 addresses within Sovereign Cloud Stack (SCS) environments. Its aim is to
Comment on lines +10 to +11
Contributor

Suggested change
This document outlines the standardized approach for the management and allocation of
public IPv4 addresses within Sovereign Cloud Stack (SCS) environments. Its aim is to
This document outlines the standardized approach for the management and allocation of
public IPv4 addresses and security groups within Sovereign Cloud Stack (SCS) environments. Its aim is to

I am not sure how to fix this; there are multiple problems and options here:

  • The introduction only talks about public IPv4 addresses, but later on there are specifics about floating IPs and other IPs, suggesting that the "other" IPs are non-public? -> This could simply be clarified, if only public IPs are in scope of the standard.
  • The introduction makes no mention of security groups, Neutron routers and Neutron plugins -> these could either be mentioned explicitly here as well or be declared out of scope for this standard.
  • There are already drafts on how to formulate security groups, default security groups etc. I feel there is a large overlap here, and I think it would be good to focus the effort around security groups in a single document, rather than litter many documents with possibly the same content, which will quickly get out of sync. If we must - which makes sense from a security point of view - we can link to a central security groups document and add context where needed. I guess @josephineSei has some opinions on this topic as well.

Contributor

I agree with @artificial-intelligence on these points.
In my opinion we need a strict focus on what we want to standardize in this document. I still miss this focus here. I would rather go for smaller standards (e.g. the standard for default security group rules will link the DR and the guide for Security Groups) so that we don't mix up too many topics.

I would see the focus on the architecture first here. So describing default networking structure and listing all required neutron plugins.

  1. mentioning other resources is not problematic, but as @artificial-intelligence said, there should be a clear line and (maybe later on) links to documents/standards/guides describing these (e.g. security groups). This would also help to keep the focus.

ensure a consistent, secure, and efficient methodology for IP address provisioning
across all SCS cloud services.

Contributor

As mentioned by @markus-hentsch a glossary should be added here. Something like:

| Term | Meaning |
| --- | --- |
| external network | Neutron network with the external flag that is bound to an outgoing provider network |
| internal network | Neutron network that is created by a customer's project |
| OVN | ..... |
| router | ..... |
| floating IP | ..... |
| .... | ..... |

## Motivation

The motivation behind establishing this standard is to enhance interoperability, improve
security measures, and streamline the operational processes across different SCS clouds.
Comment on lines +17 to +18
Contributor

This is good!

What I miss, though, is some analysis (it probably belongs in the ## Design Considerations section) of the current problems in these areas in real-world deployments, so that we have a logical train of thought from "exact problem we are facing -> solution".

e.g. in which ways are current security measures not good enough, where are gaps?

Thinking about it, this could probably also be moved to a decision record, not sure though.

It addresses the need for a unified procedure in handling IPv4 networking to facilitate
ease of use for IaaS users and DevOps teams.

## Design Considerations

### Options considered
Contributor

I think it would also be worthwhile to add some other options, if any were considered, and why they were rejected. It's also possible to link a decision record document, once the breakout session around this document has taken place.

It also wouldn't hurt to mention explicitly if no other options were considered, and why.

Contributor

As discussed in the IaaS call, we should start with the architecture here.

To me every other Option considered depends on the decision for or against an architecture.


#### _Neutron Routers_

Usage of Neutron Routers: To manage traffic between internal and external networks Neutron Routers **MUST** be used as the default gateway for VMs requiring access to external networks and the internet, thereby facilitating the routing of traffic and enhancing network security.
Contributor

We may add a line stating that routing between internal networks of the same project SHOULD be done by Neutron routers, so we officially recommend a way but do not forbid other options


CSPs **SHOULD** use OVN or L3agent as High Availability (HA) service deployments.
Contributor

I miss some reasoning on why exactly I need to use OVN or L3 agent. Also note that, due to some intricacies, L3 agent is currently not really HA in some edge cases, like upgrades/reboots, depending on your exact setup. This technical discussion is probably out of scope for this document, though.

To be clear, I'm totally for using OVN, but we should write down why we encourage its use, what the advantages are, etc.

Contributor

Maybe it would help to list other options and why we would prefer OVN.

Contributor

Just to add some background on this point: this is about high availability of virtual routers, i.e. replicating virtual routers across multiple network nodes and then allowing failover via VRRP in case of a node failure, a feature that is supported by both virtual router implementations included in neutron (OVN and L3agent), but may be mutually exclusive with distributed virtual routing (DVR, the implementation of virtual routers on the compute nodes).

So this is not really an endorsement of OVN or L3agent, it is just that those are the two available implementations. There might be proprietary service plugins to replace them, but every driver that is not OVN seems to just use L3agent.

This feature is also invisible to tenants, and I'm not sure if it should be part of this specific standard. We should probably have a different discussion about where and how to mandate HA features, maybe in the context of #527.

Standard external networks **MUST NOT** be made accessible as _shared networks_. It is advised that external networks are only reachable by the usage of routing and floating IPs.
Contributor

I think a glossary at the beginning of the document would be useful. There is a lot of terminology related to networks in Neutron. For example:

Neutron seems to call a specific kind of external networks "provider networks"[1] (I believe this is what the paragraph is referring to?). In some other examples, Neutron calls networks "external" although they have router:external=Internal set[2] and calls other networks "public" instead. Then there's the shared attribute of networks as well, which also affects their classification depending on its setting[3].

If we are enforcing things here (MUST / MUST NOT), we need to be very clear about what exactly we are referring to in my opinion. Depending on the context, "external networks" might be ambiguous. Same might go for "shared" in case it is not referring to the verbatim attribute but a topology classification.

Footnotes

  1. https://docs.openstack.org/networking-ovn/latest/admin/refarch/provider-networks.html

  2. https://docs.openstack.org/neutron/2023.2/admin/config-dns-int-ext-serv.html#use-case-3c-the-dns-extension

  3. https://opensource.com/article/17/4/openstack-neutron-networks

Contributor

The reason why we want to forbid this should be stated here, as this led to confusion in the IaaS call.

However, for special use cases like certain storage or VPN solutions it could be useful to allow _direct access networks_.

External networks and subnets **SHOULD** (very strong should) be configured with _--no-dhcp_ (DHCP - Dynamic Host Configuration Protocol). This is more secure, since it leaves less room for reflection attacks, e.g. _Denial of Service_ (DoS) attacks. If DHCP is enabled, firewall rules **MUST** be configured to catch IPs from the _Neutron DHCP agent_ in the public network.
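
A minimal sketch of how the two requirements above (external networks not shared, DHCP disabled on their subnets) might be checked. It assumes networks and subnets are available as plain dicts shaped like Neutron API responses (`router:external`, `shared`, `enable_dhcp`, `network_id`); the function name and the finding strings are hypothetical and not part of any existing SCS tooling.

```python
def check_external_network(network, subnets):
    """Return a list of findings for one network.

    `network` and `subnets` are plain dicts shaped like Neutron API
    responses (an assumption of this sketch); only the fields read
    below need to be present.
    """
    findings = []
    # Only external networks are in scope of these two rules.
    if not network.get("router:external", False):
        return findings
    if network.get("shared", False):
        findings.append("MUST NOT: external network is shared")
    for subnet in subnets:
        # Skip subnets that belong to other networks.
        if subnet.get("network_id") != network.get("id"):
            continue
        if subnet.get("enable_dhcp", False):
            findings.append(f"SHOULD: DHCP enabled on subnet {subnet.get('id')}")
    return findings
```

An empty result means neither rule was violated for that network; how the dicts are obtained (CLI JSON output, SDK, API call) is left open here.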

#### _Neutron Plugins_

Neutron Plugins: An SCS-conformant CSP **MAY** use RBAC and VPNaaS plugins.
Contributor

I'm not sure if we should really include a general section about Neutron plugins (or extensions) here. There are a lot of them and most of them are specific to a topic[1]. It's hard to cover them exhaustively, in my opinion. The current sentence might imply to the reader that other plugins/extensions are not allowed for IPv4 networking, which I don't think is the goal. Maybe we should limit this document to giving instructions regarding those directly related to IPv4 networking and leave others open, but I don't know where to draw the line, to be honest.

Footnotes

  1. for example DNS plugins, see https://github.com/SovereignCloudStack/issues/issues/229#issuecomment-2018465278

Contributor

Does a current list of plugins exist? When I look into the code[1] I only see the ML2 plugin. Also, the documentation does not mention a plugin list (anymore). It seems to me that most of it was removed or moved to extensions. When looking into the openstack repositories on GitHub, there are some deprecated plugins, a lot of charms repos, some repos that could be plugins or agents, and some other stuff[2].
From working in secustack I know it is possible to add custom made Plugins.

I looked into my devstack config and it does not specify any plugins, except the ML2:

[DEFAULT]
service_plugins = ovn-router
rpc_state_report_workers = 0
api_workers = 2
notify_nova_on_port_data_changes = True
notify_nova_on_port_status_changes = True
auth_strategy = keystone
debug = True
core_plugin = ml2
dhcp_agent_notification = False
transport_url = rabbit://stackrabbit:[email protected]:5672/
logging_exception_prefix = ERROR %(name)s ^[[01;35m%(instance)s^[[00m
logging_default_format_string = %(color)s%(levelname)s %(name)s [^[[00;36m-%(color)s] ^[[01;35m%(instance)s%(color)s%(message)s^[[00m
logging_context_format_string = %(color)s%(levelname)s %(name)s [^[[01;36m%(global_request_id)s %(request_id)s ^[[00;36m%(project_name)s %(user_name)s%(color)s] ^[[01;35m%(instance)s%(color>
logging_debug_format_suffix = ^[[00;33m{{(pid=%(process)d) %(funcName)s %(pathname)s:%(lineno)d}}^[[00m
bind_host = 0.0.0.0
use_syslog = False
state_path = /opt/stack/data/neutron

So all extensions seem to be usable all the time - because I was able to test the network rbac.

My conclusion: We should not state anything about plugins here, as they are poorly documented, not well maintained or even completely customized. We could discuss letting CSPs add customized plugins. But none of these plugins, while touching networking issues, should interfere with the scope of this standard. So exclude this, add a new issue and maybe we will find out enough for a new standard, or maybe we just add a short note about plugins overall on the docs page.

Footnotes

  1. https://github.com/openstack/neutron/tree/master/neutron/plugins

  2. https://github.com/orgs/openstack/repositories?language=&q=neutron&sort=&type=all

Contributor

Some background for this as well: Neutron used to include a number of monolithic core plugins, but they have all been converted to drivers for the ML2 core plugin, as well as service plugins that build on top of that.

The available service plugins are defined in Neutron's setup.cfg, and CSPs can also configure additional external plugins there. Each plugin implementation has to declare the API extensions that it provides, most of which are defined in neutron-lib. The most prominent ones are router and router-ovn, which implement virtual routers and floating IPs for l3agent and ovn, respectively.

A number of API extensions are also implemented by the ML2 core plugin itself, such as subnetpools, security groups, and the different rbac extensions. As such there is no separate plugin for RBAC.
There used to be a VPNaaS plugin in neutron but it has been removed at some point, though the definition for the API extension still exists in neutron-lib.

(Neutron RBAC needs to be configured explicitly to be able to use it. If configured, Neutron configurations can be shared across OpenStack projects. It also can be beneficial for admins, since an admin could bind external networks only to certain projects.)
Contributor

I still wonder which RBAC do you mean? Can you point me towards this please?

Because it is already confusing to distinguish between the Keystone roles-and-scopes RBAC, which has to be implemented by every OpenStack service, and the network RBAC extension of Neutron, which allows sharing of resources - something I don't recommend. Maybe we need a standard that forbids its usage and adjusts the policy?

Contributor

Neutron's RBAC extension allows sharing of various resources among projects and is the mechanism that Neutron uses to implement provider networks, so it's not really an optional feature. Rules can be created with the actions access_as_shared and access_as_external and a target_project_id that can be either an actual project id or * for sharing with all projects.
To my knowledge it does not require any explicit extra configuration, as it is part of the ML2 core plugin.

Neutron's default policies allow only the owner of a resource to share it, and allow only admin users to create rules with the wildcard * target.
This would still allow a malicious non-admin user to create a faux external network and offer it to selected target projects, for whom it might not be immediately obvious that this network was not provided by the CSP (openstack network list does not even show the project id of the networks by default).
This may give the attacker the opportunity to intercept external traffic of the target project.
Preventing non-admin users from sharing networks is at least something we should consider, especially if we do not have a naming convention.
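
To make the concern above concrete, here is a small sketch of how an operator might flag `access_as_external` rules that were not created by a CSP-owned project. It assumes RBAC policies are available as dicts with the `object_type`, `action` and `project_id` fields of the Neutron API; the function name and the idea of a fixed set of admin project ids are assumptions made for illustration only.

```python
def suspicious_external_rules(rbac_policies, admin_project_ids):
    """Return access_as_external network rules not owned by an admin project.

    `rbac_policies` is a list of dicts shaped like Neutron RBAC policy
    objects; `admin_project_ids` is a hypothetical set of project ids
    that the CSP considers trusted owners of external networks.
    """
    return [
        rule
        for rule in rbac_policies
        if rule.get("object_type") == "network"
        and rule.get("action") == "access_as_external"
        and rule.get("project_id") not in admin_project_ids
    ]
```

Any rule returned here would correspond to the "faux external network" scenario described in the comment; whether such rules should be blocked by policy or only audited is still an open point in this discussion.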


#### _Security Groups_

Security Group Policies: Standardized security group policies **SHOULD** be applied to all instances utilizing public IPv4 addresses. These policies must define and enforce access
controls to ensure the security of the cloud environment.
Security Groups **SHOULD** be enabled by default but **MUST** be capable of being switched off.
Contributor

Suggested change
Security Groups **SHOULD** be enabled by default but **MUST** be capable of being switched off.
Security Groups **SHOULD** be enabled by default but **MUST** be capable of being switched off by allowing port security to be disabled.

Security Groups don't have an "off switch" per se, they are implicitly disabled once port security is disabled for a port or whole network a port is created in. Is this what you are referring to?

Contributor

@markus-hentsch is correct: Security Groups are part of the Neutron extensions and thus cannot be switched off: https://github.com/openstack/neutron/blob/master/neutron/extensions/securitygroup.py

Security Groups are always there and being used in VMs. Even if a user does not specify anything - in that case the default security group is being used.

Nevertheless this topic does not interfere with the scope of this standard and should be included. But as we maybe want to include an architecture definition here, it would be good to reference all the work we do for security groups.

And I wonder, what do you mean by security group policies? Do you mean the default rules for security groups? -> In that case you can just link my DR and maybe later on the guide and the standard:

DR: https://github.com/SovereignCloudStack/standards/blob/main/Standards/scs-0113-v1-security-groups-decision-record.md
guide and standard for the rules are still in progress.


#### _Quota & Monitoring_

Quota: The standard quota of floating IPs and routers **SHOULD** be rather small, e.g. 3-5 floating IPs. This ensures a more fair distribution of these resources for all cloud users. If a user wants to use more of these resources, the user **SHOULD** be able to pay for more.
Contributor

Yes, the quota is important and belongs in this standard. We can still argue about the number :)


IP Usage Monitoring: SCS CSPs **SHOULD** implement monitoring solutions to track the utilization of IPv4 addresses. This facilitates efficient management of resources and supports capacity planning efforts.
Contributor

Is "utilization of IPv4 addresses" referring to all IPv4 addresses (incl. private ones in tenant networks) or just floating IPs? If the latter, please clarify this in the sentence.
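
If the monitoring requirement is read as referring to floating IPs only, the utilization could be computed roughly as sketched below. The sketch assumes the external subnet's `allocation_pools` (a list of `start`/`end` dicts, as in the Neutron API) and a list of allocated floating IPs are already at hand; the function names are hypothetical, and addresses consumed by router gateways are deliberately ignored here.

```python
import ipaddress


def pool_size(allocation_pools):
    """Count the IPv4 addresses covered by a subnet's allocation pools."""
    total = 0
    for pool in allocation_pools:
        start = int(ipaddress.IPv4Address(pool["start"]))
        end = int(ipaddress.IPv4Address(pool["end"]))
        total += end - start + 1  # both ends are inclusive
    return total


def floating_ip_utilization(allocation_pools, floating_ips):
    """Return the fraction of pool addresses used by floating IPs."""
    size = pool_size(allocation_pools)
    if size == 0:
        return 0.0
    return len(floating_ips) / size
```

A CSP could export this ratio per external network to its monitoring system and alert on a threshold; the threshold itself is not something this draft specifies.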


#### _External Network Naming_

All SCS clouds **SHOULD** adopt the naming convention
scs-external-net for external networks. This standardization facilitates easier identification and management of external network resources.
Contributor

We have to clarify why we would need a Naming convention at all. Renaming networks without good reasons is not really helpful.

Pre-statement: external networks can be listed with openstack network list --external.

  1. How many CSPs have more than one external network (for IPv4)?
  2. How many CSPs have multiple different subnets for one external network, i.e. more than one floating IP pool?
  3. Are there external networks that are only for one specific customer (I have seen something like this)?

Pro:

  • distinguish between IPv4 and IPv6 external networks

Con:

  • how to deal with multiple IPv4 external networks?
  • use other options like tags:
stack@devstack:~/devstack$ openstack network show public
+---------------------------+----------------------------------------------------------------------------+
| Field                     | Value                                                                      |
+---------------------------+----------------------------------------------------------------------------+
| admin_state_up            | UP                                                                         |
| availability_zone_hints   |                                                                            |
| availability_zones        |                                                                            |
| created_at                | 2024-01-24T16:12:31Z                                                       |
| description               |                                                                            |
| dns_domain                | None                                                                       |
| id                        | 73edb86b-d7ab-4db3-82b7-25fa8b012e40                                       |
| ipv4_address_scope        | None                                                                       |
| ipv6_address_scope        | None                                                                       |
| is_default                | True                                                                       |
| is_vlan_transparent       | None                                                                       |
| mtu                       | 1500                                                                       |
| name                      | public                                                                     |
| port_security_enabled     | True                                                                       |
| project_id                | 15f2ab0eaa5b4372b759bde609e86224                                           |
| provider:network_type     | flat                                                                       |
| provider:physical_network | public                                                                     |
| provider:segmentation_id  | None                                                                       |
| qos_policy_id             | None                                                                       |
| revision_number           | 4                                                                          |
| router:external           | External                                                                   |
| segments                  | None                                                                       |
| shared                    | False                                                                      |
| status                    | ACTIVE                                                                     |
| subnets                   | 3e0206bc-53c8-44ca-a0f1-2c2548bba766, 84dffd43-6d7f-4c2f-9180-8f0f0b83c9d4 |
| tags                      | IPv4                                                                       |
| tenant_id                 | 15f2ab0eaa5b4372b759bde609e86224                                           |
| updated_at                | 2024-03-28T09:39:03Z                                                       |
+---------------------------+----------------------------------------------------------------------------+
stack@devstack:~/devstack$ openstack network list --external --long --tag IPv4
+----------------------------+--------+--------+----------------------------+-------+--------+----------------------------+--------------+-------------+--------------------+------+
| ID                         | Name   | Status | Project                    | State | Shared | Subnets                    | Network Type | Router Type | Availability Zones | Tags |
+----------------------------+--------+--------+----------------------------+-------+--------+----------------------------+--------------+-------------+--------------------+------+
| 73edb86b-d7ab-4db3-82b7-   | public | ACTIVE | 15f2ab0eaa5b4372b759bde609 | UP    | False  | 3e0206bc-53c8-44ca-a0f1-   | flat         | External    |                    | IPv4 |
| 25fa8b012e40               |        |        | e86224                     |       |        | 2c2548bba766, 84dffd43-    |              |             |                    |      |
|                            |        |        |                            |       |        | 6d7f-4c2f-9180-            |              |             |                    |      |
|                            |        |        |                            |       |        | 8f0f0b83c9d4               |              |             |                    |      |
+----------------------------+--------+--------+----------------------------+-------+--------+----------------------------+--------------+-------------+--------------------+------+

I am pretty much for investigating those tags! Help from CSPs is wanted (we don't want to accidentally render a network not working anymore :D )
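
As a rough illustration of the tag-based alternative to a naming convention, the following sketch selects external networks by tag instead of by name. It assumes networks are given as dicts with the `router:external` (boolean, as returned by the Neutron API rather than the CLI's "External" string) and `tags` fields; the function name and the `IPv4` tag value simply mirror the devstack example above and are not an agreed convention.

```python
def external_networks_with_tag(networks, tag):
    """Return the external networks that carry the given tag.

    `networks` is a list of dicts shaped like Neutron network objects
    (an assumption of this sketch).
    """
    return [
        net
        for net in networks
        if net.get("router:external", False) and tag in net.get("tags", [])
    ]
```

This handles multiple IPv4 external networks naturally, since any number of networks can carry the same tag, which is one of the open points listed under "Con" above.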


#### _Floating IPs_

Floating IPs for Dynamic Allocation: Utilization of Floating IPs to allow dynamic reassignment of public IPv4 addresses to different instances (VMs or Loadbalancers), facilitating high availability and fault tolerance.
Floating IPs **MUST** be enabled.
Contributor

Again the question: Can they be disabled? OR just not set in pools?


## Open questions

- Naming Convention Flexibility: How rigid should the naming convention for external
networks be across various SCS clouds?
- Load Balancing: Do we want to dictate a Load Balancer or a set of Load Balancers or nothing at all? E.g. Octavia, Yawol
Contributor

We should keep a focus and discuss this important question in another issue.


## Decision

...

## Related Documents

Related Documents, OPTIONAL

## Conformance Tests

Conformance Tests, OPTIONAL