-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added SmartSwitch support in chassisd and enabling chassisd #467
base: master
Are you sure you want to change the base?
Conversation
d5ed9fc
to
0785bb1
Compare
platform identifier.
temp fix to get the build passing.
What are the states supported by the DPUs in the Smart Switch? |
”dpu_midplane_link_state” |
accidental removal of a few lines from the original chassisd file
sonic-chassisd/scripts/chassisd
Outdated
@@ -184,7 +188,7 @@ class ModuleUpdater(logger.Logger): | |||
self.chassis = chassis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that instead of modifying the existing ModuleUpdater
class, we should implement SmartSwitchModuleUpdater(ModuleUpdater)
. SmartSwitchModuleUpdater
should be derived from ModuleUpdater
and overwrite the methods that should behave differently for the Smart Switch. This approach allows us to keep the original implementation untouched and guarantees full backward compatibility with the chassisd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will consider this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as commented
sonic-chassisd/scripts/chassisd
Outdated
|
||
# Initialize DPU_STATE DB table on bootup | ||
dpu_state_key = "DPU_STATE|" + module_name | ||
self.module_updater.update_dpu_state(dpu_state_key, admin_state.upper()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we update the midplane state with the configured admin_state
instead of checking the reachability of the midplane here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to query the operation state of the interface connected to the DPU. And if the oper state is up, the we should test the reachability of IP address of the interface on the DPU side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the initialization of the system we want to make sure the system comes up correctly with the configured state, which is the admin_state. During boot up the operational state of the DPUs could be anything. So,
- Get the admin_state and initialize the DPUs to that state
- Set the DPU_STATE table as per the admins state
- Soon the periodic update of the DPU_STATE will kick in which will check for the operational state of the DPU and update it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming before we call the set_initial_dpu_admin_state
the DPUs are powered off but the CONFIG_DB entry is set to admin_state=up
we are setting the dpu_midplane_link_state=up
in the CHASSIS_STATE_DB before we even power on the DPUs (which is done in the submit_dpu_callback
later in this function), if the DPU power on fails or if we do not achieve midplane reachability we have the wrong data in the CHASSIS_STATE_DB
sonic-chassisd/scripts/chassisd
Outdated
self.update_dpu_state(key, "UP") | ||
|
||
elif midplane_access is False and current_midplane_state == 'False': | ||
if self.is_module_reboot_system_up_expired(module_key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this function call?
sonic-chassisd/scripts/chassisd
Outdated
midplane_state = 'False' | ||
if isinstance(fvs, list) and fvs[0] is True: | ||
fvs = dict(fvs[-1]) | ||
midplane_state = fvs[CHASSIS_MIDPLANE_INFO_ACCESS_FIELD] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking for the midplane_state
here in the DB (and if it is not present in the DB we are assuming that the module is not reachable since midplane_state=False
by default) but in case of Nvidia platform the midplane_state
is being updated to the DB by Chassisd itself, so chassisd here is checking for midplane state in DB before it is supposed to update the midplane state itself. Either we need to re order code so that we update midplane_state to db before we check the DB for values, or instead of checking the DB we can directly invoke the midplane reachability API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this may cause issues, assuming the dpu is powered on initially, by default if there is no data in DB we assume the midplane_state is false, if no config is present the DPU is supposed to power off, but admin_state == 'down' and midplane_state is false, so the operation is skipped, causing DPU to stay on
Added SmartSwitch support in chassisd and enabling chassisd for fixed SmartSwitches
Description
chassisd is enabled only for modular chassis. Smartswitch is a fixed chassis. However it has been treated like a modular chassis to manage the DPU cards just like the line-cards of a modular chassis. chassisd will be enabled only on the smartswitch NPU and hence the scope of these changes are limited only to NPU and not applicable to DPU.
Motivation and Context
chassisd is enabled only for modular chassis. Smartswitch is a fixed chassis. However it has been treated like a modular chassis to manage the DPU cards just like the line-cards of a modular chassis. Hence, chassid is needed for SmartSwitch and enabled here. Also, some of the table updates and clean up are not required for SmartSwitch platform and hence using the is_smartswitch API to selectively enable it. the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
How Has This Been Tested?
Additional Information (Optional)