Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exporter should be in error state instead of blocked if installation fails #203

Open
dashmage opened this issue Mar 26, 2024 · 5 comments
Open
Labels
enhancement New feature or request

Comments

@dashmage
Copy link
Contributor

The exporter is installed as part of the charm's install event handler. Currently, if the installation of the exporter fails, we put the charm into blocked status. But ideally, since the user cannot do anything to recover from the failure, we should let the error bubble up and have the charm be in error status.

@Pjack
Copy link

Pjack commented Mar 26, 2024

However, another thing showed up to me while reading the _on_install_or_upgrade. If it fails to install the charm is setting to blocked state, and this does not look right to me. Blocked means that the user should manually do something. What the user should do if the installation fails? IMO, we should raise an exception and let to be on error state.

If installation fails due to network connectivity issues, the operator is responsible for addressing the network problem, as it's not a bug within the charm itself. Therefore, setting the status to "blocked" is reasonable in this scenario. In summary, any error state indicates a potential bug.

Similar discussion here
https://discourse.charmhub.io/t/its-probably-ok-for-a-unit-to-go-into-error-state/13022/24
image

My understanding come from these two documents.

error : The unit is in error, likely from a hook failure. The operator will need to file a bug
blocked : The charm is stuck. Human attention/intervention is required.

Once the charm enter error state, it's hard to recover from it. I tend to avoid it if possible, especially when I don't think it's a bug.

@chanchiwai-ray
Copy link
Contributor

However, another thing showed up to me while reading the _on_install_or_upgrade. If it fails to install the charm is setting to blocked state, and this does not look right to me. Blocked means that the user should manually do something. What the user should do if the installation fails? IMO, we should raise an exception and let to be on error state.

If installation fails due to network connectivity issues, the operator is responsible for addressing the network problem, as it's not a bug within the charm itself. Therefore, setting the status to "blocked" is reasonable in this scenario. In summary, any error state indicates a potential bug.

Yes, that's not a bug within the charm itself, so if the user file a bug, we should explain that it's not a bug and potentially offer a way to avoid it. A blocked status in this case is a dead end, install hook will not run again even when operator fixed the issue; on the other hand, an error state can be recovered: juju resolved --all will retry the failed hook

@gabrielcocenza
Copy link
Member

I agree with @chanchiwai-ray . The install hook is one shot and if we block it and return, there is no way to run it again. If a similar issue was on config-change or update-status, then blocking the unit is totally fine

@Pjack Pjack added the enhancement New feature or request label Apr 3, 2024
@aieri
Copy link
Contributor

aieri commented Jul 11, 2024

is it possible to defer an install hook? I agree that if the installation fails for external reasons it's not a charm bug, hence we should not produce a stack trace. On the other hand, we should retry automatically and notify the user of why installation did not complete.

@Pjack
Copy link

Pjack commented Oct 1, 2024

After the discussion in the OpenStack Exporter Operator project, I believe we now have clear guidelines and know how to proceed.
https://github.com/canonical/openstack-exporter-operator/pull/110/files
canonical/openstack-exporter-operator#100 (comment)

Please modify the behavior: Raise the error and let operator framework handle it.
Here is the reference code
https://github.com/canonical/openstack-exporter-operator/blob/main/src/charm.py#L148

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants