Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: translator reports errors for existing clusters and secretes #4707

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

zhaohuabing
Copy link
Member

@zhaohuabing zhaohuabing commented Nov 12, 2024

Fixes #4706 xDS translation failed when oidc tokenEndpoint and jwt remoteJWKS are specified within the same security policy and using the same hostname

Refactor: skips adding the cluster/secrets and returns nil to make the code cleaner and easier to maintain. It's safe to remove ErrXdsClusterExists and ErrXdsSecretsExists as they don't need to be handled in any places.

Release Notes: Yes

Test before the fix:

--- FAIL: TestTranslateXds (0.43s)
    --- FAIL: TestTranslateXds/securitypolicy-with-oidc-jwt-authz (0.00s)
        translator_test.go:142: securitypolicy-with-oidc-jwt-authz
        translator_test.go:143: 
                Error Trace:    /home/ubuntu/gateway/internal/xds/translator/translator_test.go:143
                Error:          Received unexpected error:
                                xds cluster exists
                Test:           TestTranslateXds/securitypolicy-with-oidc-jwt-authz
FAIL
FAIL    github.com/envoyproxy/gateway/internal/xds/translator   0.673s
FAIL

After:

ok      github.com/envoyproxy/gateway/internal/xds/translator   0.251s

Copy link

codecov bot commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 81.25000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 65.63%. Comparing base (c9ae045) to head (2f22fbb).

Files with missing lines Patch % Lines
internal/xds/translator/translator.go 72.72% 1 Missing and 2 partials ⚠️
internal/xds/translator/extauth.go 0.00% 0 Missing and 2 partials ⚠️
internal/gatewayapi/securitypolicy.go 94.44% 0 Missing and 1 partial ⚠️
internal/xds/translator/accesslog.go 50.00% 0 Missing and 1 partial ⚠️
internal/xds/translator/extproc.go 0.00% 0 Missing and 1 partial ⚠️
internal/xds/translator/oidc.go 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4707      +/-   ##
==========================================
+ Coverage   65.60%   65.63%   +0.03%     
==========================================
  Files         211      211              
  Lines       31998    31990       -8     
==========================================
+ Hits        20993    20998       +5     
+ Misses       9764     9756       -8     
+ Partials     1241     1236       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zhaohuabing zhaohuabing marked this pull request as ready for review November 12, 2024 07:10
@zhaohuabing zhaohuabing marked this pull request as draft November 12, 2024 07:15
@zhaohuabing zhaohuabing changed the title fix: existing clusters and secretes fix: translator reports errors for existing clusters and secretes Nov 12, 2024
@zhaohuabing zhaohuabing force-pushed the fix-existing-cluster branch 6 times, most recently from 21c1901 to a7d7e6b Compare November 13, 2024 00:01
@zhaohuabing zhaohuabing marked this pull request as ready for review November 13, 2024 00:07
@arkodg
Copy link
Contributor

arkodg commented Nov 13, 2024

hey @zhaohuabing is the issue that we are generating the same name for the jwks and oidc clusters if both are set in the policy ? shouldnt we be using an additional string value to differentiate them ?

@zhaohuabing
Copy link
Member Author

hey @zhaohuabing is the issue that we are generating the same name for the jwks and oidc clusters if both are set in the policy ? shouldnt we be using an additional string value to differentiate them ?

For the cluster generated by a url, EG uses the host and port for the cluster name to avoid creating duplicated clusters for the same host+port combination.

jwt: 
 providers:
   - remoteJWKS:
       uri: https://oidc.example.com/auth/realms/example/protocol/openid-connect/cert

oidc:
 provider:
   tokenEndpoint: https://oidc.example.com/oauth/token

The name of the generated cluster: oidc_example_com_443.

We could change this logic to generate an unique name for each single oidc or jwt configuration. However, we should also ensure that the translator shouldn't throw error if the cluster already exists.

@arkodg
Copy link
Contributor

arkodg commented Nov 14, 2024

hey @zhaohuabing is the issue that we are generating the same name for the jwks and oidc clusters if both are set in the policy ? shouldnt we be using an additional string value to differentiate them ?

For the cluster generated by a url, EG uses the host and port for the cluster name to avoid creating duplicated clusters for the same host+port combination.

jwt: 
 providers:
   - remoteJWKS:
       uri: https://oidc.example.com/auth/realms/example/protocol/openid-connect/cert

oidc:
 provider:
   tokenEndpoint: https://oidc.example.com/oauth/token

The name of the generated cluster: oidc_example_com_443.

We could change this logic to generate an unique name for each single oidc or jwt configuration. However, we should also ensure that the translator shouldn't throw error if the cluster already exists.

is it safe to reuse the same cluster configuration ? is the naming different when use the backendRefs field ?

@zhaohuabing
Copy link
Member Author

is it safe to reuse the same cluster configuration ? is the naming different when use the backendRefs field ?

It's safe to reuse the asme cluster for ulr generated cluster as the cluster confiugration is identical for the same host+port combination.

For OIDC provider with backendRefs, EG generate an unique cluster name like securitypolicy/envoy-gateway/policy-for-gateway/0

func addXdsCluster(tCtx *types.ResourceVersionTable, args *xdsClusterArgs) error {
// Return early if cluster with the same name exists
if c := findXdsCluster(tCtx, args.name); c != nil {
return ErrXdsClusterExists
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not great from an API perspective for addXdsCluster, because the the old cluster with the same name may not have the same xdsClusterArgs as the current request, so the caller should decide how to handle this case, not this method imo

Copy link
Member Author

@zhaohuabing zhaohuabing Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strong opinion, but should the caller call findXdsCluster first and decide whether they should handle this situation?

The current approach has an implict assumption: every callers should check whether the return error isErrXdsClusterExists or not, and ignore ErrXdsClusterExists - this cause code duplications for every caller and can cause issues if the caller forget to handle this special case, like this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, the method is doing two many things, fine to split it up, but developers will need to make sure they call both functions when writing new logic

Copy link
Member Author

@zhaohuabing zhaohuabing Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all the current callers, they just need to call addXdsCluster as they just simply ignore ErrXdsClusterExists and does nothing. For them, they can all safely assume the xdsClusterArgs is the same for the clusters with the same name.

I guess this pattern comes from the Kub Client API - it totally makes sense for the Kub Client API as there is a race condition between the client and the API server. However, EG doesn't need to use the same pattern here as it's a local cache and has no concurrent writes.

@zhaohuabing
Copy link
Member Author

zhaohuabing commented Nov 14, 2024

For OIDC provider with backendRefs, EG generate an unique cluster name like securitypolicy/envoy-gateway/policy-for-gateway/0.

Ha, there is also a bug here, the index is always 0, which generates duplicated name for different clusters if there're both ext auth and oidc whithin a SecurityPolicy. Will fix it in this PR as well.

@arkodg
Copy link
Contributor

arkodg commented Nov 14, 2024

For OIDC provider with backendRefs, EG generate an unique cluster name like securitypolicy/envoy-gateway/policy-for-gateway/0.

Ha, there is also a bug here, the index is always 0, which generates duplicated name for different clusters if there're both ext auth and oidc whithin a SecurityPolicy. Will fix it in this PR as well.

nice catch, prob needs another prefix like jwt, oidc after policy name

Signed-off-by: Huabing Zhao <[email protected]>
@zhaohuabing zhaohuabing marked this pull request as draft November 14, 2024 16:57
Signed-off-by: Huabing Zhao <[email protected]>
Signed-off-by: Huabing Zhao <[email protected]>
Signed-off-by: Huabing Zhao <[email protected]>
Signed-off-by: Huabing Zhao <[email protected]>
}

var oidc *ir.OIDC
if policy.Spec.OIDC != nil {
if oidc, err = t.buildOIDC(
policy,
resources,
gtwCtx.envoyProxy); err != nil {
gtwCtx.envoyProxy, // TODO zhaohuabing: Only the last EnvoyProxy will be used as the OIDC name doesn't include the cluster index
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor issue, will address it in a follow-up PR.

@zhaohuabing zhaohuabing marked this pull request as ready for review November 14, 2024 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OIDC authentication and JWT authorization is unstable
2 participants