Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect state possible after retrying ServiceDiscoverer events #3006

Merged
merged 5 commits into from
Jul 18, 2024

Commits on Jul 12, 2024

  1. Incorrect state possible after retrying ServiceDiscoverer events

    Motivation:
    
    Clients have a configurable `serviceDiscovererRetryStrategy` to
    guarantee a steady stream of events to the `LoadBalancer` that never
    fails. It's necessary at the client level to avoid hanging requests
    indefinitely and let requests observe failures from ServiceDiscoverer.
    Also, for `PartitionedHttpClient` it's necessary to guarantee that
    `GroupedPublisher` never fails.
    
    Retry is effectively a re-subscribe. According to `ServiceDiscoverer`
    contract (clarified in apple#3002), each `Subscriber` receives a "state of
    the world" as the first collection of events. The problem is that the
    state may change significantly between retries, as a result unavailable
    addresses can remain inside the `LoadBalancer` forever. Example:
    
    T1. SD delivers [a,b]
    T1. LB receives [a,b]
    T1. SD delivers error
    T2. SD info changed ("a" got revoked)
    T3. Client retries SD
    T3. SD delivers [b]
    T3. LB receives [b] (but still holds "a")
    
    When we retry `ServiceDiscoverer` errors, we should keep pushing deltas
    downstream or purge events that are not present in the new "state of the
    world".
    
    We previously had this protection but it was mistakenly removed in apple#1949
    as part of a broader refactoring around `ServiceDiscoverer` <->
    `LoadBalancer` contract.
    
    Modifications:
    
    - Add `RetryingServiceDiscoverer` that handles retries and keeps the
    state between retries.
    - Use it in `DefaultSingleAddressHttpClientBuilder` and
    `DefaultPartitionedHttpClientBuilder`.
    - Use `CastedServiceDiscoverer` to allow modifications for
    `ServiceDiscovererEvent` after we started to use a wildcard type in
    apple#2379.
    - Pass consistent `targetResource` identifier to both
    `RetryingServiceDiscoverer` and `LoadBalancerFactory` to allow state
    correlation when inspecting heap dump.
    
    Result:
    
    Client keeps pushing deltas to `LoadBalancer` after retrying
    `ServiceDiscoverer` errors, keeping its state consistent with
    `ServiceDiscoverer`.
    idelpivnitskiy committed Jul 12, 2024
    Configuration menu
    Copy the full SHA
    d3e55ab View commit details
    Browse the repository at this point in the history

Commits on Jul 13, 2024

  1. Configuration menu
    Copy the full SHA
    0452768 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    883ce99 View commit details
    Browse the repository at this point in the history

Commits on Jul 15, 2024

  1. Configuration menu
    Copy the full SHA
    a9c3eb6 View commit details
    Browse the repository at this point in the history

Commits on Jul 18, 2024

  1. Fix noUnavailableEventsAfterCancel() test

    It should re-subscribe to the same publisher instead of calling
    `discover` one more time
    idelpivnitskiy committed Jul 18, 2024
    Configuration menu
    Copy the full SHA
    99ebe67 View commit details
    Browse the repository at this point in the history