Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-1.15] Improve scheduler memory usage and remove scheduler waits to speed up recovery time #8202

Merged

Conversation

pierDipi
Copy link
Member

* Improve scheduler memory usage

- Create a namespaced-scoped statefulset lister instead of being
  cluster-wide
- Accept a PodLister rather than creating a cluster-wide one

Signed-off-by: Pierangelo Di Pilato <[email protected]>

* Update codegen

Signed-off-by: Pierangelo Di Pilato <[email protected]>

---------

Signed-off-by: Pierangelo Di Pilato <[email protected]>
Currently, the scheduler and autoscaler are single threads and use
a lock to prevent multiple scheduling and autoscaling decision
from happening in parallel; this is not a problem for our use
cases, however, the multiple `wait` currently present are slowing
down recovery time.

From my testing, if I delete and recreate the Kafka control plane
and data plane, without this patch it takes 1h to recover when there
are 400 triggers or 20 minutes when there are 100 triggers; with the
patch it is immediate (only a 2/3 minutes with 400 triggers).

- Remove `wait`s from state builder and autoscaler
- Add additional debug logs
- Use logger provided through the context as opposed to gloabal loggers
  in each individual component to preserve `knative/pkg` resource aware
  log keys.

Signed-off-by: Pierangelo Di Pilato <[email protected]>
@knative-prow knative-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 23, 2024
Copy link
Member

@matzew matzew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2024
Copy link

knative-prow bot commented Sep 23, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matzew, pierDipi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

codecov bot commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 68.49315% with 46 lines in your changes missing coverage. Please review.

Project coverage is 67.81%. Comparing base (f969763) to head (e3f4cf1).
Report is 1 commits behind head on release-1.15.

Files with missing lines Patch % Lines
pkg/scheduler/statefulset/autoscaler.go 48.83% 19 Missing and 3 partials ⚠️
pkg/scheduler/state/state.go 56.25% 9 Missing and 5 partials ⚠️
pkg/scheduler/statefulset/scheduler.go 87.87% 8 Missing ⚠️
pkg/scheduler/state/helpers.go 33.33% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@               Coverage Diff                @@
##           release-1.15    #8202      +/-   ##
================================================
- Coverage         67.88%   67.81%   -0.08%     
================================================
  Files               368      368              
  Lines             17565    17587      +22     
================================================
+ Hits              11924    11926       +2     
- Misses             4893     4907      +14     
- Partials            748      754       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants