-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-1.15] Improve scheduler memory usage and remove scheduler waits to speed up recovery time #8202
[release-1.15] Improve scheduler memory usage and remove scheduler waits to speed up recovery time #8202
Conversation
* Improve scheduler memory usage - Create a namespaced-scoped statefulset lister instead of being cluster-wide - Accept a PodLister rather than creating a cluster-wide one Signed-off-by: Pierangelo Di Pilato <[email protected]> * Update codegen Signed-off-by: Pierangelo Di Pilato <[email protected]> --------- Signed-off-by: Pierangelo Di Pilato <[email protected]>
Currently, the scheduler and autoscaler are single threads and use a lock to prevent multiple scheduling and autoscaling decision from happening in parallel; this is not a problem for our use cases, however, the multiple `wait` currently present are slowing down recovery time. From my testing, if I delete and recreate the Kafka control plane and data plane, without this patch it takes 1h to recover when there are 400 triggers or 20 minutes when there are 100 triggers; with the patch it is immediate (only a 2/3 minutes with 400 triggers). - Remove `wait`s from state builder and autoscaler - Add additional debug logs - Use logger provided through the context as opposed to gloabal loggers in each individual component to preserve `knative/pkg` resource aware log keys. Signed-off-by: Pierangelo Di Pilato <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: matzew, pierDipi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## release-1.15 #8202 +/- ##
================================================
- Coverage 67.88% 67.81% -0.08%
================================================
Files 368 368
Lines 17565 17587 +22
================================================
+ Hits 11924 11926 +2
- Misses 4893 4907 +14
- Partials 748 754 +6 ☔ View full report in Codecov by Sentry. |
Cherry-pick of
wait
s to speed up recovery time #8200