fix(scheduler): Manual trigger envoy update #5074

Merged
merged 10 commits into from
Aug 16, 2023

Member

@sakoush sakoush commented Aug 9, 2023

What this PR does / why we need it:

In some cases under heavy load, the periodic batched Envoy model update does not get a fair chance to run because of lock contention. This fix adds a manual trigger as well, which does not require releasing the lock and is therefore guaranteed to proceed.
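The idea behind the manual trigger can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `batcher` type, its fields, `flushLocked`, `periodicFlush`, `add`, and `demo` are all hypothetical names, and the 100 ms wait is an arbitrary value. It assumes the caller that queues an update already holds the lock, so it can flush inline once the oldest pending update has waited too long.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher is a hypothetical stand-in for the scheduler's incremental processor.
type batcher struct {
	mu      sync.Mutex
	pending []string      // model updates waiting to be pushed to Envoy
	oldest  *time.Time    // when the oldest pending update was queued; nil if none
	maxWait time.Duration // how long an update may wait before a manual flush
	flushes int           // number of batches pushed so far
}

// flushLocked pushes all pending updates as one batch. The caller must hold mu.
func (b *batcher) flushLocked() {
	if len(b.pending) == 0 {
		return
	}
	b.flushes++
	b.pending = nil
	b.oldest = nil
}

// periodicFlush is the timer-driven path. It has to win the lock first,
// so under heavy contention it may be delayed.
func (b *batcher) periodicFlush() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.flushLocked()
}

// add queues an update and reports whether it triggered a manual flush.
// The flush runs while the lock is still held, so it cannot be starved.
func (b *batcher) add(model string, now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.oldest == nil {
		t := now
		b.oldest = &t
	}
	b.pending = append(b.pending, model)
	if now.Sub(*b.oldest) >= b.maxWait {
		b.flushLocked()
		return true
	}
	return false
}

// demo runs a fixed scenario and returns which add calls flushed,
// plus the total number of batches pushed.
func demo() (first, second, third bool, flushes int) {
	b := &batcher{maxWait: 100 * time.Millisecond}
	t0 := time.Unix(0, 0)
	first = b.add("model-a", t0)
	second = b.add("model-b", t0.Add(50*time.Millisecond))
	third = b.add("model-c", t0.Add(150*time.Millisecond))
	return first, second, third, b.flushes
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the third `add` call arrives 150 ms after the first, so it flushes all three updates as a single batch even if `periodicFlush` never gets the lock.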

This PR also:

  • increases the queue sizes of the event hub to 1000.
  • no longer resets servers for models that fail scheduling but still have replicas loaded on some servers.
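The second point can be read as a simple guard, sketched below. This is an assumption about the intent, not the scheduler's actual code: `replicaState`, its constants, and `shouldResetServer` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// replicaState is a hypothetical, simplified replica status.
type replicaState int

const (
	replicaUnloaded replicaState = iota
	replicaLoaded
)

// shouldResetServer reports whether a model's server assignment should be
// cleared after a failed scheduling attempt. It returns false as soon as
// any replica is still loaded somewhere.
func shouldResetServer(replicas []replicaState) bool {
	for _, s := range replicas {
		if s == replicaLoaded {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(shouldResetServer([]replicaState{replicaUnloaded, replicaLoaded}))
	fmt.Println(shouldResetServer([]replicaState{replicaUnloaded, replicaUnloaded}))
}
```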

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

@sakoush sakoush changed the title Manual trigger envoy update fix: Manual trigger envoy update Aug 9, 2023
@sakoush sakoush added the v2 label Aug 9, 2023
@sakoush sakoush changed the title fix: Manual trigger envoy update fix(scheduler): Manual trigger envoy update Aug 11, 2023
func (p *IncrementalProcessor) modelSync() {
logger := p.logger.WithField("func", "modelSync")
func (p *IncrementalProcessor) triggerModelSyncIfNeeded() bool {
if p.batchTriggerManual == nil {
Contributor

add comment on why we do this for clarity

@sakoush sakoush merged commit 085d578 into SeldonIO:v2 Aug 16, 2023
3 checks passed
2 participants