What is the bug?
We found an ISM-managed index that had been stuck in the initializing state for several weeks.
How can one reproduce the bug?
There is a race condition between JobScheduler#schedule and JobScheduler#deschedule. The following unit test (add it to JobSchedulerTests) eventually fails:
public void testRaceCondition() throws InterruptedException {
    String indexName = ".opendistro-ism-config";
    String docId = "test-doc-id";
    ScheduledJobParameter jobParameter = buildScheduledJobParameter(docId, "dummy job name",
        Instant.now(), Instant.now(), new IntervalSchedule(Instant.now(), 5, ChronoUnit.MINUTES), true);
    ScheduledJobRunner runner = Mockito.mock(ScheduledJobRunner.class);
    Scheduler.ScheduledCancellable cancellable = Mockito.mock(Scheduler.ScheduledCancellable.class);
    Mockito.when(this.threadPool.schedule(Mockito.any(), Mockito.any(), Mockito.anyString())).thenReturn(cancellable);
    Mockito.when(cancellable.cancel()).thenReturn(true);
    for (int i = 0; i < 10000; i++) {
        logger.info("start iteration {}", i);
        // schedule thread
        Thread scheduleThread = new Thread(() -> scheduler.schedule(indexName, docId, jobParameter, runner, dummyVersion, jitterLimit));
        // deschedule thread
        Thread descheduleThread = new Thread(() -> scheduler.deschedule(indexName, docId));
        // start both threads so that schedule and deschedule race each other
        scheduleThread.start();
        descheduleThread.start();
        // wait for both threads to finish
        scheduleThread.join();
        descheduleThread.join();
        // deschedule again to make sure the job is removed from scheduler#scheduledJobInfo
        scheduler.deschedule(indexName, docId);
        // after this deschedule, scheduledJobInfo should no longer contain the job
        assertNull(scheduler.getScheduledJobInfo().getJobInfo(indexName, docId));
        logger.info("end iteration {}", i);
    }
}
On my desktop, the test fails after 1200+ iterations:
// ... many log lines omitted
[2022-11-29T07:36:23,512][INFO ][o.o.j.s.JobSchedulerTests] [testRaceCondition] start iteration 1265
[2022-11-29T07:36:23,512][INFO ][o.o.j.s.JobScheduler ] [[Thread-2535]] Scheduling job id test-doc-id for index .opendistro-ism-config .
[2022-11-29T07:36:23,513][INFO ][o.o.j.s.JobScheduler ] [testRaceCondition] Descheduling jobId: test-doc-id
[2022-11-29T07:36:23,513][INFO ][o.o.j.s.JobSchedulerTests] [testRaceCondition] end iteration 1265
[2022-11-29T07:36:23,513][INFO ][o.o.j.s.JobSchedulerTests] [testRaceCondition] start iteration 1266
[2022-11-29T07:36:23,513][INFO ][o.o.j.s.JobScheduler ] [[Thread-2537]] Scheduling job id test-doc-id for index .opendistro-ism-config .
[2022-11-29T07:36:23,513][INFO ][o.o.j.s.JobScheduler ] [[Thread-2538]] Descheduling jobId: test-doc-id
[2022-11-29T07:36:23,514][INFO ][o.o.j.s.JobScheduler ] [[Thread-2537]] not scheduled because already removed
[2022-11-29T07:36:23,514][INFO ][o.o.j.s.JobScheduler ] [testRaceCondition] Descheduling jobId: test-doc-id
[2022-11-29T07:36:23,546][INFO ][o.o.j.s.JobSchedulerTests] [testRaceCondition] after test
REPRODUCE WITH: gradlew ':test' --tests "org.opensearch.jobscheduler.scheduler.JobSchedulerTests.testRaceCondition" -Dtests.seed=E33377BC38A3CD99 -Dtests.security.manager=false -Dtests.locale=ar-JO -Dtests.timezone=Africa/Nouakchott -Druntime.java=12
expected null, but was:<org.opensearch.jobscheduler.scheduler.JobSchedulingInfo@2a85bf7a>
java.lang.AssertionError: expected null, but was:<org.opensearch.jobscheduler.scheduler.JobSchedulingInfo@2a85bf7a>
at __randomizedtesting.SeedInfo.seed([E33377BC38A3CD99:4CEC8FE92BF01C19]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotNull(Assert.java:756)
at org.junit.Assert.assertNull(Assert.java:738)
at org.junit.Assert.assertNull(Assert.java:748)
In ISM, if a user adds a policy to an index and removes it immediately, there is a chance of triggering this bug.
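One interleaving that would be consistent with the log above (the schedule thread printing "not scheduled because already removed" while the entry survives a later deschedule) can be replayed deterministically. The sketch below is a simplified, hypothetical model and not the actual JobScheduler code: the class RaceSketch, the Info type and the register/submit split are made up for illustration, and it assumes that deschedule only removes an entry when a cancellable has already been set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RaceSketch {
    // Hypothetical stand-in for JobSchedulingInfo: only the two fields that matter here.
    static final class Info {
        volatile boolean descheduled;
        volatile Object cancellable; // stays null until the job is handed to the thread pool
    }

    static final Map<String, Info> jobs = new ConcurrentHashMap<>();

    // Assumed first half of schedule: register the job before it has a cancellable.
    static Info register(String id) {
        return jobs.computeIfAbsent(id, k -> new Info());
    }

    // Assumed second half of schedule: refuse to submit a job that was descheduled meanwhile.
    static boolean submit(Info info) {
        if (info.descheduled) {
            return false; // corresponds to "not scheduled because already removed"
        }
        info.cancellable = new Object();
        return true;
    }

    // Assumed deschedule: the entry is removed only if there is something to cancel.
    static void deschedule(String id) {
        Info info = jobs.get(id);
        if (info == null) {
            return;
        }
        info.descheduled = true;
        if (info.cancellable != null) {
            jobs.remove(id);
        }
        // With a null cancellable the entry stays in the map, marked as descheduled.
    }

    // Replays the problematic interleaving on a single thread.
    static boolean leaksEntry() {
        jobs.clear();
        Info info = register("test-doc-id"); // schedule thread, first half
        deschedule("test-doc-id");           // deschedule thread runs in between
        submit(info);                        // schedule thread, second half: refuses
        deschedule("test-doc-id");           // later cleanup attempt, as in the unit test
        return jobs.containsKey("test-doc-id");
    }

    public static void main(String[] args) {
        System.out.println("stale entry left behind: " + leaksEntry());
    }
}
```

Under these assumptions the stale entry can never be cleaned up by further deschedule calls, which would match a job that stays stuck until the node restarts.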
What is your host/environment?
All versions of both the Open Distro JobScheduler and the OpenSearch JobScheduler have this bug.