-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add safe request timeout for running actions manager #743
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
008ed7d
to
c345f7a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: 0 of 1 LGTMs obtained
c345f7a
to
0dc8f59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: pre-commit-checks
a discussion (no related file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
lgtm, but we need a test.
Test has been added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: pre-commit-checks
0dc8f59
to
54afc90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: windows-2022 / stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: windows-2022 / stable
nativelink-worker/tests/running_actions_manager_test.rs
line 2660 at r4 (raw file):
#[tokio::test] async fn running_actions_manager_respects_action_timeout() -> Result<(), Box<dyn std::error::Error>> { #[cfg(target_family = "unix")]
nit: This should probably be removed.
nativelink-worker/tests/running_actions_manager_test.rs
line 2698 at r4 (raw file):
) .await?; const TASK_TIMEOUT: Duration = Duration::from_secs(1);
Tests should not use wall-time. In this case, you probably want to use RunningActionsManagerImpl::new_with_timeout
or something to give test control of the timeout function, then use some signaling to facilitate this test.
2992810
to
1fbcf47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Remote / large-ubuntu-22.04, Vercel, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), pre-commit-checks, publish-image, ubuntu-20.04 / stable, ubuntu-22.04, vale, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04
nativelink-worker/tests/running_actions_manager_test.rs
line 2698 at r4 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
Tests should not use wall-time. In this case, you probably want to use
RunningActionsManagerImpl::new_with_timeout
or something to give test control of the timeout function, then use some signaling to facilitate this test.
I think the easiest way to do this without actually waiting is to set both the timeout and max_timeout to 0, at which point the checks for time elapsed will just always return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: windows-2022 / stable
a2a06c2
to
09defc6
Compare
1fbcf47
to
6280c45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r6.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: windows-2022 / stable, and 3 discussions need to be resolved
nativelink-worker/tests/running_actions_manager_test.rs
line 2698 at r4 (raw file):
Previously, zbirenbaum (Zach Birenbaum) wrote…
I think the easiest way to do this without actually waiting is to set both the timeout and max_timeout to 0, at which point the checks for time elapsed will just always return true.
I'm not a huge fan of doing this, it makes future bugs even more difficult to write tests for and people will end up copy-pasta the pattern around.
It should be fairly trivial to hand in a custom sleep
function and allow the test to override it here, we do this in quite a few places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r6, all commit messages.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: windows-2022 / stable, and 3 discussions need to be resolved
10e5ba3
to
00a1250
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Bazel Dev / ubuntu-22.04, Publish image, Vercel, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, and 2 discussions need to be resolved
nativelink-worker/tests/running_actions_manager_test.rs
line 2660 at r4 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: This should probably be removed.
Done.
nativelink-worker/tests/running_actions_manager_test.rs
line 2698 at r4 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
I'm not a huge fan of doing this, it makes future bugs even more difficult to write tests for and people will end up copy-pasta the pattern around.
It should be fairly trivial to hand in a custom
sleep
function and allow the test to override it here, we do this in quite a few places.
I've updated it to take a custom sleep function which will immediately return if the correct timeout is passed and stall indefinitely causing the action to complete and the test to fail otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r7, 1 of 1 files at r8, all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: windows-2022 / stable, and 2 discussions need to be resolved
nativelink-worker/tests/running_actions_manager_test.rs
line 2944 at r8 (raw file):
fs::create_dir_all(&root_work_directory).await?; // ignore the sleep and immediately timeout
nit: Caps and period.
nativelink-worker/tests/running_actions_manager_test.rs
line 2973 at r8 (raw file):
// otherwise return pending and fail the test. sleep_fn: |duration| { if duration.as_secs() == ACTION_TIMEOUT as u64 {
nit: For readability, assert that duration is ACTION_TIMEOUT
and then return ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: windows-2022 / stable, and 2 discussions need to be resolved
00a1250
to
e00911c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Vercel, pre-commit-checks
nativelink-worker/tests/running_actions_manager_test.rs
line 2944 at r8 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: Caps and period.
Done.
nativelink-worker/tests/running_actions_manager_test.rs
line 2973 at r8 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
nit: For readability, assert that duration is
ACTION_TIMEOUT
and then return ready.
Done.
e00911c
to
3789408
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r10, all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: windows-2022 / stable
3789408
to
66e4479
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r11, all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: integration-tests (20.04), integration-tests (22.04), windows-2022 / stable
58c0060
to
9a81488
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r12, all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: integration-tests (20.04), integration-tests (22.04), windows-2022 / stable, and 1 discussions need to be resolved
nativelink-worker/tests/running_actions_manager_test.rs
line 3259 at r12 (raw file):
let arguments = vec!["sh".to_string(), "-c".to_string(), "sleep 2".to_string()]; #[cfg(target_family = "windows")] let arguments = vec![
This is what is causing the test to fail. It is returning error code 1 which is incorrect function
. I have been reading up on it but can't figure it out so I'm just going to remove to remove this test for windows if that's alright @allada
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: integration-tests (20.04), integration-tests (22.04), windows-2022 / stable, and 1 discussions need to be resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: integration-tests (20.04), integration-tests (22.04), windows-2022 / stable, and 1 discussions need to be resolved
nativelink-worker/tests/running_actions_manager_test.rs
line 3259 at r12 (raw file):
Previously, zbirenbaum (Zach Birenbaum) wrote…
This is what is causing the test to fail. It is returning error code 1 which is
incorrect function
. I have been reading up on it but can't figure it out so I'm just going to remove to remove this test for windows if that's alright @allada
Why not down below set the error code on windows to expect exit code 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: integration-tests (20.04), integration-tests (22.04), windows-2022 / stable, and 1 discussions need to be resolved
nativelink-worker/tests/running_actions_manager_test.rs
line 3259 at r12 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
Why not down below set the error code on windows to expect exit code
1
?
Judging by the docs it seems like exit code 1 corresponds to something else that wouldn't actually be showing the proper behavior here. It's difficult to tell so I guess its possible that's the expected error code but I'm not certain. If not being sure is preferable to removing the test we can do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: integration-tests (20.04), integration-tests (22.04), windows-2022 / stable, and 1 discussions need to be resolved
nativelink-worker/tests/running_actions_manager_test.rs
line 3259 at r12 (raw file):
Previously, zbirenbaum (Zach Birenbaum) wrote…
Judging by the docs it seems like exit code 1 corresponds to something else that wouldn't actually be showing the proper behavior here. It's difficult to tell so I guess its possible that's the expected error code but I'm not certain. If not being sure is preferable to removing the test we can do that
This test is not actually testing the error code, it's checking that the timeout triggers. The error code is a side-effect. In this case, I would just make a TODO and keep the test for windows.
Changes running actions manager to use the action timeout if set to a nonzero value for deciding when to kill a process, otherwise defaults to its configured max_action_timeout
9a81488
to
4e06270
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Remote / large-ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), ubuntu-20.04 / stable, ubuntu-22.04 / stable, windows-2022 / stable, zig-cc ubuntu-22.04
nativelink-worker/tests/running_actions_manager_test.rs
line 3259 at r12 (raw file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
This test is not actually testing the error code, it's checking that the timeout triggers. The error code is a side-effect. In this case, I would just make a TODO and keep the test for windows.
Done.
Description
Backport of running actions manager timeout changes which checks the timeout for an action and defaults to a maximum if not set.
Type of change
Checklist
bazel test //...
passes locallygit amend
see some docsThis change is