Un-pin torch version in nv-torch-latest back to latest and skip test_compile_zero tests on v100 #5459

Merged: 5 commits into master, Apr 30, 2024

@loadams loadams commented Apr 23, 2024

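Torch updating to 2.3.0 broke some test_compile_zero tests, so we pinned the torch version. @tohtana pushed fixes in #5463; this PR un-pins torch and moves us back to the latest.

This PR also skips the test_compile_zero tests on V100: the failing test [here](https://github.com/microsoft/DeepSpeed/actions/runs/8838672379/job/24270349996?pr=5459#step:8:5157) indicates that the generated code cannot run bf16 on V100.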
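
For readers who want to see what a V100 skip condition could look like, here is a minimal sketch. It is not the change made in this PR. The helper names, the `(8, 0)` threshold (native bf16 on Ampere or newer, with V100 reporting compute capability 7.0), and the test class are all assumptions for illustration; only `torch.cuda.is_available` and `torch.cuda.get_device_capability` are real PyTorch calls.

```python
import unittest

# Assumption: native bf16 needs CUDA compute capability 8.0 (Ampere) or newer.
# A V100 reports (7, 0), so it falls below this threshold.
MIN_BF16_CAPABILITY = (8, 0)


def supports_native_bf16(capability):
    """Return True if a (major, minor) capability meets the assumed bf16 minimum."""
    return tuple(capability) >= MIN_BF16_CAPABILITY


def current_capability():
    """Return (major, minor) for GPU 0, or None when torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def should_skip_bf16_compile_tests():
    """Skip when there is no GPU or the GPU is older than the assumed minimum."""
    capability = current_capability()
    return capability is None or not supports_native_bf16(capability)


class CompileZeroSketch(unittest.TestCase):
    """Hypothetical stand-in for a bf16 compile test, not the DeepSpeed test."""

    @unittest.skipIf(should_skip_bf16_compile_tests(),
                     "compiled bf16 code is assumed not to run on this device")
    def test_bf16_compile(self):
        pass  # the real test body would build and run a compiled bf16 model
```

Keeping the capability check as a pure function makes the skip rule easy to verify without a GPU; a pytest-based suite could pass the same predicate to its own skip marker.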

@loadams loadams requested a review from mrwyattii as a code owner April 23, 2024 23:50
github-merge-queue bot pushed a commit that referenced this pull request Apr 25, 2024
PyTorch v2.3 throws an error when it tries to compile `iter_params` used
for ZeRO3.
This PR excludes the function from the compilation targets.

After this PR is merged, we can [unpin the torch version for unit
tests](#5459).
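
As a rough illustration of excluding one function from compilation, the sketch below marks a helper with `torch.compiler.disable` (available in PyTorch 2.1 and later) so the compiler runs it eagerly instead of tracing it. This is an assumption about one way to do it, not the mechanism used in #5463; `exclude_from_compile` and `iter_params_sketch` are hypothetical names, and the function body is a stand-in for the real `iter_params`.

```python
try:
    import torch

    # torch.compiler.disable tells the compiler to leave the decorated
    # function uncompiled and run it in eager mode.
    exclude_from_compile = torch.compiler.disable
except (ImportError, AttributeError):
    # Fallback for environments without a recent torch: a no-op decorator,
    # so the sketch still runs.
    def exclude_from_compile(fn):
        return fn


@exclude_from_compile
def iter_params_sketch(groups):
    """Flatten nested parameter groups into one list (hypothetical stand-in)."""
    return [param for group in groups for param in group]


flat = iter_params_sketch([["w1", "b1"], ["w2"]])
```

The decorated function still behaves like ordinary Python when called directly; the decorator only affects how a surrounding compiled region treats it.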
@loadams loadams changed the title from "Un-pin torch version in nv-torch-latest back to latest" to "Un-pin torch version in nv-torch-latest back to latest and skip test_compile_zero tests on v100" Apr 26, 2024
@loadams loadams added this pull request to the merge queue Apr 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 29, 2024
@loadams loadams enabled auto-merge April 29, 2024 22:13
@loadams loadams added this pull request to the merge queue Apr 29, 2024
Merged via the queue into master with commit f32ad3e Apr 30, 2024
12 checks passed
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
…compile_zero tests on v100 (microsoft#5459)

Torch updating to 2.3.0 broke some test_compile_zero tests, we pinned
it, @tohtana pushed fixes in microsoft#5463, this should un-pin and move us back
to the latest.

Failing test that indicates the generated code cannot run bf16 on V100
[here](https://github.com/microsoft/DeepSpeed/actions/runs/8838672379/job/24270349996?pr=5459#step:8:5157).
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024
…compile_zero tests on v100 (microsoft#5459)