Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add device config env for the accelerator #5396

Merged
merged 8 commits into from
Apr 21, 2024
Merged

Conversation

shiyuan680
Copy link
Contributor

@shiyuan680 shiyuan680 commented Apr 11, 2024

Thank you for pr and @delock contribution of ideas.
As mentioned in this pr, each device has its own environmental variables.
We create visible_devices_envs() and set_visible_devices_envs() methods on the accelerator class to enable each accelerator to implement env settings within the interface , which is more generic to other accelerators.

this commit has tested on npu, each one has 8 ascend npus

@shiyuan680
Copy link
Contributor Author

@microsoft-github-policy-service agree

@eigen2017
Copy link
Contributor

great thanks to HW and Microsoft,as ascend npu customer, we'll continuously provide contributions.

@CurryRice233
Copy link
Contributor

great thanks to HW and Microsoft,as ascend npu customer, we'll continuously provide contributions.

Thanks to you too :)

@CurryRice233
Copy link
Contributor

CurryRice233 commented Apr 11, 2024

@delock @nelyahu @mrwyattii we added an accelerator method to get the env of the visible device, and we don't know the env of other accelerators. Can you provide it?

@delock
Copy link
Collaborator

delock commented Apr 11, 2024

For XPU, the name of env is ZE_AFFINITY_MASK.

I think it would be better to define interface as set_visible_devices(list) and let each accelerator implement setting of env inside the interface. This will give each accelerator flexibility to set visible devices in different ways.

@BacharL
Copy link
Collaborator

BacharL commented Apr 11, 2024

For HPU HABANA_VISIBLE_MODULES

I think @delock proposed a good approach

@tjruwase
Copy link
Contributor

I think it would be better to define interface as set_visible_devices(list) and let each accelerator implement setting of env inside the interface. This will give each accelerator flexibility to set visible devices in different ways.

@shiyuan680, thanks for creating this PR. Can you please add this interface suggested by @delock?

@delock and @BacharL, thanks for assisting with this PR.

@shiyuan680 shiyuan680 changed the title fix the accelerator issue add device config env for the accelerator Apr 12, 2024
Co-authored-by: eigen2017 <[email protected]>
@CurryRice233
Copy link
Contributor

@loadams CI has passed, would you please help to review this PR and add to merge queue ?

@tjruwase
Copy link
Contributor

@loadams CI has passed, would you please help to review this PR and add to merge queue ?

@CurryRice233, can you help to address my questions? Thanks!

@eigen2017
Copy link
Contributor

@loadams CI has passed, would you please help to review this PR and add to merge queue ?

@CurryRice233, can you help to address my questions? Thanks!

yes it is unknown that how to set cpu and mps env,both setting CUDA_VISIBLE_DEVICES, i suppose it is means to keep the same logic to the old version of deepspeed for mps and cpu.

so is it better to add annotations over the code to explain why set CUDA_VISIBLE_DEVICES ?

@tjruwase
Copy link
Contributor

so is it better to add annotations over the code to explain why set CUDA_VISIBLE_DEVICES ?

@eigen2017, yes annotation is reasonable for this. Thanks for your explanation.

@eigen2017
Copy link
Contributor

so is it better to add annotations over the code to explain why set CUDA_VISIBLE_DEVICES ?

@eigen2017, yes annotation is reasonable for this. Thanks for your explanation.

i don't have write access to microsoft/DeepSpeed, could anyone help to add new commits to branch [shiyuan680:master], and add annotations to [mps_accelerator.py] and [cpu_accelerator.py] ? @tjruwase @CurryRice233 @loadams @shiyuan680

explainations can like:
"TODO: before mps' and cpu's visible envs are confirmed, keep the same old env setting as CUDA_VISIBLE_DEVICES "

@tjruwase tjruwase added this pull request to the merge queue Apr 20, 2024
Merged via the queue into microsoft:master with commit 3f875d9 Apr 21, 2024
15 checks passed
@eigen2017
Copy link
Contributor

@tjruwase thank you for helping us~~thank you~

@CurryRice233
Copy link
Contributor

@eigen2017 thank you for your explanation, and @tjruwase thank you for the annotation. :)

rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
Thank you for [pr](microsoft#5369) and
@delock contribution of ideas.
As mentioned in this
[pr](microsoft#5369), each device has
its own environmental variables.
We create visible_devices_envs() and set_visible_devices_envs() methods
on the accelerator class to enable each accelerator to implement env
settings within the interface , which is more generic to other
accelerators.

this commit has tested on npu, each one has 8 ascend npus

---------

Co-authored-by: yangcheng <[email protected]>
Co-authored-by: eigen2017 <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
Thank you for [pr](microsoft#5369) and
@delock contribution of ideas.
As mentioned in this
[pr](microsoft#5369), each device has
its own environmental variables.
We create visible_devices_envs() and set_visible_devices_envs() methods
on the accelerator class to enable each accelerator to implement env
settings within the interface , which is more generic to other
accelerators.

this commit has tested on npu, each one has 8 ascend npus

---------

Co-authored-by: yangcheng <[email protected]>
Co-authored-by: eigen2017 <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024
Thank you for [pr](microsoft#5369) and
@delock contribution of ideas.
As mentioned in this
[pr](microsoft#5369), each device has
its own environmental variables.
We create visible_devices_envs() and set_visible_devices_envs() methods
on the accelerator class to enable each accelerator to implement env
settings within the interface , which is more generic to other
accelerators.

this commit has tested on npu, each one has 8 ascend npus

---------

Co-authored-by: yangcheng <[email protected]>
Co-authored-by: eigen2017 <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants