I'm reaching out to share a potential issue. While I've managed to resolve it on my end, others following the README's setup instructions might run into it.
Here's my setup:
OS: Ubuntu 22.04
CUDA: 11.8
After setting up via poetry as outlined in the README and running ./script/run.sh, I ran into the following error:
Traceback (most recent call last):
  File "~/heron/.venv/bin/deepspeed", line 3, in <module>
    from deepspeed.launcher.runner import main
  File "~/heron/.venv/lib/python3.10/site-packages/deepspeed/__init__.py", line 10, in <module>
    import torch
  File "~/heron/.venv/lib/python3.10/site-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
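For anyone who wants to confirm the same failure without going through deepspeed, here is a minimal sketch that asks the dynamic loader directly whether it can open the library named in the error. The helper name `can_load` is my own and is not part of heron, torch, or deepspeed; it only assumes a Linux system and the Python standard library.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the shared object is missing or cannot be opened,
        # which is the same condition behind the ImportError above.
        return False
    return True


if __name__ == "__main__":
    # libcudnn.so.8 is the name taken from the traceback.
    name = "libcudnn.so.8"
    print(f"{name}: {'found' if can_load(name) else 'NOT found'}")
```

If this reports the library as not found inside the poetry environment, `import torch` will most likely fail in the same way.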
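I noticed the README mentions the expected CUDA version as 11.7, which suggests that the default install might not be ideal on a CUDA 11.8 machine. Given this, I reinstalled PyTorch from the cu118 wheel index, adding the source with:
poetry source add torch_cu118 --priority=explicit https://download.pytorch.org/whl/cu118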
This fixed the issue and ./script/run.sh ran without any hitches. I've documented this to help anyone who might face this in the future.
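To check which PyTorch build actually ended up in the environment, a small sketch like the following may help. It reads the installed package version from metadata instead of importing torch, so it still works when the import itself is broken. The function names are hypothetical, and it assumes the wheel carries a local version suffix such as `+cu118`, which wheels from the download.pytorch.org indexes normally do; wheels from PyPI may carry no suffix at all, in which case this returns `None`.

```python
from importlib import metadata
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Extract a CUDA tag such as 'cu118' from a version string, if present."""
    _, sep, local = version.partition("+")
    return local if sep and local.startswith("cu") else None


def installed_torch_cuda_tag() -> Optional[str]:
    """Return the CUDA tag of the installed torch wheel, or None if unknown."""
    try:
        return cuda_tag(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        # torch is not installed in this environment.
        return None


if __name__ == "__main__":
    print("torch CUDA tag:", installed_torch_cuda_tag())
```

Run it with the project's interpreter (for example via `poetry run python`) so that it inspects the same virtual environment that `./script/run.sh` uses.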
If it helps, I'm happy to submit a pull request updating the pyproject.toml. If this isn't the right place for such feedback, please feel free to close this issue.
Thank you.
@Topology1225
Thank you for conducting the operational check and providing a detailed report! I truly appreciate you sharing such valuable insights. It would be wonderful if you could submit a pull request.