
OOM error when testing openai_api_demo.py on two A100s (80 GB VRAM total) #179

Open

thu-yn opened this issue Aug 7, 2024 · 2 comments


thu-yn commented Aug 7, 2024

As the title says: following the multi-GPU setup steps in cli_demo_multi_gpus.py, I changed the model loading in openai_api_demo.py as follows:

[screenshot: modified model-loading code]

Then I ran openai_api_demo.py and got an OOM error; a screenshot of the error is below:

[screenshot: OOM error traceback]

One thing worth adding: I installed the xformers library locally with pip install xformers-0.0.27+cu118-cp310-cp310-manylinux2014_x86_64.whl --no-deps. I already have torch 2.3.0+cu118 and torchvision 0.18.0+cu118, and to keep xFormers from pulling in new versions of them I skipped its other dependencies. Could that be a possible cause of the OOM?
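For reference, the multi-GPU change borrowed from cli_demo_multi_gpus.py usually boils down to letting accelerate shard the checkpoint across all visible GPUs via device_map="auto". A minimal sketch under that assumption; the model id and helper name below are illustrative, not taken from the issue:

```python
# Sketch: kwargs that let transformers + accelerate shard a checkpoint
# across all visible GPUs instead of loading it onto a single device.
# Assumes `transformers` and `accelerate` are installed; MODEL_PATH is a placeholder.

MODEL_PATH = "THUDM/glm-4-9b-chat"  # illustrative model id

def multi_gpu_load_kwargs() -> dict:
    """Loading options for multi-GPU sharding with accelerate."""
    return {
        "torch_dtype": "auto",      # keep the checkpoint's native dtype
        "device_map": "auto",       # split layers across available GPUs
        "trust_remote_code": True,  # GLM-style repos ship custom modeling code
    }

# Usage (not executed here):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, **multi_gpu_load_kwargs()).eval()
```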


thu-yn commented Aug 7, 2024

To rule out a possible xFormers error, I reinstalled torch and the other dependencies. The installation is now clean, but I still get the OOM error.

Watching GPU usage with watch -n 1 nvidia-smi, I found that after openai_api_demo.py starts, finishes loading the model, and exposes the local port, all of the model weights are still loaded onto the first GPU, even though I configured two GPUs as in cli_demo_multi_gpus.py. I don't understand what causes this.

[screenshot: nvidia-smi showing the model loaded entirely on GPU 0]

The image below also shows that both cards are detected:

[screenshot: both GPUs detected]
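When device_map="auto" still piles everything onto GPU 0, an explicit per-GPU max_memory cap can force accelerate to balance the shards. A sketch with a hypothetical helper; the 38GiB figure is an assumption that leaves headroom below each 40 GB A100, not a value from the issue:

```python
def build_max_memory(num_gpus: int, per_gpu: str = "38GiB") -> dict:
    """Build the max_memory mapping accelerate uses when placing shards:
    GPU index -> memory budget. Capping below the physical 40 GiB leaves
    room for activations and the KV cache."""
    return {i: per_gpu for i in range(num_gpus)}

print(build_max_memory(2))  # {0: '38GiB', 1: '38GiB'}

# Usage (not executed here):
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_PATH, device_map="auto", max_memory=build_max_memory(2))
```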

@zRzRzRzRzRzRzR self-assigned this Aug 17, 2024
zRzRzRzRzRzRzR (Member) commented

The OpenAI demo code is single-GPU. Try porting the model-loading logic from cli_demo_multi_gpus.py into the OpenAI demo.
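Once the loading logic is ported, one way to confirm it worked: transformers records the final layer placement on the model as hf_device_map, so summarizing which devices appear there shows whether both cards are actually used. The helper and the example map below are illustrative:

```python
def devices_used(hf_device_map: dict) -> set:
    """Collect the set of devices a sharded model's modules landed on."""
    return set(hf_device_map.values())

# Illustrative map of the kind transformers stores on the model after
# loading with device_map="auto":
example_map = {
    "transformer.embedding": 0,
    "transformer.encoder.layers.0": 0,
    "transformer.encoder.layers.1": 1,
    "transformer.output_layer": 1,
}
print(devices_used(example_map))  # {0, 1}

# In the demo, after loading: print(devices_used(model.hf_device_map))
# If this prints only {0}, the weights are still on one card.
```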
