OOM while generating a 10-second video with CogVideoX-1.5 on a machine with 80 GB of GPU memory #493
Comments
Also, each 10-second video takes 40 minutes to generate. Is this a reasonable duration in your experience?
I also tried adding the following sampling settings; here is the resulting log:

```
############################## Sampling setting ##############################
Sampler: VPSDEDPMPP2MSampler
Discretization: ZeroSNRDDPMDiscretization
Guider: DynamicCFG
Sampling with VPSDEDPMPP2MSampler for 51 steps: 98%|█████████▊| 50/51 [44:23<00:53, 53.27s/it]
/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py:720: UserWarning: cuDNN cannot be used for large non-batch-splittable convolutions if the V8 API is not enabled or before cuDNN version 9.3+. Consider upgrading cuDNN and/or enabling the V8 API for better efficiency. (Triggered internally at ../aten/src/ATen/native/Convolution.cpp:430.)
  return F.conv3d(
0it [44:42, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/sample_video_ori.py", line 262, in <module>
[rank0]:     sampling_main(args, model_cls=SATVideoDiffusionEngine)
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/sample_video_ori.py", line 236, in sampling_main
[rank0]:     samples_x = model.decode_first_stage(samples_z).to(torch.float32)
[rank0]:   File "/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/diffusion_video.py", line 198, in decode_first_stage
[rank0]:     recon = self.first_stage_model.decode(
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/vae_modules/autoencoder.py", line 620, in decode
[rank0]:     x = super().decode(z, use_cp=use_cp, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/vae_modules/autoencoder.py", line 214, in decode
[rank0]:     x = self.decoder(z, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/vae_modules/cp_enc_dec.py", line 960, in forward
[rank0]:     h = self.up[i_level].block[i_block](
[rank0]:   File "/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/vae_modules/cp_enc_dec.py", line 676, in forward
[rank0]:     h = self.norm1(h, zq, clear_fake_cp_cache=clear_fake_cp_cache, fake_cp=fake_cp)
[rank0]:   File "/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/anaconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/mnt/petrelfs/dongziyue/video/CogVideo/sat/vae_modules/cp_enc_dec.py", line 482, in forward
[rank0]:     new_f = norm_f * self.conv_y(zq) + self.conv_b(zq)
[rank0]: RuntimeError: CUDA driver error: invalid argument
[rank0]:[W1113 00:22:35.424437131 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
```
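The traceback shows the failure happens in `decode_first_stage`, i.e. when the VAE decodes the full 10-second latent in one pass. A common workaround for decode-stage OOM is to split the latent along the temporal axis and decode it chunk by chunk. Below is a minimal, self-contained sketch of the idea using a toy stand-in decoder (`toy_decode` and `decode_in_chunks` are hypothetical names, not CogVideo APIs; a real causal video VAE additionally needs overlap/cache handling between chunks):

```python
import numpy as np

def toy_decode(latent):
    # Stand-in for a VAE decoder: upsample each latent frame 2x spatially.
    # latent shape: (T, C, H, W) -> output shape: (T, C, 2H, 2W)
    return latent.repeat(2, axis=2).repeat(2, axis=3)

def decode_in_chunks(latent, chunk_size, decode_fn):
    """Decode a long video latent in temporal chunks to bound peak memory.

    Only valid as-is when decode_fn treats frames independently; chunking a
    causal 3D-conv decoder also requires carrying boundary context between
    chunks.
    """
    outs = []
    for start in range(0, latent.shape[0], chunk_size):
        outs.append(decode_fn(latent[start:start + chunk_size]))
    return np.concatenate(outs, axis=0)

latent = np.random.rand(13, 16, 30, 45).astype(np.float32)
full = toy_decode(latent)                                      # one big pass
chunked = decode_in_chunks(latent, chunk_size=4, decode_fn=toy_decode)
assert full.shape == chunked.shape == (13, 16, 60, 90)
assert np.allclose(full, chunked)  # same result, lower peak memory
```

The peak activation memory now scales with `chunk_size` rather than the full frame count, at the cost of a few extra decoder launches.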
This is normal; it does take about 40 minutes to generate a 10-second video. We will look into the OOM issue shortly and release a diffusers version with significantly reduced memory usage.
@zRzRzRzRzRzRzR @DZY-irene @zigchang any update on this? I still get the OOM error.
We are aware of this issue, but I have been focusing on the diffusers version recently. You can check the latest PR; we are in the final stage, and the diffusers version requires a minimum of only 9 GB of GPU memory.
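For reference, the diffusers library already exposes memory-reduction switches that target exactly this decode-stage pressure. A sketch of how they might be combined for CogVideoX (the model id, frame count, and prompt are placeholders, not values confirmed in this thread; this needs a CUDA GPU and a model download to actually run):

```python
import torch
from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",        # placeholder id; substitute your 1.5 checkpoint
    torch_dtype=torch.bfloat16,
)

# Keep only the active submodule on the GPU (slower, much lower peak memory).
pipe.enable_sequential_cpu_offload()
# Decode the VAE in spatial tiles and one batch element at a time, which
# targets the decode_first_stage OOM seen in the traceback above.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(
    prompt="a panda playing guitar",  # placeholder prompt
    num_inference_steps=50,
).frames[0]
```

The trade-off is wall-clock time: offloading and tiled decoding add host/device transfers and extra kernel launches in exchange for fitting in far less GPU memory.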
System Info
Information
Reproduction
I only changed the model path in `sat/configs/cogvideox1.5_5b.yaml` and `sat/configs/inference.yaml` according to the SAT workflow, and ran inference for a 10-second video on a single 80 GB GPU, but I got an OOM during the VAE decode step.
The OOM traceback is shown in the comment above.
Expected behavior
Identify the possible causes and ensure that 10-second inference completes without OOM.