
Why isn’t VRAM being released after training LoRA? #9876

Open
hjw-0909 opened this issue Nov 6, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@hjw-0909

hjw-0909 commented Nov 6, 2024

Describe the bug

When I use train_dreambooth_lora_sdxl.py, the VRAM is not released after training. How can I fix this?

Reproduction

Not used.

Logs

No response

System Info

  • 🤗 Diffusers version: 0.31.0.dev0
  • Platform: Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.17
  • Running on Google Colab?: No
  • Python version: 3.8.20
  • PyTorch version (GPU?): 2.2.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.25.2
  • Transformers version: 4.45.2
  • Accelerate version: 1.0.1
  • PEFT version: 0.13.2
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA H800, 81559 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

@hjw-0909 hjw-0909 added the bug Something isn't working label Nov 6, 2024
@SahilCarterr
Contributor

I think you should try manually flushing the GPU memory.
To see the PID of the process, run: sudo fuser -v /dev/nvidia*
Then kill the PID that you no longer need: sudo kill -9 PID

@hjw-0909
Author

hjw-0909 commented Nov 6, 2024

@SahilCarterr I mean that after training, I want to perform other tasks without ending the entire Python script. In theory, VRAM should be released once train_lora.py completes the training, but it isn’t being freed.

@charchit7
Contributor

As @SahilCarterr mentioned, your process might be stalled.
Alternatively, try freeing the GPU memory in your code after the training loop completes, for example:
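A minimal sketch of that kind of cleanup (the variable names are placeholders for whatever large objects your script still holds after training, not the exact names used in train_dreambooth_lora_sdxl.py):

```python
import gc

import torch

# After training returns, make sure nothing in scope still points at the big
# objects (UNet, text encoders, optimizer, ...); the CUDA caching allocator can
# only hand memory back once no Python reference keeps the tensors alive.
unet = text_encoder = optimizer = lr_scheduler = None  # placeholder names

gc.collect()              # collect the Python objects that were holding CUDA tensors
torch.cuda.empty_cache()  # return the allocator's cached blocks to the driver
torch.cuda.ipc_collect()  # clean up CUDA IPC handles, if any were created
```

After this, nvidia-smi should show the process dropping back to roughly its idle footprint.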

@hjw-0909
Author

hjw-0909 commented Nov 6, 2024

@charchit7 I added torch.cuda.empty_cache() after training, but it didn't work.
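One thing worth noting: torch.cuda.empty_cache() can only return blocks that nothing references anymore, and when the script is driven by 🤗 Accelerate, the Accelerator itself keeps references to the prepared models and optimizer. A hedged sketch of a fuller cleanup, assuming the script's Accelerator instance is still in scope:

```python
import gc

import torch
from accelerate import Accelerator

accelerator = Accelerator()  # in the real script this object already exists

# ... training happens here ...

accelerator.free_memory()    # drop accelerate's references to the prepared models/optimizer
gc.collect()                 # then collect whatever else was holding CUDA tensors
torch.cuda.empty_cache()     # and finally release the allocator's cache
```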

@sayakpaul
Member

Can you share a snapshot of the memory usage so we can confirm it really isn't being released?
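For example, something like this right after training returns (a rough sketch), together with the corresponding nvidia-smi output, would show whether the memory is held by live tensors or just by PyTorch's caching allocator:

```python
import torch

# If "allocated" is near zero but "reserved" is large, torch.cuda.empty_cache()
# should give the memory back; if "allocated" stays high, something still
# references the training tensors.
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
print(torch.cuda.memory_summary(abbreviated=True))
```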

@hjw-0909
Author

hjw-0909 commented Nov 8, 2024

@sayakpaul I ensure that the memory is properly released at the end of the .py script. However, I have noticed that after training with LoRA, the memory isn't fully released.

@sayakpaul
Member

I ensure that the memory is properly released at the end of the .py script.

I don't understand what this means. Could you explain further?
