HeaderTooLarge when train controlnet with sdv3 #9927

Viola-Siemens · 2024-11-14T07:28:03Z

Describe the bug

Hello, I tried to use diffusers to train a ControlNet with SD3, but training never started. Instead it failed with safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge. I don't know how to handle it.

Reproduction

Follow the README_v3 guide.

Logs

(diffusers) [liudongyu@localhost controlnet]$ accelerate launch train_controlnet_sd3.py     --pretrained_model_name_or_path=$MODEL_DIR     --output_dir=$OUTPUT_DIR     --train_data_dir="/home/users/liudongyu/datasets"     --resolution=1024     --learning_rate=1e-5     --max_train_steps=20000     --train_batch_size=1     --gradient_accumulation_steps=4
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
11/14/2024 15:16:14 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'max_image_seq_len', 'base_image_seq_len', 'use_dynamic_shifting', 'max_shift', 'base_shift'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/home/users/liudongyu/diffuser/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1423, in <module>
    main(args)
  File "/home/users/liudongyu/diffuser/diffusers/examples/controlnet/train_controlnet_sd3.py", line 982, in main
    text_encoder_one, text_encoder_two, text_encoder_three = load_text_encoders(
                                                             ^^^^^^^^^^^^^^^^^^^
  File "/home/users/liudongyu/diffuser/diffusers/examples/controlnet/train_controlnet_sd3.py", line 187, in load_text_encoders
    text_encoder_two = class_two.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3789, in from_pretrained
    with safe_open(resolved_archive_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
Traceback (most recent call last):
  File "/home/users/liudongyu/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/users/liudongyu/anaconda3/envs/diffusers/bin/python', 'train_controlnet_sd3.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3-medium-diffusers', '--output_dir=sd3-controlnet-out', '--train_data_dir=/home/users/liudongyu/datasets', '--resolution=1024', '--learning_rate=1e-5', '--max_train_steps=20000', '--train_batch_size=1', '--gradient_accumulation_steps=4']' returned non-zero exit status 1.
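What the error means: in the safetensors file format, a .safetensors file starts with an 8-byte little-endian unsigned integer N giving the length of a JSON header, followed by that header and then the raw tensor bytes. HeaderTooLarge means the decoded N exceeded the library's sanity limit, which usually indicates the file on disk is not a real safetensors file (for example a Git LFS pointer, an error page, or a corrupted download) rather than a bug in the training script. Below is a minimal, stdlib-only sketch for inspecting a suspect file. The name diagnose_safetensors is hypothetical, and the 100,000,000-byte limit is an assumption based on recent safetensors releases.

```python
import json
import os
import struct

# Assumed sanity limit on the JSON header length; safetensors rejects
# larger values with HeaderTooLarge (100_000_000 bytes in recent releases).
MAX_HEADER_SIZE = 100_000_000


def diagnose_safetensors(path: str) -> str:
    """Return "ok" or a short description of what is wrong with the file."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return f"too small: only {file_size} bytes"
        # A Git LFS pointer is a short text file starting with "version ".
        if prefix == b"version ":
            return "git-lfs pointer file, not the actual weights"
        # The first 8 bytes are the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_SIZE:
            return f"header length {header_len} is implausible (HeaderTooLarge)"
        if 8 + header_len > file_size:
            return f"header length {header_len} runs past end of file"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            return "header is not valid UTF-8 JSON"
    if not isinstance(header, dict):
        return "header is not a JSON object"
    # Each tensor entry records [begin, end) byte offsets into the data section.
    data_size = file_size - 8 - header_len
    ends = [
        entry["data_offsets"][1]
        for name, entry in header.items()
        if name != "__metadata__" and isinstance(entry, dict) and "data_offsets" in entry
    ]
    if ends and max(ends) > data_size:
        return f"truncated: tensors need {max(ends)} data bytes, file has {data_size}"
    return "ok"
```

Because the traceback fails inside load_text_encoders while loading text_encoder_two, the file worth checking is most likely the .safetensors file in the text_encoder_2 folder of the cached stabilityai/stable-diffusion-3-medium-diffusers snapshot (the resolved MODEL_DIR in the command above).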

System Info


🤗 Diffusers version: 0.31.0.dev0
Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.17
Running on Google Colab?: No
Python version: 3.11.10
PyTorch version (GPU?): 2.0.1+cu117 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Huggingface_hub version: 0.25.2
Transformers version: 4.45.2
Accelerate version: 1.0.0
PEFT version: not installed
Bitsandbytes version: not installed
Safetensors version: 0.4.5
xFormers version: not installed
Accelerator: NVIDIA A100-PCIE-40GB, 40960 MiB
NVIDIA A100 80GB PCIe, 81920 MiB
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help?

No response


sayakpaul · 2024-11-14T13:39:06Z

It seems the download was probably incomplete or got corrupted. I would recommend clearing your local cache.
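Clearing the cache here means deleting the locally cached copy of the model repo so that from_pretrained downloads it again. The sketch below uses only the standard library and assumes the usual Hub cache layout (one models--<org>--<name> folder per repo) plus the environment-variable lookup order described in its docstring. hub_cache_dir and clear_repo_cache are hypothetical helper names, not diffusers or huggingface_hub APIs, and the function defaults to a dry run.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def hub_cache_dir() -> Path:
    """Best-effort guess at the Hugging Face Hub cache location.

    Assumed precedence: HF_HUB_CACHE, then HF_HOME/hub, then
    $XDG_CACHE_HOME/huggingface/hub, defaulting to ~/.cache/huggingface/hub.
    """
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]).expanduser() / "hub"
    xdg = os.environ.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
    return Path(xdg).expanduser() / "huggingface" / "hub"


def clear_repo_cache(
    repo_id: str,
    cache_dir: Optional[Path] = None,
    repo_type: str = "model",
    dry_run: bool = True,
) -> Optional[Path]:
    """Delete the cached copy of one Hub repo so it is re-downloaded on next use.

    Returns the repo's cache folder (deleted unless dry_run) or None if absent.
    """
    root = Path(cache_dir) if cache_dir is not None else hub_cache_dir()
    # Cache folders are named "<type>s--<org>--<name>", e.g. models--org--name.
    folder = root / f"{repo_type}s--{repo_id.replace('/', '--')}"
    if not folder.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(folder)  # removes blobs, refs and snapshots together
    return folder
```

For the command in this issue, clear_repo_cache("stabilityai/stable-diffusion-3-medium-diffusers", dry_run=False) followed by rerunning accelerate launch should trigger a fresh download. huggingface_hub also provides the interactive huggingface-cli delete-cache command, and snapshot_download(..., force_download=True) re-fetches files without deleting the folder first; removing the whole folder is the blunter but simpler option.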

Viola-Siemens · 2024-11-15T06:41:00Z

> It seems the download was probably incomplete or got corrupted. I would recommend clearing your local cache.

Thanks for your reply. I'll give it a try.

Viola-Siemens added the "bug" (Something isn't working) label on Nov 14, 2024.
