HeaderTooLarge when training ControlNet with SD3 #9927

Open
Viola-Siemens opened this issue Nov 14, 2024 · 2 comments
Labels
bug Something isn't working

@Viola-Siemens
Describe the bug

Hello, I tried to use Diffusers to train a ControlNet with SD3, but training never started. Instead it failed with safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge. I don't know how to handle it.
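The error is raised while reading the very first bytes of a weights file. A .safetensors file begins with an 8-byte little-endian unsigned integer giving the length of the JSON header that follows, and the loader rejects implausibly large values. Below is a minimal standard-library sketch of that check, assuming this layout; the helper name `inspect_safetensors_header` and the 100 MB cap are illustrative assumptions, not the safetensors API. A file that is not really safetensors data (for example a Git LFS pointer text file, or otherwise corrupted bytes) typically decodes to a huge bogus length here.

```python
import json
import struct
import sys

# Assumption: mirrors the header-size cap enforced by the safetensors library.
MAX_HEADER_SIZE = 100_000_000


def inspect_safetensors_header(path: str) -> dict:
    """Decode the length prefix and JSON header of a .safetensors file.

    Layout assumed: 8-byte little-endian u64 header length, then that many
    bytes of UTF-8 JSON, then the raw tensor data.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return {"ok": False, "reason": "file is shorter than the 8-byte prefix"}
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_SIZE:
            # This is the situation the loader reports as HeaderTooLarge.
            return {
                "ok": False,
                "reason": f"declared header length {header_len} is implausibly large",
                "first_bytes": prefix,
            }
        header_bytes = f.read(header_len)
    if len(header_bytes) < header_len:
        return {"ok": False, "reason": "file ends inside the header"}
    try:
        header = json.loads(header_bytes)
    except ValueError:  # covers both JSONDecodeError and UnicodeDecodeError
        return {"ok": False, "reason": "header is not valid JSON"}
    if not isinstance(header, dict):
        return {"ok": False, "reason": "header is not a JSON object"}
    # "__metadata__" holds free-form strings; every other key names a tensor.
    tensors = sorted(k for k in header if k != "__metadata__")
    return {"ok": True, "tensors": tensors}


if __name__ == "__main__":
    # Usage: python check_header.py FILE [FILE ...]  (script name hypothetical)
    for p in sys.argv[1:]:
        print(p, inspect_safetensors_header(p))
```

Pointing it at the files under the model's text_encoder_2 folder (the component whose loading fails in the traceback below) should show whether the prefix is the problem; the first_bytes field often reveals plain text such as b'version ' where binary data was expected.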

Reproduction

I followed the README_v3 guide.

Logs

(diffusers) [liudongyu@localhost controlnet]$ accelerate launch train_controlnet_sd3.py     --pretrained_model_name_or_path=$MODEL_DIR     --output_dir=$OUTPUT_DIR     --train_data_dir="/home/users/liudongyu/datasets"     --resolution=1024     --learning_rate=1e-5     --max_train_steps=20000     --train_batch_size=1     --gradient_accumulation_steps=4
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
11/14/2024 15:16:14 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'max_image_seq_len', 'base_image_seq_len', 'use_dynamic_shifting', 'max_shift', 'base_shift'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/home/users/liudongyu/diffuser/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1423, in <module>
    main(args)
  File "/home/users/liudongyu/diffuser/diffusers/examples/controlnet/train_controlnet_sd3.py", line 982, in main
    text_encoder_one, text_encoder_two, text_encoder_three = load_text_encoders(
                                                             ^^^^^^^^^^^^^^^^^^^
  File "/home/users/liudongyu/diffuser/diffusers/examples/controlnet/train_controlnet_sd3.py", line 187, in load_text_encoders
    text_encoder_two = class_two.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3789, in from_pretrained
    with safe_open(resolved_archive_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
Traceback (most recent call last):
  File "/home/users/liudongyu/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/home/users/liudongyu/anaconda3/envs/diffusers/lib/python3.11/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/users/liudongyu/anaconda3/envs/diffusers/bin/python', 'train_controlnet_sd3.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3-medium-diffusers', '--output_dir=sd3-controlnet-out', '--train_data_dir=/home/users/liudongyu/datasets', '--resolution=1024', '--learning_rate=1e-5', '--max_train_steps=20000', '--train_batch_size=1', '--gradient_accumulation_steps=4']' returned non-zero exit status 1.

System Info

  • 🤗 Diffusers version: 0.31.0.dev0
  • Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.17
  • Running on Google Colab?: No
  • Python version: 3.11.10
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.25.2
  • Transformers version: 4.45.2
  • Accelerate version: 1.0.0
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA A100-PCIE-40GB, 40960 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

No response

Viola-Siemens added the bug (Something isn't working) label Nov 14, 2024
@sayakpaul
Member

It seems the download was incomplete or got corrupted. I would recommend clearing your local cache.
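Rather than wiping the whole cache, a minimal standard-library sketch can list the cached .safetensors files of one repo whose length prefix looks invalid. It assumes the default Hugging Face hub cache layout (models--{org}--{name}/snapshots/<revision>/...) and an assumed precedence of HF_HUB_CACHE, then HF_HOME, then XDG_CACHE_HOME for locating it; `find_suspicious_files`, `default_hub_cache` and the 100 MB cap are hypothetical choices for illustration. Deleting the reported files, or the whole models--stabilityai--stable-diffusion-3-medium-diffusers folder, should make the next from_pretrained call download them again.

```python
from __future__ import annotations

import os
import struct
from pathlib import Path

# Assumption: mirrors the header-size cap enforced by the safetensors library.
MAX_HEADER_SIZE = 100_000_000


def default_hub_cache() -> Path:
    """Best-effort guess at the hub cache location (assumed env-var precedence)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    base = os.environ.get("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    hf_home = os.environ.get("HF_HOME", os.path.join(base, "huggingface"))
    return Path(hf_home).expanduser() / "hub"


def has_plausible_prefix(path: Path) -> bool:
    """True if the first 8 bytes decode to a header length within the cap."""
    with open(path, "rb") as f:
        prefix = f.read(8)
    if len(prefix) < 8:
        return False
    (header_len,) = struct.unpack("<Q", prefix)
    return header_len <= MAX_HEADER_SIZE


def find_suspicious_files(repo_id: str, cache_dir: Path | None = None) -> list[Path]:
    """List cached .safetensors files of repo_id whose length prefix looks corrupt."""
    if cache_dir is None:
        cache_dir = default_hub_cache()
    # Model repos are cached as models--{org}--{name}.
    snapshots = Path(cache_dir) / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return []
    suspicious = []
    for entry in sorted(snapshots.rglob("*.safetensors")):
        # Snapshot entries are normally symlinks into ../blobs; follow them.
        target = entry.resolve()
        if not target.is_file() or not has_plausible_prefix(target):
            suspicious.append(entry)
    return suspicious


if __name__ == "__main__":
    # Repo id taken from the failing command in the logs above.
    for path in find_suspicious_files("stabilityai/stable-diffusion-3-medium-diffusers"):
        print("suspicious:", path)
```

The sketch only reports and never deletes, so nothing is removed by accident. For cleanup, huggingface-cli delete-cache offers an interactive way to remove cached revisions, and passing force_download=True to from_pretrained re-fetches files without deleting anything by hand.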

@Viola-Siemens
Author

Thanks for your reply. I'll give it a try.
