Inference failed with custom finetuned model #490

jinhaaya opened this issue Nov 12, 2024 · 0 comments
jinhaaya commented Nov 12, 2024

```
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.58s/it]
Loading pipeline components...: 100%|██████████| 5/5 [00:08<00:00, 1.66s/it]
Traceback (most recent call last):
  File "/workspace/CogVideo/inference/cli_demo.py", line 181, in <module>
    generate_video(
  File "/workspace/CogVideo/inference/cli_demo.py", line 85, in generate_video
    pipe.fuse_lora(lora_scale=1 / lora_rank)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 2888, in fuse_lora
    super().fuse_lora(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_base.py", line 445, in fuse_lora
    raise ValueError(f"{fuse_component} is not found in {self._lora_loadable_modules=}.")
ValueError: text_encoder is not found in self._lora_loadable_modules=['transformer'].
E1112 07:57:11.357000 4751 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 4816) of binary: /usr/bin/python3.10
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 919, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
inference/cli_demo.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2024-11-12_07:57:11
  host       : sg17
  rank       : 0 (local_rank: 0)
  exitcode   : 1 (pid: 4816)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```

I ran `sh finetune_single_rank.sh` with my custom dataset, which contains `videos.txt`, `videos/`, and `prompts.txt`, and with `MODEL_PATH="THUDM/CogVideoX-2b"`.

Next, I ran `torchrun --nnodes=1 --nproc_per_node=1 --master_port=29506 inference/cli_demo.py` with

lora_path: '/workspace/CogVideo/finetune/cogvideox-lora-single-node-1/checkpoint-X000/'
lora_rank: 128

which matches the fine-tuning configuration.

This raised the error: `ValueError: text_encoder is not found in self._lora_loadable_modules=['transformer'].`
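For context, here is what the failing call boils down to, reduced from `cli_demo.py` as shown in the traceback. This is only a sketch: the `weight_name` is my assumption about the file the fine-tuning script saves, and the `components=` workaround in the last comment is unverified.

```python
import torch
from diffusers import CogVideoXPipeline

# Values taken from my setup above.
model_path = "THUDM/CogVideoX-2b"
lora_path = "/workspace/CogVideo/finetune/cogvideox-lora-single-node-1/checkpoint-X000/"
lora_rank = 128

pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=torch.float16)

# Load the fine-tuned LoRA weights from the checkpoint directory
# (weight_name is an assumption about the file name the training script writes).
pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors")

# This is the call at cli_demo.py line 85 that raises:
#   ValueError: text_encoder is not found in self._lora_loadable_modules=['transformer'].
pipe.fuse_lora(lora_scale=1 / lora_rank)

# Fusing only the transformer might avoid the text_encoder lookup, if the installed
# diffusers version accepts the `components` argument of fuse_lora (unverified):
# pipe.fuse_lora(components=["transformer"], lora_scale=1 / lora_rank)
```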

Information

  • The official example scripts
  • My own modified scripts

Reproduction

[accelerate_config_machine_single.yaml]
num_processes: 2

Expected behavior

.
