Can't use the output model from training script - llama3.1 #35

yannis-pa · 2024-09-24T07:44:44Z

Hi Snowflake team, thank you for this great work!
I have followed your very helpful guide on fine-tuning Llama 3.1 405B in 8-bit precision with LoRA.
Training appears to complete successfully, and I end up with an output folder that contains, among other things, a number of safetensors files, presumably the adapter model?
I am having difficulty using these, though:
a) The merge script you provide doesn't work because it expects a single torch checkpoint file. I am invoking it like so: `PYTHONPATH=. python training/llama3.1/apply_ds_adapters.py --ckpt-path trained_model/checkpoint-700 --output-dir merged-ckpt-700 --model-name meta-llama/Meta-Llama-3.1-405B`
b) The standard Hugging Face merging doesn't work either, since HF complains that it needs an adapter_config and an adapter_model.bin:

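import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in half precision, sharded across available devices.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_str,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapter from the training output folder.
peft_model = PeftModel.from_pretrained(
    base_model,
    peft_model_str,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Fold the adapter weights into the base model and save the result.
model = peft_model.merge_and_unload()
model.save_pretrained(merged_model_name, safe_serialization=True)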
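For anyone hitting the same error in (b), a minimal sketch for checking whether a checkpoint folder has the layout PEFT looks for: an `adapter_config.json` plus either `adapter_model.safetensors` or `adapter_model.bin`. The helper name `missing_adapter_files` is made up for illustration, and the path is the one from the command above; this only inspects file names and says nothing about whether the weights themselves are usable.

```python
from pathlib import Path

# File names PEFT looks for in an adapter directory.
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(checkpoint_dir):
    """Return the adapter files that are absent from checkpoint_dir.

    Hypothetical helper: an empty list means the folder at least has the
    file layout that PeftModel.from_pretrained expects.
    """
    root = Path(checkpoint_dir)
    missing = []
    if not (root / ADAPTER_CONFIG).is_file():
        missing.append(ADAPTER_CONFIG)
    # Either weight file is enough, so report them as one alternative.
    if not any((root / name).is_file() for name in ADAPTER_WEIGHTS):
        missing.append(" or ".join(ADAPTER_WEIGHTS))
    return missing


if __name__ == "__main__":
    # Path taken from the command in (a); adjust to your own output folder.
    print(missing_adapter_files("trained_model/checkpoint-700"))
```

If the list is non-empty, the folder is probably not a PEFT adapter export, which would be consistent with the complaint in (b).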

c) trying to load as a transformers model fails too (probably expected):

>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> model = AutoModelForCausalLM.from_pretrained("trained_model/checkpoint-701", device_map="auto")
Loading checkpoint shards:   0%|          | 0/13 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/data02/snowflake-arctic/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ubuntu/data02/snowflake-arctic/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3916, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/ubuntu/data02/snowflake-arctic/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4390, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/ubuntu/data02/snowflake-arctic/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 936, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/ubuntu/data02/snowflake-arctic/venv/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 373, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([212992, 516]) in "weight" (which has shape torch.Size([16384, 53248])), this looks incorrect.
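The two shapes in the error hold different numbers of elements (212992 × 516 = 109,903,872 versus 16384 × 53248 = 872,415,232), so the saved tensor is not just a reshaped copy of the expected weight. It may be stored in a quantized or partitioned form, but that is a guess, not something the traceback confirms. A minimal sketch for listing what a shard actually contains, using only the standard library and the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function name `read_safetensors_shapes` is hypothetical:

```python
import json
import struct


def read_safetensors_shapes(path):
    """Map each tensor name in a .safetensors file to (dtype, shape).

    Only the JSON header is read, so no tensor data is loaded into memory.
    """
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    }
```

Running this on one of the shards in the checkpoint folder and comparing the reported dtypes and shapes against the base model's config should show whether the files hold plain weights or some other representation.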

I'd be grateful to know whether I am doing something wrong or whether there is a different way to load and use the model. Thank you!

yannis-pa changed the title ~~How to use the output model from training script - llama3.1~~ Can't use the output model from training script - llama3.1 Sep 24, 2024