Error in the file 2:4_w4a16_group-128_recipe.yaml #154

carrot-o0o · 2024-09-10T05:07:04Z

Describe the bug
This is a minor issue, but I think the quantization configuration in the file [examples/quantization_24_sparse_w4a16/2:4_w4a16_group-128_recipe.yaml](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_24_sparse_w4a16/2%3A4_w4a16_group-128_recipe.yaml) should include ignore: ["lm_head"] like below. Otherwise, during saving the quantized model, the code results in a ValueError caused by compressed_tensors because the lm_head doesn't follow the 2:4 sparse pattern.

quantization_stage:
  run_type: oneshot
  quantization_modifiers:
    GPTQModifier:
      sequential_update: false
      ignore: ["lm_head"]
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "channel"
          targets: ["Linear"]

Expected behavior
A clear and concise description of what you expected to happen.

Environment
Include all relevant environment information:

OS [e.g. Ubuntu 20.04]: 22.04
Python version [e.g. 3.7]: 3.10
LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: 7a0d232
ML framework version(s) [e.g. torch 2.3.1]: 2.4.0
Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]: compressed-tensors 0.5.0
Other relevant environment information [e.g. hardware, CUDA version]: CUDA 12.3

To Reproduce
Exact steps to reproduce the behavior:
i ran python examples/quantization_24_sparse_w4a16/llama7b_sparse_w4a16.py, where I changed the model path to another Llama model and the recipe path to 2:4_w4a16_group-128_recipe.yaml

Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.

The Error without ignore lm_head:

Traceback (most recent call last):
  File "~/projects/sparse/llm-compressor/examples/quantization_24_sparse_w4a16/llama7b_sparse_w4a16.py", line 40, in <module>
    apply(
  File "~/projects/sparse/llm-compressor/src/llmcompressor/transformers/finetune/text_generation.py", line 93, in apply
    main(model_args, data_args, training_args)
  File "~/sparse/llm-compressor/src/llmcompressor/transformers/finetune/text_generation.py", line 348, in main
    stage_runner.run_sequential_stages(checkpoint)
  File "~/projects/sparse/llm-compressor/src/llmcompressor/transformers/finetune/runner.py", line 291, in run_sequential_stages
    self.one_shot(stage=stage_name)
  File "~/projects/sparse/llm-compressor/src/llmcompressor/transformers/finetune/runner.py", line 194, in one_shot
    save_model_and_recipe(
  File "~/projects/sparse/llm-compressor/src/llmcompressor/pytorch/model_load/helpers.py", line 110, in save_model_and_recipe
    model.save_pretrained(
  File "~/projects/sparse/llm-compressor/src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py", line 123, in save_pretrained_wrapper
    compressed_state_dict = compressor.compress(model, state_dict)
  File "~/conda/llm-compresser/lib/python3.10/site-packages/compressed_tensors/compressors/model_compressor.py", line 241, in compress
    compressed_state_dict = self.quantization_compressor.compress(
  File "~/conda/llm-compresser/lib/python3.10/site-packages/compressed_tensors/compressors/marlin_24.py", line 149, in compress
    self.validate_sparsity_structure(prefix, value)
  File "~/conda/llm-compresser/lib/python3.10/site-packages/compressed_tensors/compressors/marlin_24.py", line 99, in validate_sparsity_structure
    if not tensor_follows_mask_structure(weight):
  File "~/conda/llm-compresser/lib/python3.10/site-packages/compressed_tensors/utils/helpers.py", line 91, in tensor_follows_mask_structure
    raise ValueError()
ValueError

Additional context
Add any other context about the problem here. Also include any relevant files.

The text was updated successfully, but these errors were encountered:

markurtz · 2024-10-18T01:50:35Z

Thanks @carrot-o0o! We're in the process of revamping our sparsity pathways and I'll ensure this fix gets included and tested.

kylesayrs · 2024-10-18T15:54:56Z

I ran the example as described with 2:4_w4a16_group-128_recipe.yaml and did not encounter an error. I believe this was fixed by #80

Co-authored-by: dhuangnm <[email protected]>

carrot-o0o added the bug Something isn't working label Sep 10, 2024

markurtz self-assigned this Oct 18, 2024

markmc pushed a commit to markmc/llm-compressor that referenced this issue Nov 13, 2024

bump up main to 0.6.0 (vllm-project#154)

329d62b

Co-authored-by: dhuangnm <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in the file 2:4_w4a16_group-128_recipe.yaml #154

Error in the file 2:4_w4a16_group-128_recipe.yaml #154

carrot-o0o commented Sep 10, 2024 •

edited

Loading

markurtz commented Oct 18, 2024

kylesayrs commented Oct 18, 2024 •

edited

Loading

Error in the file 2:4_w4a16_group-128_recipe.yaml #154

Error in the file 2:4_w4a16_group-128_recipe.yaml #154

Comments

carrot-o0o commented Sep 10, 2024 • edited Loading

markurtz commented Oct 18, 2024

kylesayrs commented Oct 18, 2024 • edited Loading

carrot-o0o commented Sep 10, 2024 •

edited

Loading

kylesayrs commented Oct 18, 2024 •

edited

Loading