Describe the bug
Running examples/quantizing_moe/deepseek_moe_w8a8_int8.py against a local DeepSeek-Coder-V2-Lite-Instruct checkpoint fails with torch.OutOfMemoryError during GPTQ initialization: while the modifier is preparing the model's decoder layers for compression, allocating a 16 MiB Hessian buffer in gptq_wrapper.py exhausts GPU 0 (44.55 GiB total). The logs and full traceback follow, along with a retry that exposes four GPUs.
```
2024-11-01T03:22:01.826294+0000 | one_shot | INFO - *** One Shot ***
2024-11-01T03:22:01.830653+0000 | from_modifiers | INFO - Creating recipe from modifiers
2024-11-01T03:22:01.831398+0000 | create_instance | WARNING - Could not process input as a file path or zoo stub, attempting to process it as a string.
2024-11-01T03:22:01.875859+0000 | _check_compile_recipe | INFO - Recipe compiled and 1 modifiers created
2024-11-01T03:22:01.903544+0000 | on_initialize_structure | WARNING - GPTQ quantization is set to True without an active quantization modifier.
2024-11-01T03:22:01.903643+0000 | _build_quant_modifier | INFO - Building quantization modifier with args: {'targets': 'Linear', 'scheme': 'W8A8', 'ignore': ['lm_head', 're:.*mlp.gate$']}
2024-11-01T03:22:02.877828+0000 | _check_calibration_data | INFO - Skipping QuantizationModifier calibration, it is not required for the provided quantization config.
2024-11-01T03:22:05.311851+0000 | initialize_compression | INFO - Preparing model.layers.0 for compression
2024-11-01T03:22:05.319877+0000 | initialize_compression | INFO - Preparing model.layers.1 for compression
2024-11-01T03:22:10.454367+0000 | initialize_compression | INFO - Preparing model.layers.2 for compression
2024-11-01T03:22:14.264467+0000 | initialize_compression | INFO - Preparing model.layers.3 for compression
2024-11-01T03:22:20.000751+0000 | initialize_compression | INFO - Preparing model.layers.4 for compression
2024-11-01T03:22:24.437408+0000 | initialize_compression | INFO - Preparing model.layers.5 for compression
Traceback (most recent call last):
  File "/testspace/repo/deepseek/llm-compressor/examples/quantizing_moe/deepseek_moe_w8a8_int8.py", line 79, in <module>
    oneshot(
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 76, in oneshot
    main(model_args, data_args, training_args)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/transformers/finetune/text_generation.py", line 359, in main
    stage_runner.one_shot()
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/transformers/finetune/runner.py", line 171, in one_shot
    self.trainer.one_shot(calibration_data=calib_data, stage=stage)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/transformers/finetune/session_mixin.py", line 401, in one_shot
    apply(
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/core/session_functions.py", line 184, in apply
    return active_session().apply(
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/core/session.py", line 210, in apply
    self.initialize(**kwargs)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/core/session.py", line 156, in initialize
    mod_data = self._lifecycle.initialize(
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/core/lifecycle.py", line 126, in initialize
    data = mod.initialize(state=self.state, **extras)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/modifiers/stage.py", line 124, in initialize
    modifier.initialize(state, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/modifiers/modifier.py", line 118, in initialize
    initialized = self.on_initialize(state=state, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/modifiers/quantization/gptq/base.py", line 187, in on_initialize
    self.initialize_compression(modifiable_model, calibration_dataloader)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/modifiers/quantization/gptq/base.py", line 246, in initialize_compression
    compressor.pre_compress()
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/modifiers/utils/layer_compressor.py", line 79, in pre_compress
    wrapper = self.module_compressor_class(full_name, layer)
  File "/opt/conda/lib/python3.10/site-packages/llmcompressor/modifiers/quantization/gptq/utils/gptq_wrapper.py", line 45, in __init__
    "H", torch.zeros((self.columns, self.columns), device=self.dev)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacity of 44.55 GiB of which 7.69 MiB is free. Process 689439 has 1.40 GiB memory in use. Process 3382592 has 260.00 MiB memory in use. Process 3845982 has 42.87 GiB memory in use. Of the allocated memory 42.56 GiB is allocated by PyTorch, and 18.16 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
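The failing allocation is the per-module Hessian buffer H that the GPTQ wrapper appears to create eagerly for every targeted Linear module as each decoder layer is prepared. As a sanity check, the arithmetic below reproduces the 16.00 MiB figure from the error; this is a sketch, and both the float32 element size and columns = 2048 (the in_features of the module being wrapped) are assumptions, not values read from the script:

```python
# Back-of-the-envelope size of one GPTQ Hessian buffer,
# H = torch.zeros((columns, columns)), per the traceback above.
# Assumptions: float32 storage (4 bytes/element) and columns = 2048.
columns = 2048
h_bytes = columns * columns * 4  # float32 bytes
print(f"{h_bytes / 2**20:.2f} MiB")  # -> 16.00 MiB, matching the failed allocation
```

Note that the PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True suggestion in the message targets fragmentation; here GPU 0 is simply full (the quantizing process already holds 42.87 GiB), so each additional per-layer buffer tips it over.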
Retrying with four GPUs exposed:
```
root@s0pgpuap12:/testspace/repo/deepseek/llm-compressor/examples/quantizing_moe# CUDA_VISIBLE_DEVICES=0,1,2,5 python deepseek_moe_w8a8_int8.py
The repository for /testspace/DeepSeek-Coder-V2-Lite-Instruct contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//testspace/DeepSeek-Coder-V2-Lite-Instruct.
You can avoid this prompt in future by passing the argument trust_remote_code=True.
Do you wish to run the custom code? [y/N] y
The repository for /testspace/DeepSeek-Coder-V2-Lite-Instruct contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//testspace/DeepSeek-Coder-V2-Lite-Instruct.
You can avoid this prompt in future by passing the argument trust_remote_code=True.
```
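As the prompt states, the interactive confirmation can be skipped by passing trust_remote_code=True when loading the checkpoint. A minimal sketch, using the local path from the output above; device_map="auto" is an assumption about sharding the model across the visible GPUs, not something the example script is known to do:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/testspace/DeepSeek-Coder-V2-Lite-Instruct"

# trust_remote_code=True opts in to the repository's custom modeling code
# and suppresses the interactive "[y/N]" prompt shown above.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",  # assumption: shard across visible GPUs instead of filling GPU 0
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
```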
Expected behavior
The example completes W8A8 (GPTQ) quantization of DeepSeek-Coder-V2-Lite-Instruct without running out of memory, ideally by spreading the model and the per-layer GPTQ buffers across the GPUs made visible via CUDA_VISIBLE_DEVICES instead of allocating everything on GPU 0.
Environment
Include all relevant environment information:
OS [e.g. Ubuntu 20.04]: not reported
Python version [e.g. 3.7]: 3.10 (conda, per the traceback paths)
LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: not reported
ML framework version(s) [e.g. torch 2.3.1]: torch (exact version not reported)
Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]: not reported
Other relevant environment information [e.g. hardware, CUDA version]: GPU 0 reports 44.55 GiB total capacity; at least six GPUs in the host (indices 0,1,2,5 selected via CUDA_VISIBLE_DEVICES)
To Reproduce
Exact steps to reproduce the behavior: run the stock MoE example against a local DeepSeek-Coder-V2-Lite-Instruct checkpoint, i.e. CUDA_VISIBLE_DEVICES=0,1,2,5 python deepseek_moe_w8a8_int8.py from examples/quantizing_moe/; a sketch of the call it makes follows.
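A minimal sketch of the oneshot call, reconstructed from the logs above: the GPTQModifier arguments are copied verbatim from the _build_quant_modifier log line, while the dataset and calibration settings are placeholders, since the script body is not reproduced here and may differ.

```python
from transformers import AutoModelForCausalLM

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_PATH = "/testspace/DeepSeek-Coder-V2-Lite-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype="auto", trust_remote_code=True
)

# Copied from the "_build_quant_modifier" log line above.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["lm_head", "re:.*mlp.gate$"],
)

# Placeholder calibration settings; the real example script may differ.
oneshot(
    model=model,
    recipe=recipe,
    dataset="open_platypus",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```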
Errors
The full print-out of the torch.OutOfMemoryError traceback is included under "Describe the bug" above.
Additional context
Per the OOM report, GPU 0 was shared with two other processes at the time (1.40 GiB and 260.00 MiB held by PIDs 689439 and 3382592), so slightly less than the card's 44.55 GiB was available to the quantization run.