[Bug]: 0.6.3.post1 regression: RuntimeError during mem profiling on Mistral Large AWQ with -q awq_marlin
#809
Labels: bug
Your current environment
The output of `python env.py`
🐛 Describe the bug
After upgrading from 0.6.2.post1 to 0.6.3.post1, I can no longer successfully load Mistral Large GEMM AWQ quants (such as TechxGenus_Mistral-Large-Instruct-2407-AWQ or casperhansen_mistral-large-instruct-2407-awq) with the `awq_marlin` flag. Regular `awq` works. It is not yet clear whether this affects AWQ quants of other models/architectures; I have not been able to test that yet and will follow up when I can.
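For reference, a minimal repro sketch, assuming Aphrodite's offline LLM API mirrors vLLM's (the model ID is one of the quants above; `tensor_parallel_size` is a placeholder for my actual setup):

```python
# Minimal repro sketch, assuming Aphrodite's offline LLM API mirrors vLLM's.
from aphrodite import LLM

llm = LLM(
    model="TechxGenus/Mistral-Large-Instruct-2407-AWQ",
    quantization="awq_marlin",  # fails during memory profiling on 0.6.3.post1
    tensor_parallel_size=4,     # placeholder; Mistral Large spans multiple GPUs
)
```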
Expected behavior
The quantized model loads and completes memory profiling, just as unquantized mistralai_Mistral-Large-Instruct-2407 does.
Actual behavior
`RuntimeError: b_zeros dim 1 = 896 is not size_n = 7168`
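Possibly relevant arithmetic: the reported dimension is off by exactly the AWQ pack factor for 4-bit weights. A sketch of the numbers (my speculation, not a confirmed root cause):

```python
# Eight 4-bit zero-points are packed into each int32, so a packed qzeros
# tensor is size_n / 8 wide. The marlin path may be receiving the packed
# shape where it expects the unpacked one. (Speculation, not a diagnosis.)
size_n = 7168
pack_factor = 32 // 4          # int32 bits / 4-bit values = 8
print(size_n // pack_factor)   # 896, the b_zeros dim 1 reported in the error
```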
Full log output
Other notes
`-q awq` does pass the memory profiling step, but throws `TypeError: SentencePieceTokenizer.encode() missing 2 required positional arguments: 'bos' and 'eos'` when a prompt is actually submitted. This could also just be user error or a problem with the pre-quantized models I'm using; I have not been able to prepare my own AWQ quant of this model yet.
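For what it's worth, the shape of that TypeError suggests a caller using the HF-style `encode(text)` convention against a Mistral-style tokenizer whose `encode()` requires explicit flags. A toy illustration (the class is hypothetical, not Aphrodite's actual code):

```python
# Toy reproduction of the signature mismatch. MistralStyleTokenizer is
# illustrative only: an encode() that requires explicit bos/eos flags
# fails under an HF-style single-argument call.
class MistralStyleTokenizer:
    def encode(self, s: str, bos: bool, eos: bool) -> list[int]:
        ids = [ord(c) for c in s]  # stand-in for real tokenization
        return ([1] if bos else []) + ids + ([2] if eos else [])

tok = MistralStyleTokenizer()
tok.encode("hello", bos=True, eos=False)  # OK
tok.encode("hello")  # TypeError: missing 2 required positional arguments: 'bos' and 'eos'
```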