👋 Hello Neural Magic community developers,
I encountered an issue while calculating perplexity for a locally converted Llama3-8B sparse model produced with the llm-compressor library. I followed the sparse conversion example script, only changing the model to meta-llama/Meta-Llama-3-8B-Instruct; the sparse conversion takes about 1.2 hours to finish.
Here’s a detailed breakdown:
Describe the bug
While trying to compute the WikiText2 perplexity for a Llama3-8B model that has been sparsified (loaded from local disk), the resulting perplexity values always turn out to be NaN. I suspect that some configuration might not be set properly when using the custom SparseAutoModelForCausalLM class in combination with the compressed-tensors library.
Expected behavior
I expected the perplexity values to be reasonable and comparable to the official Hugging Face models. For example, when testing the standard Llama-3.2-3B model from Hugging Face (without sparsification), I got a perplexity of ~8.8 with the following parameters:
• max_length=16K
• stride=1, 2, 4, 8, 16K
I expected similar results for the sparse model, not NaN values.
Environment
I use a RunPod online environment with 2× A100-80GB-SXM GPUs.
To Reproduce
Steps to reproduce the behavior:
1. Convert the Llama3-8B model to a sparse version using llm-compressor.
2. Load the sparse model using **_SparseAutoModelForCausalLM_** (same process as [here](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_24_sparse_w4a16)) and set up the environment to calculate perplexity.
3. Run the perplexity calculation on the WikiText2 dataset following Hugging Face’s [official perplexity guide](https://huggingface.co/docs/transformers/perplexity), but using the custom sparse model (a sketch of what I run is included after these steps).
4. Observe the NaN perplexity values in the output.
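For concreteness, here is a minimal sketch of the load + perplexity loop I use. The paths are placeholders, it assumes the `llmcompressor.transformers.SparseAutoModelForCausalLM` import path, and it follows the sliding-window recipe from the Hugging Face perplexity guide:

```python
# Minimal sketch of the load + perplexity loop (placeholder paths; assumes the
# llmcompressor.transformers.SparseAutoModelForCausalLM import path).
import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM

model_path = "path/to/sparse_model/stage_finetuning"  # placeholder: local sparse checkpoint
model = SparseAutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# WikiText-2 test split, concatenated and tokenized once, as in the HF guide.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length = 16384  # the max_length=16K setting mentioned above
stride = 512        # I also tried the other stride values mentioned above
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # only score tokens not covered by the previous window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask out the overlapping context tokens

    with torch.no_grad():
        nlls.append(model(input_ids, labels=target_ids).loss)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).mean())
print(f"Perplexity: {ppl.item()}")
```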
Errors
Here’s the output I receive when running the perplexity calculation (see the attached image). The perplexity of the local Llama-8B model (loaded with the SparseAutoModelForCausalLM class) is always NaN. Testing with the Llama-3B model (loaded with the AutoModelForCausalLM class) successfully produces a perplexity value.
Sparse Llama 8B (loaded with the SparseAutoModelForCausalLM class): perplexity is NaN
Online Llama 3B (loaded with the AutoModelForCausalLM class): perplexity is computed successfully
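To narrow down where the NaN comes from, this is the single-forward-pass check I can run. It is only my own debugging guess, not something from the docs: loading in float32 is just to rule out half-precision overflow, and the path is a placeholder.

```python
# Quick NaN check on one forward pass; the float32 load is only a guess to rule
# out half-precision overflow, not an officially documented setting.
import torch
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM

model_path = "path/to/sparse_model/stage_finetuning"  # placeholder path
model = SparseAutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, labels=inputs["input_ids"])

print("loss:", out.loss.item())
print("any NaN in logits:", torch.isnan(out.logits).any().item())
```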
Additional context
The same perplexity calculation process works perfectly when using the Hugging Face Llama-3.2-3B model without sparsification, which gives a perplexity of ~8.8. I believe the issue lies in either the custom sparse model class or the integration with compressed-tensors. Maybe I'm missing some additional configuration/setting for the sparse model? 🧐
Any guidance on this would be appreciated! 🥰
Additional Question
How do I correctly load the final quantized model (i.e. the model saved in the stage_quantization folder)?
I'm also interested in the perplexity of the final quantized model, but when I try to load it with SparseAutoModelForCausalLM it doesn't work 😢
It shows a message along the lines of: "... class not supported ..."
So how do I load the final quantized model correctly? Is there any documentation I can refer to? 🙏🏼
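In case it helps, the workaround I was considering is to load the quantized stage output with vLLM instead, since I understand vLLM can read compressed-tensors checkpoints. This is only a sketch, and the path is a placeholder:

```python
# Hedged workaround sketch: load the stage_quantization output with vLLM
# (placeholder path; assumes vLLM can read this compressed-tensors checkpoint).
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/output_folder/stage_quantization")
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```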
Hi @robertgshaw2-neuralmagic Robert, you were right to question this. I retested the original llama-7B Sparse conversion example from llm-compressor today, along with a simple model.generate test to check the model's text output. It turns out the model doesn’t seem to generate any correct outputs, and as expected, I couldn’t calculate the model’s perplexity under these circumstances.
I think the issue is now clearer. I believe the problem lies in how I load the local Sparse Model & Tokenizer. Does llm-compressor have any examples or documentation I can refer to? Any suggestions would be appreciated, thank you! 🥰
Also, I apologize for not providing the exact sparse model I used. After running it in the online RunPod environment, I didn’t download the model. However, this process should be easy to replicate. Here are the steps I followed for testing:
Step 1: Execute the official llama-7B sparse conversion example from llm-compressor: run python llama7b_sparse_w4a16.py
Step 2: After about an hour, the sparse conversion finishes, and you’ll find the model saved in three stages in the output folder output_llama7b_2:4_w4a16_channel, which I rename to output_llama7b_2_4_w4a16_channel for easier use.
Step 3: Load the stage_finetuning sparse model and tokenizer from output_llama7b_2_4_w4a16_channel/stage_finetuning, and follow the Hugging Face process to calculate perplexity.
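For reference, the simple model.generate sanity check mentioned above is roughly the following (a minimal sketch; it assumes the llmcompressor.transformers import path and uses the stage_finetuning folder from Step 3):

```python
# Minimal generate() sanity check for the stage_finetuning checkpoint
# (assumes the llmcompressor.transformers.SparseAutoModelForCausalLM import path).
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM

model_path = "output_llama7b_2_4_w4a16_channel/stage_finetuning"
model = SparseAutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```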
## The Success Case with the Llama-3.2-3B online model
Test Model Output
Calculating Perplexity
Result
Summary
I want to correctly load the local sparse model and calculate its perplexity as an evaluation metric. However, it seems that I haven’t used the correct method to load the model (through the SparseAutoModelForCausalLM class) or the Tokenizer. If there are any documents or resources I can refer to, please let me know. Thanks! 🥰