
[BUG] Zero accuracy in Hellaswag for Llama-2-7b (using 8bit quantization) #275

Open
rankofootball opened this issue Aug 21, 2024 · 2 comments

@rankofootball

command:

accelerate launch run_evals_accelerate.py --model_args="Llama-2-7b-chat-hf-8bit,quantization_config=load_in_8bit=True" --tasks "helm|hellaswag|1|0" --output_dir ./evalscratch

The result is 0% accuracy.
Llama-3 works fine, as does MMLU for Llama-2.

Is there any way to log the individual outputs?

rankofootball added the bug label on Aug 21, 2024
@rankofootball (Author)

By inspecting the parquet output, I found that gold and prediction differ by a leading space:
[' A'] ['C']
[' B'] ['C']
[' C'] ['C']
[' A'] ['A']
[' B'] ['B']
[' B'] ['B']
[' A'] ['B']
[' D'] ['D']
[' A'] ['B']
[' B'] ['C']

Is the leading space in the gold normal?
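
For reference, a minimal sketch of how such a comparison can be done with pandas; the file path and the column names ("gold", "predictions") are assumptions based on the output pasted above, not necessarily the exact lighteval schema:

import pandas as pd

# Load the per-sample details file written to the output directory
# (the path is a placeholder; the real file sits somewhere under ./evalscratch).
df = pd.read_parquet("path/to/details.parquet")

# Print gold vs. prediction side by side to spot mismatches such as
# the leading space shown above.
for gold, pred in zip(df["gold"], df["predictions"]):
    print(gold, pred)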

@clefourrier (Member)

clefourrier commented Sep 14, 2024

Hi!
This is due to tokenization issues, IIRC.
A simple fix would be to change the task so that the prompt ends with "Answer: " and the gold is "A", instead of the prompt ending with "Answer:" and the gold being " A". @NathanHB wdyt?
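
For context, a quick way to see why that leading space matters: with the Llama-2 tokenizer, "A" and " A" generally do not encode to the same token ids, so a gold of " A" will not match a prediction of "A" at the token level. This is only an illustrative check, not part of the proposed fix:

from transformers import AutoTokenizer

# Illustrative check (assumes access to the meta-llama/Llama-2-7b-chat-hf
# tokenizer); the exact ids depend on tokenizer settings, but the two
# encodings generally differ because of the leading space.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
print(tok.encode("A", add_special_tokens=False))
print(tok.encode(" A", add_special_tokens=False))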
