
[BUG] Zero accuracy in Hellaswag for Llama-2-7b (using 8bit quantization) #275

Open
rankofootball opened this issue Aug 21, 2024 · 2 comments

@rankofootball

command:

accelerate launch run_evals_accelerate.py --model_args="Llama-2-7b-chat-hf-8bit,quantization_config=load_in_8bit=True" --tasks "helm|hellaswag|1|0" --output_dir ./evalscratch

The result is 0% accuracy.
Llama-3 works fine, as does MMLU for Llama-2.

Is there any way to log the individual outputs?

rankofootball added the bug label on Aug 21, 2024
@rankofootball (Author)

By inspecting the parquet output, I found that gold and prediction differ by a leading space:
[' A'] ['C']
[' B'] ['C']
[' C'] ['C']
[' A'] ['A']
[' B'] ['B']
[' B'] ['B']
[' A'] ['B']
[' D'] ['D']
[' A'] ['B']
[' B'] ['C']

Is the leading space in the gold normal?
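
For reference, a minimal sketch of how such a comparison can be done with pandas; the file path and the column names ("gold", "predictions") are assumptions based on the output pasted above, not necessarily the exact lighteval schema:

import pandas as pd

# Load the per-sample details file written to the output directory
# (the path is a placeholder; the real file sits somewhere under ./evalscratch).
df = pd.read_parquet("path/to/details.parquet")

# Print gold vs. prediction side by side to spot mismatches such as
# the leading space shown above.
for gold, pred in zip(df["gold"], df["predictions"]):
    print(gold, pred)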

@clefourrier (Member)

clefourrier commented Sep 14, 2024

Hi!
This is due to tokenization issues, IIRC.
A simple fix would be to change the task so that the prompt ends with "Answer: " and the gold is "A", instead of the prompt ending with "Answer:" and the gold being " A". @NathanHB wdyt?
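
For context, a quick way to see why that leading space matters: with the Llama-2 tokenizer, "A" and " A" generally do not encode to the same token ids, so a gold of " A" will not match a prediction of "A" at the token level. This is only an illustrative check, not part of the proposed fix:

from transformers import AutoTokenizer

# Illustrative check (assumes access to the meta-llama/Llama-2-7b-chat-hf
# tokenizer); the exact ids depend on tokenizer settings, but the two
# encodings generally differ because of the leading space.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
print(tok.encode("A", add_special_tokens=False))
print(tok.encode(" A", add_special_tokens=False))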
