
dllama: src/commands.cpp:102: MultiHeadAttSlice::MultiHeadAttSlice(unsigned int, unsigned int, unsigned int, slice_index_t): Assertion `nHeads % nSlices == 0' failed. #98

Open
EntusiastaIApy opened this issue Jul 7, 2024 · 3 comments

Comments

@EntusiastaIApy

Hello, @b4rtaz!

I'm trying to run the model nkpz/llama2-22b-chat-wizard-uncensored on a cluster composed of 1 Raspberry Pi 4B 8 GB and 7 Raspberry Pi 4B 4 GB, but in both inference and chat modes Distributed Llama throws the following error. Do you know why this is happening and how to fix it?

[Screenshot: llama2-22b-chat-wizard-uncensored_q40_8nodes_switch_sdcard_inference-error]

@b4rtaz
Owner

b4rtaz commented Jul 10, 2024

Hello @EntusiastaIApy,

I think the problem is this: `"num_attention_heads": 52`. The current implementation expects this number to be divisible by the number of nodes without a remainder.

52 / 8 => 6 remainder 4

This is basically a bug.
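
For reference, here is a minimal sketch of the divisibility constraint behind the assertion. The struct layout and field names are hypothetical and only mirror the message in the log, not the actual `src/commands.cpp` code:

```cpp
#include <cassert>

// Hypothetical sketch: each node (slice) receives nHeads / nSlices attention
// heads, so the division must leave no remainder.
struct MultiHeadAttSlice {
    unsigned int nHeads;
    unsigned int nSlices;
    unsigned int headsPerSlice;

    MultiHeadAttSlice(unsigned int nHeads, unsigned int nSlices)
        : nHeads(nHeads), nSlices(nSlices) {
        assert(nHeads % nSlices == 0); // fails for 52 heads on 8 nodes (52 % 8 == 4)
        headsPerSlice = nHeads / nSlices;
    }
};

int main() {
    MultiHeadAttSlice ok(52, 4);   // 52 / 4 == 13 heads per node: passes
    MultiHeadAttSlice bad(52, 8);  // 52 % 8 == 4: assertion fires, as in the log
    (void)ok; (void)bad;
    return 0;
}
```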

@Different-Pranav

I am facing a similar issue. I am trying to run TinyLlama in the dllama environment with 2 worker nodes of 8 GB RAM each, but it throws a similar error.
[Screenshot 2024-09-13 200436]

@b4rtaz
Owner

b4rtaz commented Sep 13, 2024

@Different-Pranav you are using 3 nodes (root + 2 workers). You should try with 2 nodes (1 root + 1 worker) or 4 nodes (1 root + 3 workers).
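
As a quick sanity check (assuming TinyLlama-1.1B's 32 attention heads; please verify `num_attention_heads` in the model's config.json), the same divisibility rule explains why 3 nodes trips the assertion while 2 or 4 do not:

```cpp
#include <cstdio>
#include <initializer_list>

int main() {
    // Assumed head count for TinyLlama-1.1B; check the model's config.json.
    unsigned int nHeads = 32;
    for (unsigned int nNodes : {2u, 3u, 4u}) {
        std::printf("%u nodes: %s\n", nNodes,
                    nHeads % nNodes == 0 ? "heads divide evenly" : "assertion would fail");
    }
    return 0;
}
```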
