Hello, @b4rtaz! I'm trying to run the model nkpz/llama2-22b-chat-wizard-uncensored on a cluster composed of one Raspberry Pi 4B 8 GB and seven Raspberry Pi 4B 4 GB, but in both inference and chat modes Distributed Llama throws the following error. Do you know why this is happening and how to fix it?
I think the problem is the model's config: `"num_attention_heads": 52`. The current implementation expects this number to be divisible by the number of nodes without remainder, and 52 is not divisible by 8, so an 8-node cluster cannot split the attention heads evenly.
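A minimal sketch of that divisibility constraint (this is not Distributed Llama's actual code, just an illustration of the check described above):

```python
# Hypothetical sketch of the sharding constraint described above; not
# Distributed Llama's actual code. "num_attention_heads" comes from the
# model's config.json (52 for nkpz/llama2-22b-chat-wizard-uncensored).
def check_head_split(n_heads: int, n_nodes: int) -> int:
    """Return heads per node, or raise if the heads cannot be split evenly."""
    if n_heads % n_nodes != 0:
        raise ValueError(
            f"{n_heads} attention heads cannot be split evenly across "
            f"{n_nodes} nodes (remainder {n_heads % n_nodes})"
        )
    return n_heads // n_nodes

print(check_head_split(52, 4))  # OK: 13 heads per node
check_head_split(52, 8)         # raises ValueError: remainder 4
```

Since 52 = 2 × 2 × 13, the only node counts that divide it evenly are 1, 2, 4, 13, 26, and 52.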
I am facing a similar issue. I am trying to run TinyLlama in the dllama environment with 2 worker nodes of 8 GB RAM each, but it throws a similar error.
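If the same constraint applies here, it may be worth checking the head count against the node count. Two assumptions to verify against your setup: TinyLlama-1.1B reports `num_attention_heads: 32` in its config.json, and the root node may count toward the node total, in which case 2 workers plus the root make 3 nodes. A quick check under those assumptions:

```python
# Assumed values: TinyLlama-1.1B's config.json reports 32 attention heads,
# and the root node is assumed to count toward the node total.
n_heads = 32
for n_nodes in (2, 3, 4):  # 1, 2, or 3 workers plus the root node
    remainder = n_heads % n_nodes
    status = "OK" if remainder == 0 else "fails"
    print(f"{n_nodes} nodes: {status} (32 % {n_nodes} = {remainder})")
```

Under those assumptions, 3 nodes fails while 2 or 4 would split evenly.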