feat: nSlices <= nKvHeads limit.
b4rtaz committed May 27, 2024
1 parent 2fa9d9f commit df1d360
Showing 2 changed files with 6 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -26,7 +26,8 @@ Python and GCC required. Download this repository and run:
  - [API Server](./src/apps/dllama-api/README.md)

  **Known limitations:**
- * You can run Distributed Llama only on 1, 2, 4... 2^n devices.
+ * You can run Distributed Llama only on 1, 2, 4... 2^n nodes.
+ * The maximum number of nodes is equal to the number of KV heads in the model [#70](https://github.com/b4rtaz/distributed-llama/issues/70).
  * Optimized for (weights format × buffer format):
    * ARM CPUs
      * ✅ F32 × F32
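Taken together, the two limitations mean a valid node count must be a power of two that does not exceed the model's KV head count. A minimal sketch of that combined rule (the helper name and the 8-KV-head example are illustrative assumptions, not part of this commit):

```cpp
#include <cstdio>

// Sketch only: checks a proposed node count against the two README limitations,
// assuming nKvHeads is known from the model header.
static bool isValidNodeCount(unsigned int nNodes, unsigned int nKvHeads) {
    bool isPowerOfTwo = nNodes != 0 && (nNodes & (nNodes - 1)) == 0;
    return isPowerOfTwo && nNodes <= nKvHeads;
}

int main() {
    // For a hypothetical model with 8 KV heads: 1, 2, 4 and 8 nodes pass, 16 is rejected.
    for (unsigned int n = 1; n <= 16; n *= 2)
        printf("%u nodes -> %s\n", n, isValidNodeCount(n, 8) ? "ok" : "rejected");
    return 0;
}
```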
4 changes: 4 additions & 0 deletions src/transformer.cpp
@@ -251,6 +251,10 @@ TransformerSpec Transformer::loadSpecFromFile(const char* path, const unsigned i
      spec.bufferFloatType = bufferFloatType;
      spec.nSlices = nSlices;

+     if (spec.nSlices > spec.nKvHeads) {
+         // TODO: https://github.com/b4rtaz/distributed-llama/issues/70
+         throw std::runtime_error("This version does not support more nodes than the number of KV heads in the model.");
+     }
      if (spec.archType == LLAMA) {
          printf("💡 arch: llama\n");
      } else if (spec.archType == GROK1) {
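For context on why the check is needed: the KV heads are divided across the slices, so each node has to own at least one of them. A minimal sketch of that reasoning, where `kvHeadsPerSlice` is an assumed illustrative name rather than the repository's actual code:

```cpp
#include <cstdio>

// Sketch only: if each slice (node) owns an equal share of the KV heads,
// that share is nKvHeads / nSlices; it drops to 0 as soon as
// nSlices > nKvHeads, which is exactly the case the new check rejects.
static unsigned int kvHeadsPerSlice(unsigned int nKvHeads, unsigned int nSlices) {
    return nKvHeads / nSlices;
}

int main() {
    printf("%u\n", kvHeadsPerSlice(8, 4));  // 2 KV heads per node
    printf("%u\n", kvHeadsPerSlice(8, 16)); // 0 — more nodes than KV heads, rejected
    return 0;
}
```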
