feat: nSlices <= nKvHeads limit.
b4rtaz committed May 27, 2024
1 parent 2fa9d9f commit df1d360
Showing 2 changed files with 6 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -26,7 +26,8 @@ Python and GCC required. Download this repository and run:
  - [API Server](./src/apps/dllama-api/README.md)

  **Known limitations:**
- * You can run Distributed Llama only on 1, 2, 4... 2^n devices.
+ * You can run Distributed Llama only on 1, 2, 4... 2^n nodes.
+ * The maximum number of nodes is equal to the number of KV heads in the model [#70](https://github.com/b4rtaz/distributed-llama/issues/70).
  * Optimized for (weights format × buffer format):
    * ARM CPUs
      * ✅ F32 × F32
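Taken together, the two limitations mean a valid node count must be a power of two that does not exceed the model's KV head count. A minimal sketch of that combined rule (the helper name and the 8-KV-head example are illustrative assumptions, not part of this commit):

```cpp
#include <cstdio>

// Sketch only: checks a proposed node count against the two README limitations,
// assuming nKvHeads is known from the model header.
static bool isValidNodeCount(unsigned int nNodes, unsigned int nKvHeads) {
    bool isPowerOfTwo = nNodes != 0 && (nNodes & (nNodes - 1)) == 0;
    return isPowerOfTwo && nNodes <= nKvHeads;
}

int main() {
    // For a hypothetical model with 8 KV heads: 1, 2, 4 and 8 nodes pass, 16 is rejected.
    for (unsigned int n = 1; n <= 16; n *= 2)
        printf("%u nodes -> %s\n", n, isValidNodeCount(n, 8) ? "ok" : "rejected");
    return 0;
}
```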
4 changes: 4 additions & 0 deletions src/transformer.cpp
@@ -251,6 +251,10 @@ TransformerSpec Transformer::loadSpecFromFile(const char* path, const unsigned i
      spec.bufferFloatType = bufferFloatType;
      spec.nSlices = nSlices;

+     if (spec.nSlices > spec.nKvHeads) {
+         // TODO: https://github.com/b4rtaz/distributed-llama/issues/70
+         throw std::runtime_error("This version does not support more nodes than the number of KV heads in the model.");
+     }
      if (spec.archType == LLAMA) {
          printf("💡 arch: llama\n");
      } else if (spec.archType == GROK1) {
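For context on why the check is needed: the KV heads are divided across the slices, so each node has to own at least one of them. A minimal sketch of that reasoning, where `kvHeadsPerSlice` is an assumed illustrative name rather than the repository's actual code:

```cpp
#include <cstdio>

// Sketch only: if each slice (node) owns an equal share of the KV heads,
// that share is nKvHeads / nSlices; it drops to 0 as soon as
// nSlices > nKvHeads, which is exactly the case the new check rejects.
static unsigned int kvHeadsPerSlice(unsigned int nKvHeads, unsigned int nSlices) {
    return nKvHeads / nSlices;
}

int main() {
    printf("%u\n", kvHeadsPerSlice(8, 4));  // 2 KV heads per node
    printf("%u\n", kvHeadsPerSlice(8, 16)); // 0 — more nodes than KV heads, rejected
    return 0;
}
```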
