📚 docs: LLM doc & split tests (#129)
jean-francoisreboud authored Jul 12, 2024
1 parent 6a188fd commit c3a8ade
Showing 13 changed files with 594 additions and 157 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@ All notable changes to this project will be documented in this file.

## [unreleased]

📚 **docs:** LLM doc & split tests ([129](https://github.com/owkin/GrAIdient/pull/129))\
**layer_seq:** LLM generate ([128](https://github.com/owkin/GrAIdient/pull/128))\
**layer_seq:** MultiplySeq, SiLU & LLM test ([127](https://github.com/owkin/GrAIdient/pull/127))\
**layer_seq:** ValueCausalSeq ([126](https://github.com/owkin/GrAIdient/pull/126))\
14 changes: 13 additions & 1 deletion Docs/Examples/AutoEncoder.md
@@ -64,7 +64,19 @@ conda env remove --name graiexamples

## Steps

Each train example uses a `CIFARAutoEncoderTrainer`.
The latter is responsible for initializing the training dataset
before the actual training takes place.

1. Train a simple auto encoder model.
1. Train a UNet-like auto encoder model.
1. Train a StyleGAN-like auto encoder model.

## Further tests

Further tests are available at
[AutoEncoderTests](../../Tests/GrAIExamples/AutoEncoderTests.swift).

The test `testTrain` compares the training of a `SimpleAutoEncoder`
in GrAIdient and in PyTorch to show that the same `loss` is computed
throughout the training.
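Below is a minimal sketch of the kind of check this enables, with made-up
loss values; the real test derives both series from actual training runs
in the two frameworks.

```python
import numpy as np

# Hypothetical per-step losses dumped from each framework.
losses_graidient = [0.912, 0.645, 0.521, 0.434]
losses_pytorch = [0.912, 0.645, 0.521, 0.434]

# The two trainings agree when every step yields the same loss.
assert np.allclose(losses_graidient, losses_pytorch, atol=1e-6)
```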
1 change: 1 addition & 0 deletions Docs/Examples/EXAMPLES.md
@@ -12,3 +12,4 @@ The following examples are currently available:
- [VGG](VGG.md)
- [Vision Transformer](VisionTransformer.md)
- [Auto Encoder](AutoEncoder.md)
- [NLP](NLP.md)
50 changes: 50 additions & 0 deletions Docs/Examples/NLP.md
@@ -0,0 +1,50 @@
# 🚀 NLP Example

This is the documentation for running
[LLMs](../../Tests/GrAIExamples/NLPExample.swift) on the GPU.

## Setup

This example has some `Python` dependencies. In order to run
the example, we first have to set up the environment:

```bash
conda create --name graiexamples python=3.9
conda activate graiexamples
cd Tests/GrAIExamples/Base
pip install -e .
```

Then:
- Download the weights from
[MistralAI](https://docs.mistral.ai/getting-started/open_weight_models/)
(a quick sanity check for the download is sketched after this list).
- Update `_modelPath` in the
[NLPExample](../../Tests/GrAIExamples/NLPExample.swift) file with the path
to the previously downloaded weights.
- Optionally update `_prompt`.
- Rename `_testGenerate` into `testGenerate`.
- Run the test.
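As a quick sanity check that the download matches the layout the Python
helpers expect (a `consolidated.00.pth` checkpoint next to a
`tokenizer.model` file), one can run a minimal sketch like the following;
the weights path is hypothetical:

```python
from pathlib import Path

import torch

# Hypothetical location of the downloaded weights.
model_path = Path("/path/to/mistral-7B-v0.1")

# The Python helpers load these two files, so both should exist
# before running the Swift test.
assert (model_path / "tokenizer.model").exists()
state = torch.load(str(model_path / "consolidated.00.pth"))
print(f"{len(state)} weight tensors loaded")
```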

It is finally possible to clean the environment 🌍

```bash
conda deactivate
conda env remove --name graiexamples
```

## Steps

1. Generate text from a prompt.
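
For reference, the same generation entry point can also be exercised from
the Python library shipped with the tests; a minimal sketch, with a
hypothetical weights path:

```python
from python_lib.nlp.generate import _generate

# Stream text from a prompt (the weights path is hypothetical).
_generate(
    prompt="How do you do?",
    model_path="/path/to/mistral-7B-v0.1",
)
```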

## Further tests

Further tests are available at
[NLPExampleTests](../../Tests/GrAIExamples/NLPExampleTests.swift).
In order to run them, rename
`_testPredict1` and `_testPredict32` into `testPredict1` and `testPredict32`.

The test `testPredict1` compares the first step of generation
of a toy LLM (just one transformer block) in GrAIdient and in PyTorch.

The test `testPredict32` runs the first step of generation
of a full LLM in GrAIdient and compares the result against the expected
output from PyTorch.
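
On the Python side, the reference values for the toy configuration can be
produced with `predict` and its new `n_layers` modifier; a minimal sketch,
assuming a hypothetical weights path:

```python
from python_lib.nlp.generate import predict

# Hypothetical path to the downloaded weights.
model_path = "/path/to/mistral-7B-v0.1"

# First generation step of a toy LLM: only one Transformer block runs.
logits = predict(
    prompt="How do you do?",
    model_path=model_path,
    n_layers=1,
)
print(logits.shape)  # flattened logits, as a numpy array
```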
14 changes: 14 additions & 0 deletions Docs/Examples/VGG.md
@@ -91,3 +91,17 @@ conda env remove --name graiexamples
1. Train a model on the training dataset.
1. Evaluate the trained model on the testing dataset:
observe improved performance.

## Benchmarks

To benchmark the time performance of the VGG model, look at
[VGGBenchmark](../../Tests/GrAIExamples/VGGBenchmark.swift) and rename
`_test_TrainVGG` and `_test_EvalVGG` into `test_TrainVGG` and `test_EvalVGG`.

The test `test_TrainVGG` measures the time spent training the VGG
model for 20 steps.

The test `test_EvalVGG` measures the time spent running the VGG model
in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
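
The timing pattern these benchmarks follow can be sketched as below; the
step function is a stand-in for a real training or inference step.

```python
import time

def benchmark(step, n_steps: int = 20) -> float:
    """Time `n_steps` identical steps over data fixed once and for all."""
    start = time.time()
    for _ in range(n_steps):
        step()
    return time.time() - start

# Demo with a stand-in step; a real run would execute a training
# or inference step of the VGG model instead.
print(f"{benchmark(lambda: sum(range(100_000))):.3f}s for 20 steps")
```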
17 changes: 17 additions & 0 deletions Docs/Examples/VisionTransformer.md
@@ -86,3 +86,20 @@ conda env remove --name graiexamples

1. Dump the training dataset.
1. Train a simple Vision Transformer model.

## Benchmarks

To benchmark the time performance of the Vision Transformer model,
look at
[TransformerBenchmark](../../Tests/GrAIExamples/TransformerBenchmark.swift)
and rename
`_test_TrainTransformer` and `_test_EvalTransformer` into
`test_TrainTransformer` and `test_EvalTransformer`.

The test `test_TrainTransformer` measures the time spent training the
VisionTransformer model for 20 steps.

The test `test_EvalTransformer` measures the time spent running the
VisionTransformer model in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
35 changes: 25 additions & 10 deletions Tests/GrAIExamples/Base/python_lib/nlp/generate.py
@@ -2,14 +2,17 @@
import torch
import numpy as np
from pathlib import Path
-from typing import Generator, List
+from typing import Generator, List, Optional

from python_lib.nlp.tokenizer import Tokenizer
from python_lib.nlp.model import Transformer, TransformerArgs


def _predict_no_cache(
-    prompt: torch.Tensor, model: Transformer, temp: float = 0.0
+    prompt: torch.Tensor,
+    model: Transformer,
+    temp: float = 0.0,
+    n_layers: Optional[int] = None
) -> torch.Tensor:
    """
    Predict text based on the given prompt and model.
@@ -22,6 +25,8 @@ def _predict_no_cache(
        The model to use for generation.
    temp: float
        The temperature for sampling. If temp is 0, use max sampling.
+    n_layers: int
+        Modifier of the number of Transformer blocks.

    Returns
    -------
@@ -38,7 +43,7 @@ def sample(logits: torch.Tensor) -> torch.Tensor:
        )

    y = prompt
-    logits, _ = model(y[None], cache=None)
+    logits, _ = model(y[None], cache=None, n_layers=n_layers)
    return sample(logits)
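
The body of the `sample` helper is truncated in this hunk; below is a
plausible reconstruction of the behavior its docstring describes (greedy
`argmax` when `temp` is 0, categorical sampling otherwise), not
necessarily the repository's exact code:

```python
import torch

def sample(logits: torch.Tensor, temp: float = 0.0) -> torch.Tensor:
    # Greedy (max) sampling when temp == 0, otherwise draw from the
    # temperature-scaled categorical distribution over the last position.
    if temp == 0:
        return torch.argmax(logits[:, -1], dim=-1)
    return torch.distributions.Categorical(
        logits=logits[:, -1] * (1 / temp)
    ).sample()
```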


@@ -146,6 +151,7 @@ def _predict(
    prompt: str,
    model_path: str,
    temp: float = 0,
+    n_layers: Optional[int] = None
):
    """
    Predict text based on the given prompt and model.
@@ -158,6 +164,8 @@
        Path to the model on the disk.
    temp: float
        The temperature for sampling. If temp is 0, use max sampling.
+    n_layers: int
+        Modifier of the number of Transformer blocks.
    """
    state = torch.load(str(Path(model_path) / "consolidated.00.pth"))
    tokenizer = Tokenizer(str(Path(model_path) / "tokenizer.model"))
@@ -178,14 +186,15 @@
    )

    tokens = _predict_no_cache(
-        prompt, model, temp
+        prompt, model, temp, n_layers
    ).squeeze(dim=0).cpu().numpy().tolist()
    print(tokenizer.decode(tokens))


def predict(
    prompt: str,
-    model_path: str
+    model_path: str,
+    n_layers: Optional[int] = None
) -> np.ndarray:
    """
    Predict text based on the given prompt and model.
@@ -196,6 +205,8 @@
        The input prompt.
    model_path: str
        Path to the model on the disk.
+    n_layers: int
+        Modifier of the number of Transformer blocks.
    """
    state = torch.load(str(Path(model_path) / "consolidated.00.pth"))
    tokenizer = Tokenizer(str(Path(model_path) / "tokenizer.model"))
@@ -213,7 +224,7 @@
    prompt = torch.tensor(
        tokenizer.encode(prompt), dtype=torch.long, device="mps"
    )
-    out, _ = model(prompt[None])
+    out, _ = model(prompt[None], n_layers=n_layers)
    return out.detach().cpu().numpy().flatten()


@@ -255,23 +266,27 @@ def decode(

if __name__ == "__main__":
    model_path = ""
+    prompt = "How do you do?"

    _generate(
        prompt="How do you do?",
        model_path=model_path
    )
    prompt = encode(
-        prompt="How do you do?",
+        prompt=prompt,
        model_path=model_path
    )
    prompt = decode(
        prompt=prompt,
        model_path=model_path
    )
    _predict(
-        prompt="How do you do?",
+        prompt=prompt,
        model_path=model_path,
+        n_layers=None
    )
    predict(
-        prompt="How do you do?",
-        model_path=model_path
+        prompt=prompt,
+        model_path=model_path,
+        n_layers=1
    )
7 changes: 6 additions & 1 deletion Tests/GrAIExamples/Base/python_lib/nlp/model.py
@@ -377,6 +377,7 @@ def forward(
        self,
        x: torch.Tensor,
        cache=None,
+        n_layers=None
    ) -> Tuple[torch.Tensor, Optional[list]]:
        """
        Forward pass.
@@ -388,6 +389,8 @@ def forward(
        cache: (key_cache, value_cache): (torch.Tensor, torch.Tensor)
            cache for keys and values
            for generating tokens with past context.
+        n_layers: int
+            Modifier of the number of Transformer blocks.

        Returns
        -------
@@ -424,9 +427,11 @@ def forward(
            cache = [None] * len(self.layers)

        for e, layer in enumerate(self.layers):
+            if n_layers is not None and e == n_layers:
+                break
+
            h, cache[e] = layer(
                h, rotation_matrix=rotation_matrix, mask=mask, cache=cache[e]
            )
-            break

        return self.output(self.norm(h)), cache
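
The early `break` is what lets the tests run a truncated model (for
instance just one Transformer block for `testPredict1`). A minimal,
self-contained sketch of the control flow, with stand-in layers:

```python
# Stand-in blocks; the real model applies Transformer layers.
layers = [lambda h: h + 1, lambda h: h * 2, lambda h: h - 3]

def forward(h, n_layers=None):
    for e, layer in enumerate(layers):
        if n_layers is not None and e == n_layers:
            break  # skip the remaining blocks
        h = layer(h)
    return h

assert forward(1) == 1              # all three blocks: ((1 + 1) * 2) - 3
assert forward(1, n_layers=1) == 2  # only the first block: 1 + 1
```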