πŸš€ Release 0.4.0 #134

Merged: 24 commits, Sep 1, 2024

Commits
064392b
✨ feat: VQGrad, VQGradSeq (#107)
jean-francoisreboud Sep 18, 2023
3130f05
✨ feat: Dropout1D (#108)
jean-francoisreboud Oct 7, 2023
516833d
✨ feat(core): initForward,Backward model API (#109)
jean-francoisreboud Dec 2, 2023
63934a9
πŸ› fix: run on Apple Silicon (#110)
jean-francoisreboud Dec 8, 2023
c2988f1
πŸš€ perf: benchmark ViT base model (#111)
jean-francoisreboud Jan 2, 2024
4969db6
πŸš€ perf: QuerySelf & ValueSelf (#112)
jean-francoisreboud Jan 3, 2024
096b95d
✨ feat(core): GELU vs GELUApprox (#113)
jean-francoisreboud Jan 5, 2024
3d3191d
✨ feat: LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq (#117)
jean-francoisreboud Feb 17, 2024
192f994
πŸš€ perf: Convolution2D (#118)
jean-francoisreboud Feb 28, 2024
a9d176c
πŸš€ perf: copy & generate weights faster (#119)
jean-francoisreboud May 12, 2024
52ab4df
πŸ”¨ refactor: handle float16 along float on GPU (#120)
jean-francoisreboud May 12, 2024
ceff714
πŸš€ perf: use half in Metal kernels (#121)
jean-francoisreboud May 22, 2024
d97e520
✨ feat(layer_seq): EmbeddingSeq (#122)
jean-francoisreboud Jun 14, 2024
2d65e95
✨ feat(layer_seq): RMSNormSeq (#123)
jean-francoisreboud Jun 16, 2024
03e2617
✨ feat(layer_seq): RoPESeq (#124)
jean-francoisreboud Jun 19, 2024
6dd84dd
✨ feat(layer_seq): QueryCausalSeq (#125)
jean-francoisreboud Jun 28, 2024
8ab07d5
✨ feat(layer_seq): ValueCausalSeq (#126)
jean-francoisreboud Jul 1, 2024
0e34be3
✨ layer_seq: MultiplySeq, SiLU & LLM test (#127)
jean-francoisreboud Jul 4, 2024
6a188fd
✨ feat(layer_seq): LLM generate (#128)
jean-francoisreboud Jul 10, 2024
c3a8ade
πŸ“š docs: LLM doc & split tests (#129)
jean-francoisreboud Jul 12, 2024
723b021
πŸš€ test(examples): 3 LLMs examples (#130)
jean-francoisreboud Jul 15, 2024
54b4a30
✨ feat(layer_seq): LLM sliding window (#131)
jean-francoisreboud Jul 19, 2024
838e922
πŸš€ test(examples): integrate Gemma2-2B (#132)
jean-francoisreboud Sep 1, 2024
6f8720a
πŸ”§ chore: update changelog (#133)
jean-francoisreboud Sep 1, 2024
34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,40 @@ All notable changes to this project will be documented in this file.

## [unreleased]

## 0.4.0 (2024-09-01)

### Features

πŸš€ **examples:** integrate Gemma2-2B ([#132](https://github.com/owkin/GrAIdient/pull/132))\
✨ **layer_seq:** LLM sliding window ([#131](https://github.com/owkin/GrAIdient/pull/131))\
πŸš€ **examples:** 3 LLMs examples ([#130](https://github.com/owkin/GrAIdient/pull/130))\
✨ **layer_seq:** LLM generate ([#128](https://github.com/owkin/GrAIdient/pull/128))\
✨ **layer_seq:** MultiplySeq, SiLU & LLM test ([#127](https://github.com/owkin/GrAIdient/pull/127))\
✨ **layer_seq:** ValueCausalSeq ([#126](https://github.com/owkin/GrAIdient/pull/126))\
✨ **layer_seq:** QueryCausalSeq ([#125](https://github.com/owkin/GrAIdient/pull/125))\
✨ **layer_seq:** RoPESeq ([#124](https://github.com/owkin/GrAIdient/pull/124))\
✨ **layer_seq:** RMSNormSeq ([#123](https://github.com/owkin/GrAIdient/pull/123))\
✨ **layer_seq:** EmbeddingSeq ([#122](https://github.com/owkin/GrAIdient/pull/122))\
πŸͺœ **feat:** LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq ([#117](https://github.com/owkin/GrAIdient/pull/117))\
βš™οΈ **core:** GELU vs GELUApprox ([#113](https://github.com/owkin/GrAIdient/pull/113))\
πŸš€ **perf:** QuerySelf & ValueSelf ([#112](https://github.com/owkin/GrAIdient/pull/112))\
πŸš€ **perf:** benchmark ViT base model ([#111](https://github.com/owkin/GrAIdient/pull/111))\
βš™οΈ **core:** initForward,Backward model API ([#109](https://github.com/owkin/GrAIdient/pull/109))\
πŸͺœ **layer_1d:** Dropout1D ([#108](https://github.com/owkin/GrAIdient/pull/108))\
πŸͺœ **feat:** VQGrad, VQGradSeq ([#107](https://github.com/owkin/GrAIdient/pull/107))

### Bug Fixes

πŸ› **fix:** run on Apple Silicon ([110](https://github.com/owkin/GrAIdient/pull/110))

### Miscellaneous Tasks

πŸ“š **docs:** LLM doc & split tests ([#129](https://github.com/owkin/GrAIdient/pull/129))\
πŸš€ **perf:** use half in Metal kernels ([#121](https://github.com/owkin/GrAIdient/pull/121))\
πŸ”¨ **refactor:** handle float16 along float on GPU ([#120](https://github.com/owkin/GrAIdient/pull/120))\
πŸš€ **perf:** copy & generate weights faster ([#119](https://github.com/owkin/GrAIdient/pull/119))\
πŸš€ **perf:** Convolution2D ([#118](https://github.com/owkin/GrAIdient/pull/118))

## 0.3.1 (2023-08-09)

### Bug Fixes
5 changes: 3 additions & 2 deletions Docs/Contributing/CONTRIBUTING.md
@@ -248,13 +248,14 @@ containing the commits to merge into the `main` branch.
Do not delete the "Unreleased" section title: future PRs will insert
changelog items in this section.
- Commit and push the changes.
- Squash and merge the new branch into `release_N`.
- Squash and merge the new branch into `release_N` with title \
πŸ”§ chore: update changelog

1. Create a Pull Request for `release_N` targeting the `main` branch.

1. Review and Merge the Pull Request, change the commit
message \
πŸ”§ chore: release X.Y.Z
πŸš€ Release X.Y.Z

1. Create a GitHub release X.Y.Z from `main`:
- GitHub > Releases > Draft new Release
14 changes: 13 additions & 1 deletion Docs/Examples/AutoEncoder.md
@@ -64,7 +64,19 @@ conda env remove --name graiexamples

## Steps

1. Dump the training dataset.
Each training example uses a `CIFARAutoEncoderTrainer`, which is responsible
for initializing the training dataset before the actual training takes place.

1. Train a simple auto encoder model.
1. Train a UNet-like auto encoder model.
1. Train a StyleGAN-like auto encoder model.
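
The `CIFARAutoEncoderTrainer` mentioned above follows a simple pattern:
prepare the dataset first, then drive the training loop on whichever model it
wraps. A rough sketch of that pattern (the type and method names below are
hypothetical, not the actual `CIFARAutoEncoderTrainer` API):

```swift
// Hypothetical sketch of the trainer pattern: dataset preparation happens
// before any optimization step.
protocol AutoEncoderTrainerSketch
{
    func initDataset()          // dump / load CIFAR before training
    func train(nbEpochs: Int)   // the actual optimization loop
}

func runExample(trainer: AutoEncoderTrainerSketch)
{
    trainer.initDataset()       // the dataset is ready before training starts
    trainer.train(nbEpochs: 5)
}
```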

## Further tests

Further tests are available at
[AutoEncoderTests](../../Tests/GrAIExamples/AutoEncoderTests.swift).

The test `testTrain` compares the training of a `SimpleAutoEncoder`
in GrAIdient and in PyTorch to show that the same `loss` is computed
throughout the training.
1 change: 1 addition & 0 deletions Docs/Examples/EXAMPLES.md
@@ -12,3 +12,4 @@ The following examples are currently available:
- [VGG](VGG.md)
- [Vision Transformer](VisionTransformer.md)
- [Auto Encoder](AutoEncoder.md)
- [LLM](LLM.md)
64 changes: 64 additions & 0 deletions Docs/Examples/LLM.md
@@ -0,0 +1,64 @@
# πŸš€ LLM Example

This is the documentation for running
[LLMs](../../Tests/GrAIExamples/LLMExample.swift) on the GPU.

## Setup

This example has some `Python` dependencies. In order to run
the example, we first have to set up the environment:

```bash
conda create --name graiexamples python=3.9
conda activate graiexamples
cd Tests/GrAIExamples/Base
pip install -e .
```

Then:
- Download weights from
[MistralAI](https://docs.mistral.ai/getting-started/open_weight_models/)
(mistral-7B-Instruct-v0.3)
and / or
[Llama](https://llama.meta.com/llama-downloads/)
(llama-2-7b-chat or Meta-Llama-3-8B-Instruct)
and / or Gemma2 from [HuggingFace](https://huggingface.co/google/gemma-2-2b-it)
(Gemma-2-2b-it).
- Update `_modelPathMistral`, `_modelPathLlama2`, `_modelPathLlama3`,
`_modelPathGemma2` in the
[LLMExample](../../Tests/GrAIExamples/LLMExample.swift) file with the
previously downloaded weights.
- Optionally update `_prompt`.
- Rename `_testGenerateMistral`, `_testGenerateLlama2`, `_testGenerateLlama3`
and `_testGenerateGemma2`
into
`testGenerateMistral`, `testGenerateLlama2`, `testGenerateLlama3` and
`testGenerateGemma2` (see the sketch below).
- Run the tests.
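
The underscore prefix keeps these generation tests out of the default test run:
XCTest only discovers methods whose names start with `test`. A minimal sketch of
the rename (the signature and body below are illustrative, not the actual
content of `LLMExample.swift`):

```swift
// Before: ignored by XCTest because the name does not start with "test".
// func _testGenerateMistral() throws { ... }

// After: discovered and run by XCTest.
func testGenerateMistral() throws
{
    // existing Mistral generation code, unchanged
}
```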

It is finally possible to clean the environment 🌍

```bash
conda deactivate
conda env remove --name graiexamples
```

## Steps

1. Generate text from a prompt with Mistral 7B Instruct model.
1. Generate text from a prompt with Llama 2 7B Chat model.
1. Generate text from a prompt with Llama 3 8B Instruct model.
1. Generate text from a prompt with Gemma 2 2B Instruct model.

## Further tests

Further tests are available at
[LLMExampleTests](../../Tests/GrAIExamples/LLMExampleTests.swift).
In order to run them, rename
`_testPredict1` and `_testPredict32` into `testPredict1` and `testPredict32`.

The test `testPredict1` compares the first step of generation
of a toy LLM (just one transformer block) in GrAIdient and in PyTorch.

The test `testPredict32` runs the first step of generation
of a full LLM in GrAIdient and compares it against the expected result from PyTorch.
14 changes: 14 additions & 0 deletions Docs/Examples/VGG.md
@@ -91,3 +91,17 @@ conda env remove --name graiexamples
1. Train a model on the training dataset.
1. Evaluate the trained model on the testing dataset:
   observe better performance.

## Benchmarks

To benchmark the time performance of the VGG model, look at
[VGGBenchmark](../../Tests/GrAIExamples/VGGBenchmark.swift) and rename
`_test_TrainVGG` and `_test_EvalVGG` into `test_TrainVGG` and `test_EvalVGG`.

The test `test_TrainVGG` will measure the time spent training the VGG
model for 20 steps.

The test `test_EvalVGG` will measure the time spent running the VGG model
in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
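
Both measurements boil down to timing a fixed number of steps. A minimal
sketch of that kind of harness (illustrative only, not the actual
`VGGBenchmark` code):

```swift
import Foundation

// Illustrative timing harness: run `nbSteps` iterations of a step closure
// and report the elapsed wall-clock time.
func measure(nbSteps: Int, _ step: () -> ())
{
    let start = Date()
    for _ in 0..<nbSteps
    {
        step()   // one training or inference step
    }
    let elapsed = Date().timeIntervalSince(start)
    print("Elapsed \(elapsed)s for \(nbSteps) steps.")
}
```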
17 changes: 17 additions & 0 deletions Docs/Examples/VisionTransformer.md
@@ -86,3 +86,20 @@ conda env remove --name graiexamples

1. Dump the training dataset.
1. Train a simple Vision Transformer model.

## Benchmarks

To benchmark the time performance of the Vision Transformer model,
look at
[TransformerBenchmark](../../Tests/GrAIExamples/TransformerBenchmark.swift)
and rename
`_test_TrainTransformer` and `_test_EvalTransformer` into
`test_TrainTransformer` and `test_EvalTransformer`.

The test `test_TrainTransformer` will measure the time spent training the
VisionTransformer model for 20 steps.

The test `test_EvalTransformer` will measure the time spent running the
VisionTransformer model in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
2 changes: 1 addition & 1 deletion Package.swift
@@ -7,7 +7,7 @@ import PackageDescription
let package = Package(
    name: "GrAIdient",
    platforms: [
        .macOS(.v10_15)
        .macOS(.v13)
    ],
    products: [
        .library(
159 changes: 153 additions & 6 deletions Sources/GrAITestsUtils/Trainer.swift
@@ -69,7 +69,7 @@ extension TestError: CustomStringConvertible
///
/// - Parameter model: The model on which to select the initialization scheme.
///
func randomSelectWeightsInitializationScheme(model: Model)
public func randomSelectWeightsInitializationScheme(model: Model)
{
    let choice = Int.random(in: 0...4)
    switch choice {
@@ -365,6 +365,153 @@ open class FlowTrainer: Trainer
    }
}

/// Pipeline that compares gradients of weights computed with Float precision against those computed with Float16 precision.
open class FlowPrecisionTrainer: Trainer
{
    ///
    /// The two models:
    /// [model to execute with Float precision, same model to execute with Float16 precision].
    ///
    public var models: [Model] = []

    /// Get the model to execute with Float precision.
    public var modelFloat: Model
    {
        get {
            return models[0]
        }
    }
    /// Get the model to execute with Float16 precision.
    public var modelFloat16: Model
    {
        get {
            return models[1]
        }
    }

    ///
    /// Create the same model twice: one to execute with Float precision, the other with Float16 precision.
    ///
    /// - Parameter buildFct: A function that creates the different layers of the models.
    ///
    public func build(_ buildFct: (ModelContext)->())
    {
        var baseModels = [BaseModel]()

        let context = ModelContext(name: modelName + "Float", curID: 0)
        buildFct(context)
        baseModels.append(context.model)

        context.model = BaseModel(name: modelName + "Float16")
        buildFct(context)
        baseModels.append(context.model)

        var models = [Model]()
        for baseModel in baseModels
        {
            models.append(Model(model: baseModel, modelsPrev: []))
        }
        self.models = models
    }

    /// Initialize the kernel of the models.
    public func initialize()
    {
        for i in 0...1
        {
            if i == 0
            {
                GrAI.Precision.float = true
                randomSelectWeightsInitializationScheme(model: modelFloat)
            }

            if i > 0
            {
                models[i].weights = models[i-1].weights
            }

            if i == 1
            {
                GrAI.Precision.float16 = true
            }

            models[i].initialize(
                params: optimizerParams,
                phase: .Training,
                deviceID: DEVICE_ID
            )
        }
    }

    ///
    /// Run the test.
    ///
    /// The goal is to compare the gradients of weights computed with Float precision with
    /// the gradients of weights computed with Float16 precision.
    ///
    /// - Parameters:
    ///     - setData: A function to create/set data to the model.
    ///     - setLoss: A function to create/set ground truth to the model.
    ///     - validate: A function that checks whether the relative difference is small enough.
    ///
    public func run<DataT, LossT>(
        setData: (DataT?, Model)->(DataT, Int),
        setLoss: (LossT?, Model)->(LossT),
        validate: (Double) throws -> ()) throws
    {
        initialize()

        var epoch = 0
        let nbEpochsMax = 1
        while epoch < nbEpochsMax
        {
            var numLoop = 0
            while numLoop < optimizerParams.nbLoops
            {
                let resultsFloat: [Double]
                GrAI.Precision.float = true

                var (inputs, batchSize) = setData(nil, modelFloat)
                modelFloat.updateKernel(batchSize: batchSize)
                try! modelFloat.forward()

                var gt = setLoss(nil, modelFloat)
                try! modelFloat.backward()
                try! modelFloat.update()

                resultsFloat = getGradients(model: modelFloat)

                let resultsFloat16: [Double]
                GrAI.Precision.float16 = true

                (inputs, batchSize) = setData(inputs, modelFloat16)
                modelFloat16.updateKernel(batchSize: batchSize)
                try! modelFloat16.forward()

                gt = setLoss(gt, modelFloat16)
                try! modelFloat16.backward()
                try! modelFloat16.update()

                resultsFloat16 = getGradients(model: modelFloat16)

                if let gradDiff = checkFlow(resultsFloat, resultsFloat16)
                {
                    if gradDiff.isNaN
                    {
                        fatalError("NaN")
                    }
                    try validate(gradDiff)
                }

                modelFloat.incStep()
                modelFloat16.incStep()
                numLoop += 1
            }
            epoch += 1
        }
    }
}
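
// Usage sketch (illustrative only): `trainer` is assumed to be an already
// constructed `FlowPrecisionTrainer`, and the closures below are toy stand-ins
// for the project's real test utilities. A real `setData` / `setLoss` would
// feed the batch and the ground truth to the model's layers; here they only
// illustrate the expected closure shapes.
func sketchPrecisionCheck(trainer: FlowPrecisionTrainer) throws
{
    trainer.build { context in
        // Build the layers of the toy model inside `context` (omitted here).
    }
    try trainer.run(
        setData: { (inputs: [Float]?, model: Model) -> ([Float], Int) in
            // Reuse `inputs` when provided so both models see the same batch.
            let data = inputs ?? (0..<16).map { _ in Float.random(in: -1...1) }
            return (data, 1)
        },
        setLoss: { (gt: [Float]?, model: Model) -> [Float] in
            gt ?? [Float](repeating: 0.0, count: 16)
        },
        validate: { gradDiff in
            // Float and Float16 gradients should stay within a small tolerance.
            precondition(gradDiff < 0.005)
        }
    )
}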

/// Compares gradients of weights computed in the CPU execution context against the GPU one
/// after a call to the reset API.
open class FlowResetTrainer: FlowTrainer
@@ -831,18 +978,18 @@ open class TransformTrainer: FlowTrainer
        // 5. Compare results.

        let diffCPU =
            (lossCPUNew - lossCPURef) * (lossCPUNew - lossCPURef) /
            (lossCPUNew * lossCPUNew + lossCPURef * lossCPURef)
        let diffGPU =
            (lossGPUNew - lossGPURef) * (lossGPUNew - lossGPURef) /
            (lossGPUNew * lossGPUNew + lossGPURef * lossGPURef)

        var warning = ""
        let maxDiff = max(diffCPU, diffGPU)
        let maxIndex = diffCPU < diffGPU ? "GPU" : "CPU"
        if diffCPU > 0.0000001
        {
            warning = "Load Check Warning " + maxIndex + " : "
            warning = "Transform Check Warning " + maxIndex + " : "
        }
        let strDump = warning + String(maxDiff)
        print(strDump)