πŸš€ Release 0.4.0 #134

Merged: 24 commits, Sep 1, 2024

Commits
064392b
✨ feat: VQGrad, VQGradSeq (#107)
jean-francoisreboud Sep 18, 2023
3130f05
✨ feat: Dropout1D (#108)
jean-francoisreboud Oct 7, 2023
516833d
✨ feat(core): initForward,Backward model API (#109)
jean-francoisreboud Dec 2, 2023
63934a9
πŸ› fix: run on Apple Silicon (#110)
jean-francoisreboud Dec 8, 2023
c2988f1
πŸš€ perf: benchmark ViT base model (#111)
jean-francoisreboud Jan 2, 2024
4969db6
πŸš€ perf: QuerySelf & ValueSelf (#112)
jean-francoisreboud Jan 3, 2024
096b95d
✨ feat(core): GELU vs GELUApprox (#113)
jean-francoisreboud Jan 5, 2024
3d3191d
✨ feat: LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq (#117)
jean-francoisreboud Feb 17, 2024
192f994
πŸš€ perf: Convolution2D (#118)
jean-francoisreboud Feb 28, 2024
a9d176c
πŸš€ perf: copy & generate weights faster (#119)
jean-francoisreboud May 12, 2024
52ab4df
πŸ”¨ refactor: handle float16 along float on GPU (#120)
jean-francoisreboud May 12, 2024
ceff714
πŸš€ perf: use half in Metal kernels (#121)
jean-francoisreboud May 22, 2024
d97e520
✨ feat(layer_seq): EmbeddingSeq (#122)
jean-francoisreboud Jun 14, 2024
2d65e95
✨ feat(layer_seq): RMSNormSeq (#123)
jean-francoisreboud Jun 16, 2024
03e2617
✨ feat(layer_seq): RoPESeq (#124)
jean-francoisreboud Jun 19, 2024
6dd84dd
✨ feat(layer_seq): QueryCausalSeq (#125)
jean-francoisreboud Jun 28, 2024
8ab07d5
✨ feat(layer_seq): ValueCausalSeq (#126)
jean-francoisreboud Jul 1, 2024
0e34be3
✨ layer_seq: MultiplySeq, SiLU & LLM test (#127)
jean-francoisreboud Jul 4, 2024
6a188fd
✨ feat(layer_seq): LLM generate (#128)
jean-francoisreboud Jul 10, 2024
c3a8ade
πŸ“š docs: LLM doc & split tests (#129)
jean-francoisreboud Jul 12, 2024
723b021
πŸš€ test(examples): 3 LLMs examples (#130)
jean-francoisreboud Jul 15, 2024
54b4a30
✨ feat(layer_seq): LLM sliding window (#131)
jean-francoisreboud Jul 19, 2024
838e922
πŸš€ test(examples): integrate Gemma2-2B (#132)
jean-francoisreboud Sep 1, 2024
6f8720a
πŸ”§ chore: update changelog (#133)
jean-francoisreboud Sep 1, 2024
34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,40 @@ All notable changes to this project will be documented in this file.

## [unreleased]

## 0.4.0 (2024-09-01)

### Features

πŸš€ **examples:** integrate Gemma2-2B ([#132](https://github.com/owkin/GrAIdient/pull/132))\
✨ **layer_seq:** LLM sliding window ([#131](https://github.com/owkin/GrAIdient/pull/131))\
πŸš€ **examples:** 3 LLMs examples ([#130](https://github.com/owkin/GrAIdient/pull/130))\
✨ **layer_seq:** LLM generate ([#128](https://github.com/owkin/GrAIdient/pull/128))\
✨ **layer_seq:** MultiplySeq, SiLU & LLM test ([#127](https://github.com/owkin/GrAIdient/pull/127))\
✨ **layer_seq:** ValueCausalSeq ([#126](https://github.com/owkin/GrAIdient/pull/126))\
✨ **layer_seq:** QueryCausalSeq ([#125](https://github.com/owkin/GrAIdient/pull/125))\
✨ **layer_seq:** RoPESeq ([#124](https://github.com/owkin/GrAIdient/pull/124))\
✨ **layer_seq:** RMSNormSeq ([#123](https://github.com/owkin/GrAIdient/pull/123))\
✨ **layer_seq:** EmbeddingSeq ([#122](https://github.com/owkin/GrAIdient/pull/122))\
πŸͺœ **feat:** LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq ([#117](https://github.com/owkin/GrAIdient/pull/117))\
βš™οΈ **core:** GELU vs GELUApprox ([#113](https://github.com/owkin/GrAIdient/pull/113))\
πŸš€ **perf:** QuerySelf & ValueSelf ([#112](https://github.com/owkin/GrAIdient/pull/112))\
πŸš€ **perf:** benchmark ViT base model ([#111](https://github.com/owkin/GrAIdient/pull/111))\
βš™οΈ **core:** initForward,Backward model API ([#109](https://github.com/owkin/GrAIdient/pull/109))\
πŸͺœ **layer_1d:** Dropout1D ([#108](https://github.com/owkin/GrAIdient/pull/108))\
πŸͺœ **feat:** VQGrad, VQGradSeq ([#107](https://github.com/owkin/GrAIdient/pull/107))

### Bug Fixes

πŸ› **fix:** run on Apple Silicon ([110](https://github.com/owkin/GrAIdient/pull/110))

### Miscellaneous Tasks

πŸ“š **docs:** LLM doc & split tests ([#129](https://github.com/owkin/GrAIdient/pull/129))\
πŸš€ **perf:** use half in Metal kernels ([#121](https://github.com/owkin/GrAIdient/pull/121))\
πŸ”¨ **refactor:** handle float16 along float on GPU ([#120](https://github.com/owkin/GrAIdient/pull/120))\
πŸš€ **perf:** copy & generate weights faster ([#119](https://github.com/owkin/GrAIdient/pull/119))\
πŸš€ **perf:** Convolution2D ([#118](https://github.com/owkin/GrAIdient/pull/118))

## 0.3.1 (2023-08-09)

### Bug Fixes
5 changes: 3 additions & 2 deletions Docs/Contributing/CONTRIBUTING.md
@@ -248,13 +248,14 @@ containing the commits to merge into the `main` branch.
Do not delete the "Unreleased" section title: future PRs will insert
changelog items in this section.
- Commit and push the changes.
- Squash and merge the new branch into `release_N`.
- Squash and merge the new branch into `release_N` with title \
πŸ”§ chore: update changelog

1. Create a Pull Request for `release_N` targeting the `main` branch.

1. Review and Merge the Pull Request, change the commit
message \
πŸ”§ chore: release X.Y.Z
πŸš€ Release X.Y.Z

1. Create a GitHub release X.Y.Z from `main`:
- GitHub > Releases > Draft new Release
14 changes: 13 additions & 1 deletion Docs/Examples/AutoEncoder.md
@@ -64,7 +64,19 @@ conda env remove --name graiexamples

## Steps

1. Dump the training dataset.
Each training example uses a `CIFARAutoEncoderTrainer`, which is responsible
for initializing the training dataset before the actual training takes place.

1. Train a simple auto encoder model.
1. Train a UNet-like auto encoder model.
1. Train a StyleGAN-like auto encoder model.
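
The `CIFARAutoEncoderTrainer` mentioned above follows a simple pattern:
prepare the dataset first, then drive the training loop on whichever model it
wraps. A rough sketch of that pattern (the type and method names below are
hypothetical, not the actual `CIFARAutoEncoderTrainer` API):

```swift
// Hypothetical sketch of the trainer pattern: dataset preparation happens
// before any optimization step.
protocol AutoEncoderTrainerSketch
{
    func initDataset()          // dump / load CIFAR before training
    func train(nbEpochs: Int)   // the actual optimization loop
}

func runExample(trainer: AutoEncoderTrainerSketch)
{
    trainer.initDataset()       // the dataset is ready before training starts
    trainer.train(nbEpochs: 5)
}
```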

## Further tests

Further tests are available at
[AutoEncoderTests](../../Tests/GrAIExamples/AutoEncoderTests.swift).

The test `testTrain` compares the training of a `SimpleAutoEncoder`
in GrAIdient and in PyTorch to show that the same `loss` is computed
throughout the training.
1 change: 1 addition & 0 deletions Docs/Examples/EXAMPLES.md
@@ -12,3 +12,4 @@ The following examples are currently available:
- [VGG](VGG.md)
- [Vision Transformer](VisionTransformer.md)
- [Auto Encoder](AutoEncoder.md)
- [LLM](LLM.md)
64 changes: 64 additions & 0 deletions Docs/Examples/LLM.md
@@ -0,0 +1,64 @@
# πŸš€ LLM Example

This is the documentation for running
[LLMs](../../Tests/GrAIExamples/LLMExample.swift) on the GPU.

## Setup

This example has some `Python` dependencies. In order to run
the example, we first have to set up the environment:

```bash
conda create --name graiexamples python=3.9
conda activate graiexamples
cd Tests/GrAIExamples/Base
pip install -e .
```

Then:
- Download weights from
[MistralAI](https://docs.mistral.ai/getting-started/open_weight_models/)
(mistral-7B-Instruct-v0.3)
and / or
[Llama](https://llama.meta.com/llama-downloads/)
(llama-2-7b-chat or Meta-Llama-3-8B-Instruct)
and / or Gemma2 from [HuggingFace](https://huggingface.co/google/gemma-2-2b-it)
(Gemma-2-2b-it).
- Update `_modelPathMistral`, `_modelPathLlama2`, `_modelPathLlama3`,
`_modelPathGemma2` in the
[LLMExample](../../Tests/GrAIExamples/LLMExample.swift) file with the
previously downloaded weights.
- Optionally update `_prompt`.
- Rename `_testGenerateMistral`, `_testGenerateLlama2`, `_testGenerateLlama3`
and `_testGenerateGemma2`
into
`testGenerateMistral`, `testGenerateLlama2`, `testGenerateLlama3` and
`testGenerateGemma2` (see the sketch below).
- Run the tests.
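
The underscore prefix keeps these generation tests out of the default test run:
XCTest only discovers methods whose names start with `test`. A minimal sketch of
the rename (the signature and body below are illustrative, not the actual
content of `LLMExample.swift`):

```swift
// Before: ignored by XCTest because the name does not start with "test".
// func _testGenerateMistral() throws { ... }

// After: discovered and run by XCTest.
func testGenerateMistral() throws
{
    // existing Mistral generation code, unchanged
}
```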

It is finally possible to clean the environment 🌍

```bash
conda deactivate
conda env remove --name graiexamples
```

## Steps

1. Generate text from a prompt with Mistral 7B Instruct model.
1. Generate text from a prompt with Llama 2 7B Chat model.
1. Generate text from a prompt with Llama 3 8B Instruct model.
1. Generate text from a prompt with Gemma 2 2B Instruct model.

## Further tests

Further tests are available at
[LLMExampleTests](../../Tests/GrAIExamples/LLMExampleTests.swift).
In order to run them, rename
`_testPredict1` and `_testPredict32` into `testPredict1` and `testPredict32`.

The test `testPredict1` compares the first step of generation
of a toy LLM (just one transformer block) in GrAIdient and in PyTorch.

The test `testPredict32` runs the first step of generation
of a full LLM in GrAIdient and compares it against the expected result from PyTorch.
14 changes: 14 additions & 0 deletions Docs/Examples/VGG.md
@@ -91,3 +91,17 @@ conda env remove --name graiexamples
1. Train a model on the training dataset.
1. Evaluate the trained model on the testing dataset:
   observe better performance.

## Benchmarks

To benchmark the time performance of the VGG model, look at
[VGGBenchmark](../../Tests/GrAIExamples/VGGBenchmark.swift) and rename
`_test_TrainVGG` and `_test_EvalVGG` into `test_TrainVGG` and `test_EvalVGG`.

The test `test_TrainVGG` will measure the time spent training the VGG
model for 20 steps.

The test `test_EvalVGG` will measure the time spent running the VGG model
in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
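
Both measurements boil down to timing a fixed number of steps. A minimal
sketch of that kind of harness (illustrative only, not the actual
`VGGBenchmark` code):

```swift
import Foundation

// Illustrative timing harness: run `nbSteps` iterations of a step closure
// and report the elapsed wall-clock time.
func measure(nbSteps: Int, _ step: () -> ())
{
    let start = Date()
    for _ in 0..<nbSteps
    {
        step()   // one training or inference step
    }
    let elapsed = Date().timeIntervalSince(start)
    print("Elapsed \(elapsed)s for \(nbSteps) steps.")
}
```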
17 changes: 17 additions & 0 deletions Docs/Examples/VisionTransformer.md
@@ -86,3 +86,20 @@ conda env remove --name graiexamples

1. Dump the training dataset.
1. Train a simple Vision Transformer model.

## Benchmarks

To benchmark the time performance of the Vision Transformer model,
look at
[TransformerBenchmark](../../Tests/GrAIExamples/TransformerBenchmark.swift)
and rename
`_test_TrainTransformer` and `_test_EvalTransformer` into
`test_TrainTransformer` and `test_EvalTransformer`.

The test `test_TrainTransformer` will measure the time spent training the
VisionTransformer model for 20 steps.

The test `test_EvalTransformer` will measure the time spent running the
VisionTransformer model in inference for 20 steps.

Note that for both tests, the data is random and fixed once and for all.
2 changes: 1 addition & 1 deletion Package.swift
@@ -7,7 +7,7 @@ import PackageDescription
let package = Package(
    name: "GrAIdient",
    platforms: [
        .macOS(.v10_15)
        .macOS(.v13)
    ],
    products: [
        .library(
159 changes: 153 additions & 6 deletions Sources/GrAITestsUtils/Trainer.swift
@@ -69,7 +69,7 @@ extension TestError: CustomStringConvertible
///
/// - Parameter model: The model on which to select the initialization scheme.
///
func randomSelectWeightsInitializationScheme(model: Model)
public func randomSelectWeightsInitializationScheme(model: Model)
{
    let choice = Int.random(in: 0...4)
    switch choice {
@@ -365,6 +365,153 @@ open class FlowTrainer: Trainer
    }
}

/// Pipeline that compares gradients of weights computed with Float precision against those computed with Float16 precision.
open class FlowPrecisionTrainer: Trainer
{
    ///
    /// The two models:
    /// [model to execute with Float precision, same model to execute with Float16 precision].
    ///
    public var models: [Model] = []

    /// Get the model to execute with Float precision.
    public var modelFloat: Model
    {
        get {
            return models[0]
        }
    }
    /// Get the model to execute with Float16 precision.
    public var modelFloat16: Model
    {
        get {
            return models[1]
        }
    }

    ///
    /// Create the same model twice: one to execute with Float precision, the other with Float16 precision.
    ///
    /// - Parameter buildFct: A function that creates the different layers of the models.
    ///
    public func build(_ buildFct: (ModelContext)->())
    {
        var baseModels = [BaseModel]()

        let context = ModelContext(name: modelName + "Float", curID: 0)
        buildFct(context)
        baseModels.append(context.model)

        context.model = BaseModel(name: modelName + "Float16")
        buildFct(context)
        baseModels.append(context.model)

        var models = [Model]()
        for baseModel in baseModels
        {
            models.append(Model(model: baseModel, modelsPrev: []))
        }
        self.models = models
    }

    /// Initialize the kernel of the models.
    public func initialize()
    {
        for i in 0...1
        {
            if i == 0
            {
                GrAI.Precision.float = true
                randomSelectWeightsInitializationScheme(model: modelFloat)
            }

            if i > 0
            {
                models[i].weights = models[i-1].weights
            }

            if i == 1
            {
                GrAI.Precision.float16 = true
            }

            models[i].initialize(
                params: optimizerParams,
                phase: .Training,
                deviceID: DEVICE_ID
            )
        }
    }

    ///
    /// Run the test.
    ///
    /// The goal is to compare the gradients of weights computed with Float precision with
    /// the gradients of weights computed with Float16 precision.
    ///
    /// - Parameters:
    ///     - setData: A function to create/set data to the model.
    ///     - setLoss: A function to create/set ground truth to the model.
    ///     - validate: A function that checks whether the relative difference is small enough.
    ///
    public func run<DataT, LossT>(
        setData: (DataT?, Model)->(DataT, Int),
        setLoss: (LossT?, Model)->(LossT),
        validate: (Double) throws -> ()) throws
    {
        initialize()

        var epoch = 0
        let nbEpochsMax = 1
        while epoch < nbEpochsMax
        {
            var numLoop = 0
            while numLoop < optimizerParams.nbLoops
            {
                let resultsFloat: [Double]
                GrAI.Precision.float = true

                var (inputs, batchSize) = setData(nil, modelFloat)
                modelFloat.updateKernel(batchSize: batchSize)
                try! modelFloat.forward()

                var gt = setLoss(nil, modelFloat)
                try! modelFloat.backward()
                try! modelFloat.update()

                resultsFloat = getGradients(model: modelFloat)

                let resultsFloat16: [Double]
                GrAI.Precision.float16 = true

                (inputs, batchSize) = setData(inputs, modelFloat16)
                modelFloat16.updateKernel(batchSize: batchSize)
                try! modelFloat16.forward()

                gt = setLoss(gt, modelFloat16)
                try! modelFloat16.backward()
                try! modelFloat16.update()

                resultsFloat16 = getGradients(model: modelFloat16)

                if let gradDiff = checkFlow(resultsFloat, resultsFloat16)
                {
                    if gradDiff.isNaN
                    {
                        fatalError("NaN")
                    }
                    try validate(gradDiff)
                }

                modelFloat.incStep()
                modelFloat16.incStep()
                numLoop += 1
            }
            epoch += 1
        }
    }
}
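
// Usage sketch (illustrative only): `trainer` is assumed to be an already
// constructed `FlowPrecisionTrainer`, and the closures below are toy stand-ins
// for the project's real test utilities. A real `setData` / `setLoss` would
// feed the batch and the ground truth to the model's layers; here they only
// illustrate the expected closure shapes.
func sketchPrecisionCheck(trainer: FlowPrecisionTrainer) throws
{
    trainer.build { context in
        // Build the layers of the toy model inside `context` (omitted here).
    }
    try trainer.run(
        setData: { (inputs: [Float]?, model: Model) -> ([Float], Int) in
            // Reuse `inputs` when provided so both models see the same batch.
            let data = inputs ?? (0..<16).map { _ in Float.random(in: -1...1) }
            return (data, 1)
        },
        setLoss: { (gt: [Float]?, model: Model) -> [Float] in
            gt ?? [Float](repeating: 0.0, count: 16)
        },
        validate: { gradDiff in
            // Float and Float16 gradients should stay within a small tolerance.
            precondition(gradDiff < 0.005)
        }
    )
}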

/// Compares gradients of weights computed in the CPU execution context against the GPU one
/// after a call to the reset API.
open class FlowResetTrainer: FlowTrainer
@@ -831,18 +978,18 @@ open class TransformTrainer: FlowTrainer
        // 5. Compare results.

        let diffCPU =
            (lossCPUNew - lossCPURef) * (lossCPUNew - lossCPURef) /
            (lossCPUNew * lossCPUNew + lossCPURef * lossCPURef)
        let diffGPU =
            (lossGPUNew - lossGPURef) * (lossGPUNew - lossGPURef) /
            (lossGPUNew * lossGPUNew + lossGPURef * lossGPURef)

        var warning = ""
        let maxDiff = max(diffCPU, diffGPU)
        let maxIndex = diffCPU < diffGPU ? "GPU" : "CPU"
        if diffCPU > 0.0000001
        {
            warning = "Load Check Warning " + maxIndex + " : "
            warning = "Transform Check Warning " + maxIndex + " : "
        }
        let strDump = warning + String(maxDiff)
        print(strDump)