diff --git a/CHANGELOG.md b/CHANGELOG.md index 4564bb16..177946e9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,40 @@ All notable changes to this project will be documented in this file. ## [unreleased] +## 0.4.0 (2024-09-01) + +### Features + +🚀 **examples:** integrate Gemma2-2B ([#132](https://github.com/owkin/GrAIdient/pull/132))\ +✨ **layer_seq:** LLM sliding window ([#131](https://github.com/owkin/GrAIdient/pull/131))\ +🚀 **examples:** 3 LLMs examples ([#130](https://github.com/owkin/GrAIdient/pull/130))\ +✨ **layer_seq:** LLM generate ([128](https://github.com/owkin/GrAIdient/pull/128))\ +✨ **layer_seq:** MultiplySeq, SiLU & LLM test ([127](https://github.com/owkin/GrAIdient/pull/127))\ +✨ **layer_seq:** ValueCausalSeq ([126](https://github.com/owkin/GrAIdient/pull/126))\ +✨ **layer_seq:** QueryCausalSeq ([125](https://github.com/owkin/GrAIdient/pull/125))\ +✨ **layer_seq:** RoPESeq ([124](https://github.com/owkin/GrAIdient/pull/124))\ +✨ **layer_seq:** RMSNormSeq ([123](https://github.com/owkin/GrAIdient/pull/123))\ +✨ **layer_seq:** EmbeddingSeq ([122](https://github.com/owkin/GrAIdient/pull/122))\ +🪜 **feat:** LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq ([#117](https://github.com/owkin/GrAIdient/pull/117))\ +⚙️ **core:** GELU vs GELUApprox ([113](https://github.com/owkin/GrAIdient/pull/113))\ +🚀 **perf:** QuerySelf & ValueSelf ([112](https://github.com/owkin/GrAIdient/pull/112))\ +🚀 **perf:** benchmark ViT base model ([111](https://github.com/owkin/GrAIdient/pull/111))\ +⚙️ **core:** initForward,Backward model API ([109](https://github.com/owkin/GrAIdient/pull/109))\ +🪜 **layer_1d:** Dropout1D ([#108](https://github.com/owkin/GrAIdient/pull/108))\ +🪜 **feat:** VQGrad, VQGradSeq ([#107](https://github.com/owkin/GrAIdient/pull/107)) + +### Bug Fixes + +🐛 **fix:** run on Apple Silicon ([110](https://github.com/owkin/GrAIdient/pull/110)) + +### Miscellaneous Tasks + +📚 **docs:** LLM doc & split tests ([129](https://github.com/owkin/GrAIdient/pull/129))\ +🚀 **perf:** use half in Metal kernels ([121](https://github.com/owkin/GrAIdient/pull/121))\ +🔨 **refactor:** handle float16 along float on GPU ([#120](https://github.com/owkin/GrAIdient/pull/120))\ +🚀 **perf:** copy & generate weights faster ([119](https://github.com/owkin/GrAIdient/pull/119))\ +🚀 **perf:** Convolution2D ([118](https://github.com/owkin/GrAIdient/pull/118)) + ## 0.3.1 (2023-08-09) ### Bug Fixes diff --git a/Docs/Contributing/CONTRIBUTING.md b/Docs/Contributing/CONTRIBUTING.md index 433dd312..17a1abb5 100644 --- a/Docs/Contributing/CONTRIBUTING.md +++ b/Docs/Contributing/CONTRIBUTING.md @@ -248,13 +248,14 @@ containing the commits to merge into the `main` branch. Do not delete the "Unreleased" section title: future PRs will insert changelog items in this section. - Commit and push the changes. - - Squash and merge the new branch into `release_N`. + - Squash and merge the new branch into `release_N` with title \ + 🔧 chore: update changelog 1. Create a Pull Request for `release_N` targeting the `main` branch. 1. Review and Merge the Pull Request, change the commit message \ - 🔧 chore: release X.Y.Z + 🚀 Release X.Y.Z 1. Create a GitHub release X.Y.Z from `main`: - GitHub > Releases > Draft new Release diff --git a/Docs/Examples/AutoEncoder.md b/Docs/Examples/AutoEncoder.md index eb9b1451..aef3a7c3 100644 --- a/Docs/Examples/AutoEncoder.md +++ b/Docs/Examples/AutoEncoder.md @@ -64,7 +64,19 @@ conda env remove --name graiexamples ## Steps -1. Dump the training dataset. 
+Each training example uses a `CIFARAutoEncoderTrainer`. +The latter is responsible for initializing the training dataset +before the actual training takes place. + 1. Train a simple auto encoder model. 1. Train a UNet like auto encoder model. 1. Train a StyleGAN like auto encoder model. + +## Further tests + +Further tests are available at +[AutoEncoderTests](../../Tests/GrAIExamples/AutoEncoderTests.swift). + +The test `testTrain` compares the training of a `SimpleAutoEncoder` +in GrAIdient and in PyTorch to show that the same `loss` is computed +throughout the training. diff --git a/Docs/Examples/EXAMPLES.md b/Docs/Examples/EXAMPLES.md index 21f388b8..8776ff2b 100644 --- a/Docs/Examples/EXAMPLES.md +++ b/Docs/Examples/EXAMPLES.md @@ -12,3 +12,4 @@ The following examples are currently available: - [VGG](VGG.md) - [Vision Transformer](VisionTransformer.md) - [Auto Encoder](AutoEncoder.md) +- [LLM](LLM.md) diff --git a/Docs/Examples/LLM.md b/Docs/Examples/LLM.md new file mode 100644 index 00000000..0af3e0ee --- /dev/null +++ b/Docs/Examples/LLM.md @@ -0,0 +1,64 @@ +# 🚀 LLM Example + +This is the documentation for running +[LLMs](../../Tests/GrAIExamples/LLMExample.swift) on the GPU. + +## Setup + +This example has some `Python` dependencies. In order to run +the example, we first have to set up the environment: + +```bash +conda create --name graiexamples python=3.9 +conda activate graiexamples +cd Tests/GrAIExamples/Base +pip install -e . +``` + +Then: +- Download weights from +[MistralAI](https://docs.mistral.ai/getting-started/open_weight_models/) +(mistral-7B-Instruct-v0.3) +and / or +[Llama](https://llama.meta.com/llama-downloads/) +(llama-2-7b-chat or Meta-Llama-3-8B-Instruct) +and / or Gemma2 from [HuggingFace](https://huggingface.co/google/gemma-2-2b-it) +(Gemma-2-2b-it). +- Update `_modelPathMistral`, `_modelPathLlama2`, `_modelPathLlama3`, +`_modelPathGemma2` in the +[LLMExample](../../Tests/GrAIExamples/LLMExample.swift) file with the +previously downloaded weights. +- Optionally update `_prompt`. +- Rename `_testGenerateMistral`, `_testGenerateLlama2`, `_testGenerateLlama3` +and `_testGenerateGemma2` +into +`testGenerateMistral`, `testGenerateLlama2`, `testGenerateLlama3` and +`testGenerateGemma2`. +- Run the tests. + +It is finally possible to clean the environment 🌍 + +```bash +conda deactivate +conda env remove --name graiexamples +``` + +## Steps + +1. Generate text from a prompt with Mistral 7B Instruct model. +1. Generate text from a prompt with Llama 2 7B Chat model. +1. Generate text from a prompt with Llama 3 8B Instruct model. +1. Generate text from a prompt with Gemma 2 2B Instruct model. + +## Further tests + +Further tests are available at +[LLMExampleTests](../../Tests/GrAIExamples/LLMExampleTests.swift). +In order to run them, rename +`_testPredict1` and `_testPredict32` into `testPredict1` and `testPredict32`. + +The test `testPredict1` compares the first step of generation +of a toy LLM (just one transformer block) in GrAIdient and in PyTorch. + +The test `testPredict32` runs the first step of generation +of a full LLM in GrAIdient and compares the result with the expected one from PyTorch. diff --git a/Docs/Examples/VGG.md b/Docs/Examples/VGG.md index 40f3db74..9f34de73 100644 --- a/Docs/Examples/VGG.md +++ b/Docs/Examples/VGG.md @@ -91,3 +91,17 @@ conda env remove --name graiexamples 1. Train a model on the training dataset. 1. Evaluate the trained model on the testing dataset: watch a better performance.
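The example documents above and below expose their entry points as XCTest methods that must be renamed (dropping the leading underscore) before they will run. As a minimal sketch of how one renamed suite could then be launched, assuming the tests are driven by SwiftPM from the command line rather than Xcode, that the test target is named `GrAIExamples` (matching the `Tests/GrAIExamples` paths used in these docs), and that the `graiexamples` conda environment is already active:

```bash
# Hypothetical invocation: run only the renamed VGG benchmark tests.
# `swift test --filter` matches a regular expression against
# <test-target>.<test-case>/<test-method>.
swift test -c release --filter "GrAIExamples.VGGBenchmark"
```

The same pattern would apply to `TransformerBenchmark`, `LLMExample`, and the other test classes referenced in these documents.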
+ +## Benchmarks + +To benchmark the time performance of the VGG model, look at +[VGGBenchmark](../../Tests/GrAIExamples/VGGBenchmark.swift) and rename +`_test_TrainVGG` and `_test_EvalVGG` into `test_TrainVGG` and `test_EvalVGG`. + +The test `test_TrainVGG` will measure the time spent for training the VGG +model for 20 steps. + +The test `test_EvalVGG` will measure the time spent for running the VGG model +in inference for 20 steps. + +Note that for both tests, the data is random and fixed once and for all. diff --git a/Docs/Examples/VisionTransformer.md b/Docs/Examples/VisionTransformer.md index 6dfdf405..b347e7aa 100644 --- a/Docs/Examples/VisionTransformer.md +++ b/Docs/Examples/VisionTransformer.md @@ -86,3 +86,20 @@ conda env remove --name graiexamples 1. Dump the training dataset. 1. Train a simple Vision Transformer model. + +## Benchmarks + +To benchmark the time performance of the Vision Transformer model, +look at +[TransformerBenchmark](../../Tests/GrAIExamples/TransformerBenchmark.swift) +and rename +`_test_TrainTransformer` and `_test_EvalTransformer` into +`test_TrainTransformer` and `test_EvalTransformer`. + +The test `test_TrainTransformer` will measure the time spent for training the +VisionTransformer model for 20 steps. + +The test `test_EvalTransformer` will measure the time spent for running the +VisionTransformer model in inference for 20 steps. + +Note that for both tests, the data is random and fixed once and for all. diff --git a/Package.swift b/Package.swift index 8cc64efb..a386a0a9 100644 --- a/Package.swift +++ b/Package.swift @@ -7,7 +7,7 @@ import PackageDescription let package = Package( name: "GrAIdient", platforms: [ - .macOS(.v10_15) + .macOS(.v13) ], products: [ .library( diff --git a/Sources/GrAITestsUtils/Trainer.swift b/Sources/GrAITestsUtils/Trainer.swift index 74a85820..13a076c7 100644 --- a/Sources/GrAITestsUtils/Trainer.swift +++ b/Sources/GrAITestsUtils/Trainer.swift @@ -69,7 +69,7 @@ extension TestError: CustomStringConvertible /// /// - Parameter model: The model on which to select the initialization scheme. /// -func randomSelectWeightsInitializationScheme(model: Model) +public func randomSelectWeightsInitializationScheme(model: Model) { let choice = Int.random(in: 0...4) switch choice { @@ -365,6 +365,153 @@ open class FlowTrainer: Trainer } } +/// Pipeline that compares gradients of weights computed with Float precision against those computed with Float16 precision. +open class FlowPrecisionTrainer: Trainer +{ + /// + /// The two models: + /// [model to execute with Float precision, same model to execute with Float16 precision]. + /// + public var models: [Model] = [] + + /// Get the model to execute with Float precision. + public var modelFloat: Model + { + get { + return models[0] + } + } + /// Get the model to execute with Float16 precision. + public var modelFloat16: Model + { + get { + return models[1] + } + } + + /// + /// Create the same model in the two precision contexts: Float and Float16. + /// + /// - Parameter buildFct: A function that creates the different layers of the models.
+ /// + public func build(_ buildFct: (ModelContext)->()) + { + var baseModels = [BaseModel]() + + let context = ModelContext(name: modelName + "Float", curID: 0) + buildFct(context) + baseModels.append(context.model) + + context.model = BaseModel(name: modelName + "Float16") + buildFct(context) + baseModels.append(context.model) + + var models = [Model]() + for baseModel in baseModels + { + models.append(Model(model: baseModel, modelsPrev: [])) + } + self.models = models + } + + /// Initialize the kernel of the models. + public func initialize() + { + for i in 0...1 + { + if i == 0 + { + GrAI.Precision.float = true + randomSelectWeightsInitializationScheme(model: modelFloat) + } + + if i > 0 + { + models[i].weights = models[i-1].weights + } + + if i == 1 + { + GrAI.Precision.float16 = true + } + + models[i].initialize( + params: optimizerParams, + phase: .Training, + deviceID: DEVICE_ID + ) + } + } + + /// + /// Run the test. + /// + /// The goal is to compare the gradients of weights computed with Float precision with + /// the gradients of weights computed with Float16 precision. + /// + /// - Parameters: + /// - setData: A function to create/set data to the model. + /// - setLoss: A function to create/set ground truth to the model. + /// - validate: A function that checks whether the relative difference is small enough. + /// + public func run( + setData: (DataT?, Model)->(DataT, Int), + setLoss: (LossT?, Model)->(LossT), + validate: (Double) throws -> ()) throws + { + initialize() + + var epoch = 0 + let nbEpochsMax = 1 + while epoch < nbEpochsMax + { + var numLoop = 0 + while numLoop < optimizerParams.nbLoops + { + let resultsFloat: [Double] + GrAI.Precision.float = true + + var (inputs, batchSize) = setData(nil, modelFloat) + modelFloat.updateKernel(batchSize: batchSize) + try! modelFloat.forward() + + var gt = setLoss(nil, modelFloat) + try! modelFloat.backward() + try! modelFloat.update() + + resultsFloat = getGradients(model: modelFloat) + + let resultsFloat16: [Double] + GrAI.Precision.float16 = true + + (inputs, batchSize) = setData(inputs, modelFloat16) + modelFloat16.updateKernel(batchSize: batchSize) + try! modelFloat16.forward() + + gt = setLoss(gt, modelFloat16) + try! modelFloat16.backward() + try! modelFloat16.update() + + resultsFloat16 = getGradients(model: modelFloat16) + + if let gradDiff = checkFlow(resultsFloat, resultsFloat16) + { + if gradDiff.isNaN + { + fatalError("NaN") + } + try validate(gradDiff) + } + + modelFloat.incStep() + modelFloat16.incStep() + numLoop += 1 + } + epoch += 1 + } + } +} + /// Compares gradients of weights computed in the CPU execution context againt the GPU one /// after a call to the reset API. open class FlowResetTrainer: FlowTrainer @@ -831,18 +978,18 @@ open class TransformTrainer: FlowTrainer // 5. Compare results. let diffCPU = - (lossCPUNew - lossCPURef) * (lossCPUNew - lossCPURef) / - (lossCPUNew * lossCPUNew + lossCPURef * lossCPURef) + (lossCPUNew - lossCPURef) * (lossCPUNew - lossCPURef) / + (lossCPUNew * lossCPUNew + lossCPURef * lossCPURef) let diffGPU = - (lossGPUNew - lossGPURef) * (lossGPUNew - lossGPURef) / - (lossGPUNew * lossGPUNew + lossGPURef * lossGPURef) + (lossGPUNew - lossGPURef) * (lossGPUNew - lossGPURef) / + (lossGPUNew * lossGPUNew + lossGPURef * lossGPURef) var warning = "" let maxDiff = max(diffCPU, diffGPU) let maxIndex = diffCPU < diffGPU ? 
"GPU" : "CPU" if diffCPU > 0.0000001 { - warning = "Load Check Warning " + maxIndex + " : " + warning = "Transform Check Warning " + maxIndex + " : " } let strDump = warning + String(maxDiff) print(strDump) diff --git a/Sources/GrAIdient/Core/Function/Activation.swift b/Sources/GrAIdient/Core/Function/Activation.swift index 6171a184..50e7209e 100644 --- a/Sources/GrAIdient/Core/Function/Activation.swift +++ b/Sources/GrAIdient/Core/Function/Activation.swift @@ -14,6 +14,8 @@ let ACTIVATION_REGISTRY: [String: Codable.Type] = buildRegistry( LeakyReLU.self, SoftReLU.self, Sigmoid.self, + SiLU.self, + GELUApprox.self, GELU.self ]) @@ -305,21 +307,46 @@ open class ActivationFunction: Codable /// - tmp: Buffer containing forward values before activation. /// - outs: Buffer containing forward values after activation. /// - deviceID: GPU device where to execute the operation. + /// - phase: Running phase: Training or Inference. /// private func _forwardGPU( - tmp: MetalBuffer, - outs: MetalBuffer, - deviceID: Int) + tmp: inout FloatBuffer?, + outs: FloatBuffer, + deviceID: Int, + phase: Phase?) { let nbElems = outs.nbElems + let backward = phase != nil && + (phase == .Training || phase == .InferenceBackward) + + if backward && tmp == nil + { + tmp = FloatBuffer( + nbElems: nbElems, deviceID: deviceID + ) + } + let pNbElems: [UInt32] = [UInt32(nbElems)] + var kernel = forwardKernel + if !backward + { + kernel += "Inference" + } let command = MetalKernel.get.createCommand( - forwardKernel, deviceID: deviceID + kernel, deviceID: deviceID ) + command.setBytes(pNbElems, atIndex: 0) - command.setBuffer(tmp.metal, atIndex: 1) - command.setBuffer(outs.metal, atIndex: 2) + if backward + { + command.setBuffer(tmp!.metal, atIndex: 1) + command.setBuffer(outs.metal, atIndex: 2) + } + else + { + command.setBuffer(outs.metal, atIndex: 1) + } command.dispatchThreads(nbElems) command.enqueue() @@ -332,16 +359,11 @@ open class ActivationFunction: Codable /// open func forwardGPU(_ layer: Activation1D) { - let nbElems = layer.outs.nbElems - if layer._tmp == nil - { - layer._tmp = MetalPrivateBuffer( - nbElems, deviceID: layer.deviceID) - } _forwardGPU( - tmp: layer._tmp, + tmp: &layer.tmp, outs: layer.outs, - deviceID: layer.deviceID + deviceID: layer.deviceID, + phase: layer.phase ) } @@ -352,16 +374,11 @@ open class ActivationFunction: Codable /// open func forwardGPU(_ layer: Activation2D) { - let nbElems = layer.outs.nbElems - if layer._tmp == nil - { - layer._tmp = MetalPrivateBuffer( - nbElems, deviceID: layer.deviceID) - } _forwardGPU( - tmp: layer._tmp, + tmp: &layer.tmp, outs: layer.outs, - deviceID: layer.deviceID + deviceID: layer.deviceID, + phase: layer.phase ) } @@ -372,16 +389,11 @@ open class ActivationFunction: Codable /// open func forwardGPU(_ layer: ActivationSeq) { - let nbElems = layer.outs.nbElems - if layer._tmp == nil - { - layer._tmp = MetalPrivateBuffer( - nbElems, deviceID: layer.deviceID) - } _forwardGPU( - tmp: layer._tmp, + tmp: &layer.tmp, outs: layer.outs, - deviceID: layer.deviceID + deviceID: layer.deviceID, + phase: layer.phase ) } @@ -394,8 +406,8 @@ open class ActivationFunction: Codable /// - deviceID: GPU device where to execute the operation. 
/// private func _backwardGPU( - tmp: MetalBuffer, - delta: MetalBuffer, + tmp: FloatBuffer, + delta: FloatBuffer, deviceID: Int) { let nbElems = delta.nbElems @@ -420,7 +432,7 @@ open class ActivationFunction: Codable open func backwardGPU(_ layer: Activation1D) { _backwardGPU( - tmp: layer._tmp, + tmp: layer.tmp, delta: layer.delta, deviceID: layer.deviceID ) @@ -434,7 +446,7 @@ open class ActivationFunction: Codable open func backwardGPU(_ layer: Activation2D) { _backwardGPU( - tmp: layer._tmp, + tmp: layer.tmp, delta: layer.delta, deviceID: layer.deviceID ) @@ -448,7 +460,7 @@ open class ActivationFunction: Codable open func backwardGPU(_ layer: ActivationSeq) { _backwardGPU( - tmp: layer._tmp, + tmp: layer.tmp, delta: layer.delta, deviceID: layer.deviceID ) @@ -767,23 +779,115 @@ public class Sigmoid: ActivationFunction } } -/// GELU activation function. -public class GELU: ActivationFunction +/// SiLU activation function. +public class SiLU: ActivationFunction { - public static let str = "GELU" + public static let str = "SiLU" /// Forward GPU kernel. public override var forwardKernel: String { get { - return "forwardGELU" + return "forwardSiLU" } } /// Backward GPU kernel. public override var backwardKernel: String { get { - return "backwardGELU" + return "backwardSiLU" + } + } + + /// Create a Sigmoid activation function. + init() + { + super.init(SiLU.str) + } + + /// + /// Decode from the disk. + /// + /// Throw an error if reading from the decoder fails, or + /// if the data read is corrupted or otherwise invalid. + /// + /// - Parameter decoder: The decoder to read data from. + /// + required public init(from decoder: Decoder) throws + { + try super.init(from: decoder) + } + + /// + /// Sigmoid function. + /// + /// - Parameter x: The input. + /// - Returns: The output. + /// + private func _sigmoid(_ x: Double) -> Double + { + if x >= 0 + { + return 1 / (1 + exp(-x)) + } + else + { + return exp(x) / (1 + exp(x)) + } + } + + /// + /// Sigmoid derivative function. + /// + /// - Parameter x: The input. + /// - Returns: The output. + /// + private func _sigmoidDer(_ x: Double) -> Double + { + let fx = _sigmoid(x) + return fx * (1 - fx) + } + + /// + /// Forward CPU. + /// + /// - Parameter x: The input. + /// - Returns: The output. + /// + public override func apply(_ x: Double) -> Double + { + return x * _sigmoid(x) + } + + /// + /// Backward CPU. + /// + /// - Parameter x: The input. + /// - Returns: The output. + /// + public override func derivate(_ x: Double) -> Double + { + return _sigmoid(x) + x * _sigmoidDer(x) + } +} + +/// GELU approximative activation function. +public class GELUApprox: ActivationFunction +{ + public static let str = "GELUApprox" + + /// Forward GPU kernel. + public override var forwardKernel: String + { + get { + return "forwardGELUApprox" + } + } + /// Backward GPU kernel. + public override var backwardKernel: String + { + get { + return "backwardGELUApprox" } } @@ -865,6 +969,83 @@ public class GELU: ActivationFunction } } +/// GELU activation function. +public class GELU: ActivationFunction +{ + public static let str = "GELU" + + /// Forward GPU kernel. + public override var forwardKernel: String + { + get { + return "forwardGELU" + } + } + /// Backward GPU kernel. + public override var backwardKernel: String + { + get { + return "backwardGELU" + } + } + + /// + /// Coefficient to apply during the weights initialization. + /// + /// - Returns: The coefficient. 
+ /// + open override var coeffInitWeights: Float + { + get { + return Float(sqrt(2.0)) + } + } + + /// Create a GELU activation function. + init() + { + super.init(GELU.str) + } + + /// + /// Decode from the disk. + /// + /// Throw an error if reading from the decoder fails, or + /// if the data read is corrupted or otherwise invalid. + /// + /// - Parameter decoder: The decoder to read data from. + /// + required public init(from decoder: Decoder) throws + { + try super.init(from: decoder) + } + + /// + /// Forward CPU. + /// + /// - Parameter x: The input. + /// - Returns: The output. + /// + public override func apply(_ x: Double) -> Double + { + return 0.5 * x * (1 + erf(x / sqrt(2.0))) + } + + /// + /// Backward CPU. + /// + /// - Parameter x: The input. + /// - Returns: The output. + /// + public override func derivate(_ x: Double) -> Double + { + let tmp1 = 0.5 * (1.0 + erf(x / sqrt(2.0))) + let tmp2 = x / sqrt(2.0 * Double.pi) * exp(-x * x / 2.0) + let derivative = tmp1 + tmp2 + return derivative + } +} + /// Factory API to build an activation function. public protocol ActivationKernel { @@ -886,6 +1067,8 @@ class ActivationKernelImpl: ActivationKernel LeakyReLU.str: LeakyReLUKernel(), SoftReLU.str: SoftReLUKernel(), Sigmoid.str: SigmoidKernel(), + SiLU.str: SiLUKernel(), + GELUApprox.str: GELUApproxKernel(), GELU.str: GELUKernel() ] @@ -955,6 +1138,26 @@ private class SigmoidKernel: ActivationKernelImpl } /// Factory to build a Sigmoid function. +private class SiLUKernel: ActivationKernelImpl +{ + /// Build a Sigmoid function. + override func build() -> ActivationFunction + { + return SiLU() + } +} + +/// Factory to build a GELU approximative function. +private class GELUApproxKernel: ActivationKernelImpl +{ + /// Build a Sigmoid function. + override func build() -> ActivationFunction + { + return GELUApprox() + } +} + +/// Factory to build a GELU function. private class GELUKernel: ActivationKernelImpl { /// Build a Sigmoid function. diff --git a/Sources/GrAIdient/Core/Function/Normalization.swift b/Sources/GrAIdient/Core/Function/Normalization.swift index 8a5e40b8..c2a5e00c 100644 --- a/Sources/GrAIdient/Core/Function/Normalization.swift +++ b/Sources/GrAIdient/Core/Function/Normalization.swift @@ -54,6 +54,29 @@ class Normalization let outsNew = vDSP.add(β, vDSP.multiply(Ɣ, xHat)) return outsNew } + + /// + /// Forward Gradient Checking RMSNorm CPU. + /// + /// - Parameters: + /// - outs: The data to normalize. + /// - Ɣ: The weights to scale the normalization result. + /// - addUnitOffset: Whether to add unit offset or not. + /// - Returns: The data normalized. + /// + static func forwardΣGC(outs: [Double], + Ɣ: [Double], + addUnitOffset: Bool) -> [Double] + { + let σ2 = vDSP.meanSquare(outs) + let xHat = vDSP.divide(outs, sqrt(σ2 + _Ɛ)) + var outsNew = vDSP.multiply(Ɣ, xHat) + if addUnitOffset + { + outsNew = vDSP.add(xHat, outsNew) + } + return outsNew + } /// /// Forward Training CPU. @@ -118,6 +141,36 @@ class Normalization μ: μ, σ2: σ2) } + + /// + /// Forward RMSNorm CPU. + /// + /// - Parameters: + /// - outs: The data to normalize. + /// - Ɣ: The weights to scale the normalization result. + /// - addUnitOffset: Whether to add unit offset or not. + /// - Returns: (The data normalized, + /// The data normalized without taking into account the bias and the weight, + /// The deviation of the data). 
+ /// + static func forwardΣ(outs: [Double], + Ɣ: [Double], + addUnitOffset: Bool) -> (outsNew: [Double], + xHat: [Double], + σ2: Double) + { + let σ2 = vDSP.meanSquare(outs) + let xHat = vDSP.divide(outs, sqrt(σ2 + _Ɛ)) + var outsNew = vDSP.multiply(Ɣ, xHat) + if addUnitOffset + { + outsNew = vDSP.add(xHat, outsNew) + } + + return (outsNew: outsNew, + xHat: xHat, + σ2: σ2) + } /// /// Forward Inference CPU. @@ -191,9 +244,7 @@ class Normalization /// - xHat: The data normalized without taking into account the bias and the weight. /// - σ2: The deviation of the data. /// - Ɣ: The weights that scaled the normalization result. - /// - Returns: (The gradient taking into account the normalization, - /// The gradient of β, - /// The gradient of Ɣ). + /// - Returns: The gradient taking into account the normalization. /// static func backward(delta: [Double], xHat: [Double], @@ -215,6 +266,47 @@ class Normalization return deltaNew } + + /// + /// Backward RMSNorm CPU. + /// + /// - Parameters: + /// - delta: The gradients to back propagate. + /// - xHat: The data normalized without taking into account the bias and the weight. + /// - σ2: The deviation of the data. + /// - Ɣ: The weights that scaled the normalization result. + /// - addUnitOffset: Whether to add unit offset or not. + /// - Returns: The gradient taking into account the normalization. + /// + static func backwardΣ(delta: [Double], + xHat: [Double], + σ2: Double, + Ɣ: [Double], + addUnitOffset: Bool) -> [Double] + { + let nbElems = delta.count + let factor = 1.0 / (Double(nbElems) * sqrt(σ2 + _Ɛ)) + + let Ɣdelta: [Double] + if addUnitOffset + { + Ɣdelta = vDSP.multiply(vDSP.add(1, Ɣ), delta) + } + else + { + Ɣdelta = vDSP.multiply(Ɣ, delta) + } + + let sum2 = vDSP.sum(vDSP.multiply(Ɣdelta, xHat)) + + let tmp1 = vDSP.add( + multiplication: (Ɣdelta, Double(nbElems)), + multiplication: (xHat, -sum2)) + let deltaNew = vDSP.add( + multiplication: (tmp1, factor), 0) + + return deltaNew + } /// /// Backward Inference CPU. diff --git a/Sources/GrAIdient/Core/Layer/Layer.swift b/Sources/GrAIdient/Core/Layer/Layer.swift index 34dd42f6..76e33929 100644 --- a/Sources/GrAIdient/Core/Layer/Layer.swift +++ b/Sources/GrAIdient/Core/Layer/Layer.swift @@ -58,30 +58,6 @@ public protocol LayerWithActivation: Layer func removeActivation(params: GrAI.Model.Params) -> Layer } -/// A layer that needs image size information. -public protocol LayerResize: Layer -{ - /// - /// Resize this layer. - /// - /// - Parameters: - /// - imageWidth: New size width. - /// - imageHeight: New size height. - /// - mapping: Dictionary allowing to find the layer associated to some id. - /// This dictionary is particularly useful when the different layers cannot access - /// their `layerPrev`. - /// - /// - Returns: A new layer. When `inPlace` is false, `initKernel` is - /// necessary in order to recreate hard resources. - /// - func resize( - imageWidth: Int, - imageHeight: Int, - mapping: Dictionary, - inPlace: Bool - ) -> Layer -} - /// Abstract layer of a deep learning model. open class Layer: Codable { @@ -271,6 +247,27 @@ open class Layer: Codable /// open func initKernelGPU() {} + /// + /// Initialize state resources in the CPU execution context. + /// + /// We initialize the neurons' state (forward and backward). + /// + open func checkStateCPU(batchSize: Int) throws {} + + /// + /// Initialize state resources in the GPU execution context. + /// + /// We initialize the neurons' forward state. 
+ /// + open func checkStateForwardGPU(batchSize: Int) throws {} + + /// + /// Initialize state resources in the GPU execution context. + /// + /// We initialize the neurons' backward state. + /// + open func checkStateBackwardGPU(batchSize: Int) throws {} + /// /// Update the backward dirty flag for `layerPrev` instance. /// diff --git a/Sources/GrAIdient/Core/Layer/LayerInput.swift b/Sources/GrAIdient/Core/Layer/LayerInput.swift index c3cf7e81..d9ba95b5 100644 --- a/Sources/GrAIdient/Core/Layer/LayerInput.swift +++ b/Sources/GrAIdient/Core/Layer/LayerInput.swift @@ -105,14 +105,13 @@ class InputBuffers { /// The link to the layer. unowned let _layer: T - /// Number of elements in the different buffers. - let nbElems: Int - /// GPU device where the buffers are sent. - let deviceID: Int - var _m: MetalBuffer! = nil - var _v: MetalBuffer! = nil - var _vHat: MetalBuffer! = nil + /// Momentum buffer. + public let m: FloatBuffer + /// Velocity buffer. + public let v: FloatBuffer + /// Velocity normalized buffer. + public let vHat: FloatBuffer /// /// Create a container of buffers. @@ -127,51 +126,16 @@ class InputBuffers deviceID: Int) { _layer = layer - self.nbElems = nbElems - self.deviceID = deviceID - } - - /// Momentum buffer. - var m: MetalBuffer - { - get { - if _m == nil - { - _m = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _m - } - } - - /// Velocity buffer. - var v: MetalBuffer - { - get { - if _v == nil - { - _v = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _v - } - } - - /// Velocity normalized buffer. - var vHat: MetalBuffer - { - get { - if _vHat == nil - { - _vHat = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _vHat - } + m = FloatBuffer(nbElems: nbElems, deviceID: deviceID) + v = FloatBuffer(nbElems: nbElems, deviceID: deviceID) + vHat = FloatBuffer(nbElems: nbElems, deviceID: deviceID) } /// Clean the momentum..., preserving the weights. func reset() { - _m = nil - _v = nil - _vHat = nil + m.reset() + v.reset() + vHat.reset() } } diff --git a/Sources/GrAIdient/Core/Layer/LayerNormalization.swift b/Sources/GrAIdient/Core/Layer/LayerNormalization.swift index 3154be8c..62119c6d 100644 --- a/Sources/GrAIdient/Core/Layer/LayerNormalization.swift +++ b/Sources/GrAIdient/Core/Layer/LayerNormalization.swift @@ -91,6 +91,16 @@ public class LayerWeightsNormalization: Codable, Cloneable self.init(nbNeurons: layer.nbNeurons) } + /// + /// Create a layer with independent units of normalization. + /// + /// - Parameter layer: The layer with the structure we want to apply the normalization to . + /// + convenience init(_ layer: RMSNormSeq) + { + self.init(nbNeurons: layer.nbNeurons) + } + /// /// Decode from the disk. /// @@ -620,7 +630,7 @@ public class BatchNormalization: LayerWeightsStatsNormalization } /// Get the weights in the CPU execution context. - func collectWeights() -> [IWeightArrays] + func collectWeights() -> [WeightArrays] { return [_Ɣ, _β] } @@ -633,50 +643,50 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization /// Buffer of weights to scale the normalization result. /// Shape ~ (nbNeurons,). /// - var _Ɣ: IWeightBuffers! = nil + var _Ɣ: WeightBuffers! = nil /// /// Buffer of biases to add to the normalization result. /// Shape ~ (nbNeurons,). /// - var _β: IWeightBuffers! = nil + var _β: WeightBuffers! = nil /// /// Buffer of averages of data for the different independent batch normalization units. /// Shape ~ (nbNeurons,). /// - var _μ: MetalBuffer! = nil + var _μ: FloatBuffer! 
= nil /// /// Buffer of global averages of data for the different independent batch normalization units. /// Shape ~ (nbNeurons,). /// - var _Eμ: MetalPrivateBuffer! = nil + var _Eμ: FloatBuffer! = nil /// /// Buffer of deviations of data for the different independent batch normalization units. /// Shape ~ (nbNeurons,). /// - var _σ2: MetalBuffer! = nil + var _σ2: FloatBuffer! = nil /// /// Buffer of global deviations of data for the different independent batch normalization units. /// Shape ~ (nbNeurons,). /// - var _Eσ2: MetalPrivateBuffer! = nil + var _Eσ2: FloatBuffer! = nil /// /// Buffer of data normalized without taking into account the biases and the weights. /// Shape ~ (batch, nbNeurons, height, width). /// - var _xHat: MetalBuffer! = nil + var _xHat: FloatBuffer! = nil /// /// Buffer used to compute backward pass. /// Shape ~ (nbNeurons,). /// - var _sum1: MetalBuffer! = nil + var _sum1: FloatBuffer! = nil /// /// Buffer used to compute backward pass. /// Shape ~ (nbNeurons,). /// - var _sum2: MetalBuffer! = nil + var _sum2: FloatBuffer! = nil /// GPU device on which model is executed. var _deviceID = 0 @@ -690,11 +700,8 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization return super.weights } - MetalKernel.get.download([_β.w_p!, _Ɣ.w_p!]) - - var weightsTmp = [Float]() - weightsTmp += _Ɣ.w_p!.shared.array - weightsTmp += _β.w_p!.shared.array + var weightsTmp = _Ɣ!.w.download() + weightsTmp += _β!.w.download() return weightsTmp } set { @@ -717,11 +724,8 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization return super.stats } - MetalKernel.get.download([_Eμ, _Eσ2]) - - var statsTmp = [Float]() - statsTmp += _Eμ.shared.array - statsTmp += _Eσ2.shared.array + var statsTmp = _Eμ.download() + statsTmp += _Eσ2.download() return statsTmp } set { @@ -781,58 +785,38 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization _β = WeightBuffers(nbElems: _nbNeurons, deviceID: _deviceID) _Ɣ = WeightBuffers(nbElems: _nbNeurons, deviceID: _deviceID) - let βPtr = _β.w_p!.shared.buffer - let ƔPtr = _Ɣ.w_p!.shared.buffer - if _weightsList.count == 0 { + _weightsList = [Float](repeating: 0.0, count: 2 * _nbNeurons) for depth in 0..<_nbNeurons { - ƔPtr[depth] = 1.0 - βPtr[depth] = 0.0 - } - } - else - { - for depth in 0..<_nbNeurons - { - ƔPtr[depth] = _weightsList[depth] - βPtr[depth] = _weightsList[_nbNeurons + depth] + _weightsList[depth] = 1.0 } - _weightsList = [] } - MetalKernel.get.upload([_β.w_p!, _Ɣ.w_p!]) + _Ɣ.w.initialize(array: &_weightsList) + _β.w.initialize(array: &_weightsList, start: _nbNeurons) + + _weightsList = [] } /// Initialize stats in the GPU execution context. 
func initStats() { - _Eμ = MetalPrivateBuffer(_nbNeurons, deviceID: _deviceID) - _Eσ2 = MetalPrivateBuffer(_nbNeurons, deviceID: _deviceID) - - let EμPtr = _Eμ.shared.buffer - let Eσ2Ptr = _Eσ2.shared.buffer + _Eμ = FloatBuffer(nbElems: _nbNeurons, deviceID: _deviceID) + _Eσ2 = FloatBuffer(nbElems: _nbNeurons, deviceID: _deviceID) - if _statsList.count == 0 + if _statsList.count != 0 { - for depth in 0..<_nbNeurons - { - EμPtr[depth] = 0.0 - Eσ2Ptr[depth] = 0.0 - } + _Eμ.initialize(array: &_statsList) + _Eσ2.initialize(array: &_statsList, start: _nbNeurons) } else { - for depth in 0..<_nbNeurons - { - EμPtr[depth] = _statsList[depth] - Eσ2Ptr[depth] = _statsList[_nbNeurons + depth] - } - _statsList = [] + _Eμ.initialize() + _Eσ2.initialize() } - - MetalKernel.get.upload([_Eμ, _Eσ2]) + _statsList = [] } /// @@ -880,7 +864,7 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization if _μ == nil { - _μ = MetalPrivateBuffer(_nbNeurons, deviceID: _deviceID) + _μ = FloatBuffer(nbElems: _nbNeurons, deviceID: _deviceID) } let command = MetalKernel.get.createCommand( @@ -913,7 +897,7 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization if _σ2 == nil { - _σ2 = MetalPrivateBuffer(_nbNeurons, deviceID: _deviceID) + _σ2 = FloatBuffer(nbElems: _nbNeurons, deviceID: _deviceID) } let command = MetalKernel.get.createCommand( @@ -948,7 +932,7 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization if _xHat == nil { - _xHat = MetalPrivateBuffer( + _xHat = FloatBuffer(nbElems: batchSize * _nbNeurons * width * height, deviceID: _deviceID ) @@ -1039,8 +1023,8 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization if _sum1 == nil { - _sum1 = MetalPrivateBuffer(_nbNeurons, deviceID: _deviceID) - _sum2 = MetalPrivateBuffer(_nbNeurons, deviceID: _deviceID) + _sum1 = FloatBuffer(nbElems: _nbNeurons, deviceID: _deviceID) + _sum2 = FloatBuffer(nbElems: _nbNeurons, deviceID: _deviceID) } let command = MetalKernel.get.createCommand( @@ -1126,7 +1110,7 @@ class BatchNormalizationGPU: LayerWeightsStatsNormalization } /// Get the weights in the GPU execution context. - func collectWeights() -> [IWeightBuffers] + func collectWeights() -> [WeightBuffers] { return [_Ɣ, _β] } @@ -1475,7 +1459,7 @@ public class InstanceNormalization: LayerWeightsNormalization } /// Get the weights in the CPU execution context. - func collectWeights() -> [IWeightArrays] + func collectWeights() -> [WeightArrays] { return [_Ɣ, _β] } @@ -1488,40 +1472,40 @@ class InstanceNormalizationGPU: LayerWeightsNormalization /// Buffer of weights to scale the normalization result. /// Shape ~ (nbNeurons,). /// - var _Ɣ: IWeightBuffers! = nil + var _Ɣ: WeightBuffers! = nil /// /// Buffer of biases to add to the normalization result. /// Shape ~ (nbNeurons,). /// - var _β: IWeightBuffers! = nil + var _β: WeightBuffers! = nil /// /// Buffer of averages of data for the different independent batch normalization units. /// Shape ~ (batch, nbNeurons). /// - var _μ: MetalBuffer! = nil + var _μ: FloatBuffer! = nil /// /// Buffer of deviations of data for the different independent batch normalization units. /// Shape ~ (batch, nbNeurons). /// - var _σ2: MetalBuffer! = nil + var _σ2: FloatBuffer! = nil /// /// Buffer of data normalized without taking into account the biases and the weights. /// Shape ~ (batch, nbNeurons, height, width). /// - var _xHat: MetalBuffer! = nil + var _xHat: FloatBuffer! = nil /// /// Buffer used to compute backward pass. /// Shape ~ (nbNeurons,). /// - var _sum1: MetalBuffer! = nil + var _sum1: FloatBuffer! 
= nil /// /// Buffer used to compute backward pass. /// Shape ~ (nbNeurons,). /// - var _sum2: MetalBuffer! = nil + var _sum2: FloatBuffer! = nil /// GPU device on which model is executed. var _deviceID = 0 @@ -1535,11 +1519,8 @@ class InstanceNormalizationGPU: LayerWeightsNormalization return super.weights } - MetalKernel.get.download([_β.w_p!, _Ɣ.w_p!]) - - var weightsTmp = [Float]() - weightsTmp += _Ɣ.w_p!.shared.array - weightsTmp += _β.w_p!.shared.array + var weightsTmp = _Ɣ!.w.download() + weightsTmp += _β!.w.download() return weightsTmp } set { @@ -1597,28 +1578,19 @@ class InstanceNormalizationGPU: LayerWeightsNormalization _β = WeightBuffers(nbElems: _nbNeurons, deviceID: _deviceID) _Ɣ = WeightBuffers(nbElems: _nbNeurons, deviceID: _deviceID) - let βPtr = _β.w_p!.shared.buffer - let ƔPtr = _Ɣ.w_p!.shared.buffer - if _weightsList.count == 0 { + _weightsList = [Float](repeating: 0.0, count: 2 * _nbNeurons) for depth in 0..<_nbNeurons { - ƔPtr[depth] = 1.0 - βPtr[depth] = 0.0 - } - } - else - { - for depth in 0..<_nbNeurons - { - ƔPtr[depth] = _weightsList[depth] - βPtr[depth] = _weightsList[_nbNeurons + depth] + _weightsList[depth] = 1.0 } - _weightsList = [] } - MetalKernel.get.upload([_β.w_p!, _Ɣ.w_p!]) + _Ɣ.w.initialize(array: &_weightsList) + _β.w.initialize(array: &_weightsList, start: _nbNeurons) + + _weightsList = [] } /// @@ -1654,7 +1626,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _xHat == nil { - _xHat = MetalPrivateBuffer( + _xHat = FloatBuffer(nbElems: batchSize * _nbNeurons * width * height, deviceID: _deviceID ) @@ -1686,8 +1658,8 @@ class InstanceNormalizationGPU: LayerWeightsNormalization _computeμ(layer) _computeσ2(layer) - let layerFirst = layer._layersPrev.first as! Layer2D - let layerLast = layer._layersPrev.last as! Layer1D + let layerFirst = layer.layersPrev.first as! Layer2D + let layerLast = layer.layersPrev.last as! Layer1D let batchSize = layer.batchSize let width = layer.width let height = layer.height @@ -1698,7 +1670,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _xHat == nil { - _xHat = MetalPrivateBuffer( + _xHat = FloatBuffer(nbElems: batchSize * _nbNeurons * width * height, deviceID: _deviceID ) @@ -1738,7 +1710,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _μ == nil { - _μ = MetalPrivateBuffer( + _μ = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) } @@ -1759,7 +1731,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization /// Compute the averages of the different independent normalization units. private func _computeμ(_ layer: AdaIN) { - let layerFirst = layer._layersPrev.first as! Layer2D + let layerFirst = layer.layersPrev.first as! Layer2D let nbChannels = layer.nbChannels let batchSize = layer.batchSize let width = layer.width @@ -1771,7 +1743,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _μ == nil { - _μ = MetalPrivateBuffer( + _μ = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) } @@ -1803,7 +1775,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _σ2 == nil { - _σ2 = MetalPrivateBuffer( + _σ2 = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) } @@ -1825,7 +1797,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization /// Compute the deviations of the different independent normalization units. private func _computeσ2(_ layer: AdaIN) { - let layerFirst = layer._layersPrev.first as! Layer2D + let layerFirst = layer.layersPrev.first as! 
Layer2D let nbChannels = layer.nbChannels let batchSize = layer.batchSize let width = layer.width @@ -1837,7 +1809,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _σ2 == nil { - _σ2 = MetalPrivateBuffer( + _σ2 = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) } @@ -1894,8 +1866,8 @@ class InstanceNormalizationGPU: LayerWeightsNormalization { _backward(layer) - let layerFirst = layer._layersPrev.first as! Layer2D - let layerLast = layer._layersPrev.last as! Layer1D + let layerFirst = layer.layersPrev.first as! Layer2D + let layerLast = layer.layersPrev.last as! Layer1D let batchSize = layer.batchSize let width = layer.width let height = layer.height @@ -1941,10 +1913,10 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _sum1 == nil { - _sum1 = MetalPrivateBuffer( + _sum1 = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) - _sum2 = MetalPrivateBuffer( + _sum2 = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) } @@ -1971,7 +1943,7 @@ class InstanceNormalizationGPU: LayerWeightsNormalization /// Compute the gradients of weights in the GPU execution context. private func _backward(_ layer: AdaIN) { - let layerLast = layer._layersPrev.last as! Layer1D + let layerLast = layer.layersPrev.last as! Layer1D let batchSize = layer.batchSize let width = layer.width let height = layer.height @@ -1983,10 +1955,10 @@ class InstanceNormalizationGPU: LayerWeightsNormalization if _sum1 == nil { - _sum1 = MetalPrivateBuffer( + _sum1 = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) - _sum2 = MetalPrivateBuffer( + _sum2 = FloatBuffer(nbElems: batchSize * _nbNeurons, deviceID: _deviceID ) } @@ -2359,40 +2331,40 @@ class LayerNormalizationGPU: LayerWeightsNormalization /// Buffer of weights to scale the normalization result. /// Shape ~ (nbNeurons,). /// - var _Ɣ: IWeightBuffers! = nil + var _Ɣ: WeightBuffers! = nil /// /// Buffer of biases to add to the normalization result. /// Shape ~ (nbNeurons,). /// - var _β: IWeightBuffers! = nil + var _β: WeightBuffers! = nil /// /// Buffer of averages of data for the different independent batch normalization units. /// Shape ~ (batch, sequence). /// - var _μ: MetalBuffer! = nil + var _μ: FloatBuffer! = nil /// /// Buffer of deviations of data for the different independent batch normalization units. /// Shape ~ (batch, sequence). /// - var _σ2: MetalBuffer! = nil + var _σ2: FloatBuffer! = nil /// /// Buffer of data normalized without taking into account the biases and the weights. /// Shape ~ (batch, sequence, nbNeurons). /// - var _xHat: MetalBuffer! = nil + var _xHat: FloatBuffer! = nil /// /// Buffer used to compute backward pass. /// Shape ~ (batch, sequence). /// - var _sum1: MetalBuffer! = nil + var _sum1: FloatBuffer! = nil /// /// Buffer used to compute backward pass. /// Shape ~ (batch, sequence). /// - var _sum2: MetalBuffer! = nil + var _sum2: FloatBuffer! = nil /// GPU device on which model is executed. 
var _deviceID = 0 @@ -2406,11 +2378,8 @@ class LayerNormalizationGPU: LayerWeightsNormalization return super.weights } - MetalKernel.get.download([_β.w_p!, _Ɣ.w_p!]) - - var weightsTmp = [Float]() - weightsTmp += _Ɣ.w_p!.shared.array - weightsTmp += _β.w_p!.shared.array + var weightsTmp = _Ɣ!.w.download() + weightsTmp += _β!.w.download() return weightsTmp } set { @@ -2468,28 +2437,19 @@ class LayerNormalizationGPU: LayerWeightsNormalization _β = WeightBuffers(nbElems: _nbNeurons, deviceID: _deviceID) _Ɣ = WeightBuffers(nbElems: _nbNeurons, deviceID: _deviceID) - let βPtr = _β.w_p!.shared.buffer - let ƔPtr = _Ɣ.w_p!.shared.buffer - if _weightsList.count == 0 { + _weightsList = [Float](repeating: 0.0, count: 2 * _nbNeurons) for depth in 0..<_nbNeurons { - ƔPtr[depth] = 1.0 - βPtr[depth] = 0.0 + _weightsList[depth] = 1.0 } } - else - { - for depth in 0..<_nbNeurons - { - ƔPtr[depth] = _weightsList[depth] - βPtr[depth] = _weightsList[_nbNeurons + depth] - } - _weightsList = [] - } - MetalKernel.get.upload([_β.w_p!, _Ɣ.w_p!]) + _Ɣ.w.initialize(array: &_weightsList) + _β.w.initialize(array: &_weightsList, start: _nbNeurons) + + _weightsList = [] } /// @@ -2524,14 +2484,17 @@ class LayerNormalizationGPU: LayerWeightsNormalization if _xHat == nil { - _xHat = MetalPrivateBuffer( + _xHat = FloatBuffer(nbElems: batchSize * sequence * _nbNeurons, deviceID: _deviceID ) } + let kernel = _nbNeurons % 4 == 0 ? + "forwardLayerNormSeq4" : "forwardLayerNormSeq" + let coeff = _nbNeurons % 4 == 0 ? 4 : 1 let command = MetalKernel.get.createCommand( - "forwardLayerNormSeq", deviceID: _deviceID + kernel, deviceID: _deviceID ) command.setBuffer(_β.w.metal, atIndex: 0) command.setBuffer(_Ɣ.w.metal, atIndex: 1) @@ -2544,7 +2507,7 @@ class LayerNormalizationGPU: LayerWeightsNormalization command.setBuffer(_xHat.metal, atIndex: 8) command.dispatchThreads( - width: _nbNeurons, + width: _nbNeurons / coeff, height: batchSize * sequence ) command.enqueue() @@ -2562,13 +2525,15 @@ class LayerNormalizationGPU: LayerWeightsNormalization if _μ == nil { - _μ = MetalPrivateBuffer( + _μ = FloatBuffer(nbElems: batchSize * sequence, deviceID: _deviceID ) } + let kernel = _nbNeurons % 4 == 0 ? + "computeLayerNormSeqμ4" : "computeLayerNormSeqμ" let command = MetalKernel.get.createCommand( - "computeLayerNormSeqμ", deviceID: _deviceID + kernel, deviceID: _deviceID ) command.setBuffer(layer.outs.metal, atIndex: 0) command.setBytes(pNbNeurons, atIndex: 1) @@ -2592,13 +2557,15 @@ class LayerNormalizationGPU: LayerWeightsNormalization if _σ2 == nil { - _σ2 = MetalPrivateBuffer( + _σ2 = FloatBuffer(nbElems: batchSize * sequence, deviceID: _deviceID ) } + let kernel = _nbNeurons % 4 == 0 ? + "computeLayerNormSeqσ24" : "computeLayerNormSeqσ2" let command = MetalKernel.get.createCommand( - "computeLayerNormSeqσ2", deviceID: _deviceID + kernel, deviceID: _deviceID ) command.setBuffer(layer.outs.metal, atIndex: 0) command.setBuffer(_μ.metal, atIndex: 1) @@ -2624,8 +2591,11 @@ class LayerNormalizationGPU: LayerWeightsNormalization let pNbBatch: [UInt32] = [UInt32(batchSize)] let pSequence: [UInt32] = [UInt32(sequence)] + let kernel = _nbNeurons % 4 == 0 ? + "backwardLayerNormSeq4" : "backwardLayerNormSeq" + let coeff = _nbNeurons % 4 == 0 ? 
4 : 1 let command = MetalKernel.get.createCommand( - "backwardLayerNormSeq", deviceID: _deviceID + kernel, deviceID: _deviceID ) command.setBuffer(_σ2.metal, atIndex: 0) command.setBuffer(_xHat.metal, atIndex: 1) @@ -2638,7 +2608,7 @@ class LayerNormalizationGPU: LayerWeightsNormalization command.setBuffer(layer.delta.metal, atIndex: 8) command.dispatchThreads( - width: _nbNeurons, + width: _nbNeurons / coeff, height: batchSize * sequence ) command.enqueue() @@ -2656,16 +2626,18 @@ class LayerNormalizationGPU: LayerWeightsNormalization if _sum1 == nil { - _sum1 = MetalPrivateBuffer( + _sum1 = FloatBuffer(nbElems: batchSize * sequence, deviceID: _deviceID ) - _sum2 = MetalPrivateBuffer( + _sum2 = FloatBuffer(nbElems: batchSize * sequence, deviceID: _deviceID ) } + let kernel = _nbNeurons % 4 == 0 ? + "backwardWeights1LayerNormSeq4" : "backwardWeights1LayerNormSeq" let command = MetalKernel.get.createCommand( - "backwardWeights1LayerNormSeq", deviceID: _deviceID + kernel, deviceID: _deviceID ) command.setBuffer(layer.delta.metal, atIndex: 0) command.setBuffer(_xHat.metal, atIndex: 1) @@ -2691,8 +2663,11 @@ class LayerNormalizationGPU: LayerWeightsNormalization let pSequence: [UInt32] = [UInt32(sequence)] let pAccumulate: [UInt32] = layer.accumulateDeltaWeights ? [1] : [0] + let kernel = _nbNeurons % 4 == 0 ? + "backwardWeights2LayerNormSeq4" : "backwardWeights2LayerNormSeq" + let coeff = _nbNeurons % 4 == 0 ? 4 : 1 let command = MetalKernel.get.createCommand( - "backwardWeights2LayerNormSeq", deviceID: _deviceID + kernel, deviceID: _deviceID ) command.setBuffer(layer.delta.metal, atIndex: 0) command.setBuffer(_xHat.metal, atIndex: 1) @@ -2703,7 +2678,7 @@ class LayerNormalizationGPU: LayerWeightsNormalization command.setBuffer(_Ɣ.g.metal, atIndex: 6) command.setBuffer(_β.g.metal, atIndex: 7) - command.dispatchThreads(_nbNeurons) + command.dispatchThreads(_nbNeurons / coeff) command.enqueue() } @@ -2713,3 +2688,568 @@ class LayerNormalizationGPU: LayerWeightsNormalization return [_Ɣ, _β] } } + +/// A layer that applies layer normalization in the CPU execution context. +public class RMSNormalization: LayerWeightsNormalization +{ + /// Slight modification to avoid "divide by 0" errors. + let _Ɛ: Double = 1e-5 + + /// + /// Array of weights to scale the normalization result. + /// Shape ~ (nbNeurons,). + /// + var _Ɣ: WeightArrays! = nil + + /// + /// List of deviations of data for the different independent batch normalization units. + /// Shape ~ ((batch x sequence),). + /// + var _σ2 = [Double]() + + /// + /// The list of data normalized without taking into account the biases and the weights. + /// Shape ~ ((batch x sequence), (nbNeurons)). + /// + var _xHat = [[Double]]() + + /// Weights in the CPU execution context. + override var weights: [Float] + { + get { + if _Ɣ == nil + { + return super.weights + } + + var weightsTmp = [Float]() + for Ɣ in _Ɣ.w + { + weightsTmp.append(Float(Ɣ)) + } + return weightsTmp + } + set { + if newValue.count > 0 && newValue.count != _nbNeurons + { + fatalError( + "Weights do not have the expected number of elements." + ) + } + super.weights = newValue + } + } + + /// Copy this. + public override func clone() -> Self + { + return RMSNormalization(norm: self) as! Self + } + + /// + /// Clean state resources in the CPU execution context. + /// + /// We do not clean Ɣ and β but must reset their momentum state. + /// Note that we do not have to reset their delta because here they are independent on + /// batch size. 
+ /// + func resetKernel() + { + _σ2 = [] + _xHat = [] + + _Ɣ?.reset() + } + + /// + /// Initialize weights in the CPU execution context. + /// + /// Their momentum state is also reset. + /// Note that we also initialize the delta which are independent on the batch size. + /// + func initWeights() + { + _Ɣ = WeightArrays(_nbNeurons) + if _weightsList.count == 0 + { + for depth in 0..<_nbNeurons + { + _Ɣ.w[depth] = 1.0 + } + } + else + { + for depth in 0..<_nbNeurons + { + _Ɣ.w[depth] = Double(_weightsList[depth]) + } + _weightsList = [] + } + } + + /// Apply the forward pass of the Gradient Checking in CPU execution context. + func forwardGC(_ layer: RMSNormSeq) + { + let nbGC = layer.nbGC + let nbNeurons = layer.nbNeurons + let Ɛ = layer.Ɛ + + Concurrency.slice(layer.sequence) + { + (seq: Int) in + + for batch in 0..= nbGC-2*nbNeurons + { + let DEPTH = (elem - nbGC + 2 * nbNeurons) / 2 + + if elem % 2 == 0 + { + for depth in 0.. [IWeightArrays] + { + return [_Ɣ] + } +} + +/// A layer that applies layer normalization in the GPU execution context. +class RMSNormalizationGPU: LayerWeightsNormalization +{ + /// + /// Buffer of weights to scale the normalization result. + /// Shape ~ (nbNeurons,). + /// + var _Ɣ: WeightBuffers! = nil + + /// + /// Buffer of deviations of data for the different independent batch normalization units. + /// Shape ~ (batch, sequence). + /// + var _σ2: FloatBuffer! = nil + + /// + /// Buffer of data normalized without taking into account the biases and the weights. + /// Shape ~ (batch, sequence, nbNeurons). + /// + var _xHat: FloatBuffer! = nil + + /// + /// Buffer used to compute backward pass. + /// Shape ~ (batch, sequence). + /// + var _sum2: FloatBuffer! = nil + + /// GPU device on which model is executed. + var _deviceID = 0 + + /// Weights in the GPU execution context. + override var weights: [Float] + { + get { + if _Ɣ == nil + { + return super.weights + } + + return _Ɣ!.w.download() + } + set { + if newValue.count > 0 && newValue.count != _nbNeurons + { + fatalError( + "Weights do not have the expected number of elements." + ) + } + super.weights = newValue + } + } + + /// Copy this. + public override func clone() -> Self + { + return RMSNormalizationGPU(norm: self) as! Self + } + + /// + /// Clean state resources in the GPU execution context. + /// + /// We do not clean Ɣ and β but must reset their momentum state. + /// + func resetKernel() + { + _σ2 = nil + _xHat = nil + _sum2 = nil + + _Ɣ?.reset() + } + + /// + /// Initialize hard resources in the GPU execution context. + /// + /// We initialize the stats. + /// + /// - Parameter deviceID: The id of GPU where to run the model. + /// + func initKernel(deviceID: Int) + { + _deviceID = deviceID + } + + /// + /// Initialize weights in the GPU execution context. + /// + /// Their momentum and delta state are also reset. + /// + func initWeights() + { + _Ɣ = WeightBuffers(nbElems: _nbNeurons, deviceID: _deviceID) + + if _weightsList.count == 0 + { + _weightsList = [Float](repeating: 0.0, count: _nbNeurons) + for depth in 0..<_nbNeurons + { + _weightsList[depth] = 1.0 + } + } + _Ɣ.w.initialize(array: &_weightsList) + + _weightsList = [] + } + + /// + /// Get the weights and biases back to the CPU execution context. + /// + /// This function is necessary for the Gradient Checking in the GPU execution context. + /// + /// - Parameter norm: The layer in the CPU execution context. 
+ /// + func applyWeights(norm: RMSNormalization) + { + let weights = self.weights + for depth in 0..<_nbNeurons + { + norm._Ɣ.w[depth] = Double(weights[depth]) + } + } + + /// Apply the forward pass in the GPU execution context. + func forward(_ layer: RMSNormSeq) + { + _computeσ2(layer) + + let batchSize = layer.batchSize + let sequence = layer.sequence + + let pNbNeurons: [UInt32] = [UInt32(_nbNeurons)] + let pNbBatch: [UInt32] = [UInt32(batchSize)] + let pSequence: [UInt32] = [UInt32(sequence)] + let pAddUnitOffset: [UInt32] = layer.addUnitOffset ? [1] : [0] + + if _xHat == nil + { + _xHat = FloatBuffer(nbElems: + batchSize * sequence * _nbNeurons, + deviceID: _deviceID + ) + } + + let command = MetalKernel.get.createCommand( + "forwardRMSNormSeq", deviceID: _deviceID + ) + command.setBuffer(_Ɣ.w.metal, atIndex: 0) + command.setBuffer(_σ2.metal, atIndex: 1) + command.setBytes(pNbNeurons, atIndex: 2) + command.setBytes(pNbBatch, atIndex: 3) + command.setBytes(pSequence, atIndex: 4) + command.setBytes(pAddUnitOffset, atIndex: 5) + command.setBuffer(layer.outs.metal, atIndex: 6) + command.setBuffer(_xHat.metal, atIndex: 7) + + command.dispatchThreads( + width: _nbNeurons, + height: batchSize * sequence + ) + command.enqueue() + } + + /// Compute the deviations of the different independent normalization units. + private func _computeσ2(_ layer: RMSNormSeq) + { + let batchSize = layer.batchSize + let sequence = layer.sequence + + let pNbNeurons: [UInt32] = [UInt32(_nbNeurons)] + let pNbBatch: [UInt32] = [UInt32(batchSize)] + let pSequence: [UInt32] = [UInt32(sequence)] + + if _σ2 == nil + { + _σ2 = FloatBuffer(nbElems: + batchSize * sequence, deviceID: _deviceID + ) + } + + let command = MetalKernel.get.createCommand( + "computeRMSNormSeqσ2", deviceID: _deviceID + ) + command.setBuffer(layer.outs.metal, atIndex: 0) + command.setBytes(pNbNeurons, atIndex: 1) + command.setBytes(pNbBatch, atIndex: 2) + command.setBytes(pSequence, atIndex: 3) + command.setBuffer(_σ2.metal, atIndex: 4) + + command.dispatchThreads(width: sequence, height: batchSize) + command.enqueue() + } + + /// Apply the backward pass in the GPU execution context. + func backward(_ layer: RMSNormSeq) + { + _backwardWeights1(layer) + _backwardWeights2(layer) + + let batchSize = layer.batchSize + let sequence = layer.sequence + + let pNbNeurons: [UInt32] = [UInt32(_nbNeurons)] + let pNbBatch: [UInt32] = [UInt32(batchSize)] + let pSequence: [UInt32] = [UInt32(sequence)] + let pAddUnitOffset: [UInt32] = layer.addUnitOffset ? [1] : [0] + + let command = MetalKernel.get.createCommand( + "backwardRMSNormSeq", deviceID: _deviceID + ) + command.setBuffer(_σ2.metal, atIndex: 0) + command.setBuffer(_xHat.metal, atIndex: 1) + command.setBuffer(_Ɣ.w.metal, atIndex: 2) + command.setBuffer(_sum2.metal, atIndex: 3) + command.setBytes(pNbNeurons, atIndex: 4) + command.setBytes(pNbBatch, atIndex: 5) + command.setBytes(pSequence, atIndex: 6) + command.setBytes(pAddUnitOffset, atIndex: 7) + command.setBuffer(layer.delta.metal, atIndex: 8) + + command.dispatchThreads( + width: _nbNeurons, + height: batchSize * sequence + ) + command.enqueue() + } + + /// Compute the gradients of weights in the GPU execution context. + private func _backwardWeights1(_ layer: RMSNormSeq) + { + let batchSize = layer.batchSize + let sequence = layer.sequence + + let pNbNeurons: [UInt32] = [UInt32(_nbNeurons)] + let pNbBatch: [UInt32] = [UInt32(batchSize)] + let pSequence: [UInt32] = [UInt32(sequence)] + let pAddUnitOffset: [UInt32] = layer.addUnitOffset ? 
[1] : [0] + + if _sum2 == nil + { + _sum2 = FloatBuffer(nbElems: + batchSize * sequence, deviceID: _deviceID + ) + } + + let command = MetalKernel.get.createCommand( + "backwardWeights1RMSNormSeq", deviceID: _deviceID + ) + command.setBuffer(layer.delta.metal, atIndex: 0) + command.setBuffer(_xHat.metal, atIndex: 1) + command.setBuffer(_Ɣ.w.metal, atIndex: 2) + command.setBytes(pNbNeurons, atIndex: 3) + command.setBytes(pNbBatch, atIndex: 4) + command.setBytes(pSequence, atIndex: 5) + command.setBytes(pAddUnitOffset, atIndex: 6) + command.setBuffer(_sum2.metal, atIndex: 7) + + command.dispatchThreads(width: sequence, height: batchSize) + command.enqueue() + } + + /// Compute the gradients of weights in the GPU execution context. + private func _backwardWeights2(_ layer: RMSNormSeq) + { + let batchSize = layer.batchSize + let sequence = layer.sequence + + let pNbNeurons: [UInt32] = [UInt32(_nbNeurons)] + let pNbBatch: [UInt32] = [UInt32(batchSize)] + let pSequence: [UInt32] = [UInt32(sequence)] + let pAccumulate: [UInt32] = layer.accumulateDeltaWeights ? [1] : [0] + + let command = MetalKernel.get.createCommand( + "backwardWeights2RMSNormSeq", deviceID: _deviceID + ) + command.setBuffer(layer.delta.metal, atIndex: 0) + command.setBuffer(_xHat.metal, atIndex: 1) + command.setBytes(pNbNeurons, atIndex: 2) + command.setBytes(pNbBatch, atIndex: 3) + command.setBytes(pSequence, atIndex: 4) + command.setBytes(pAccumulate, atIndex: 5) + command.setBuffer(_Ɣ.g.metal, atIndex: 6) + + command.dispatchThreads(_nbNeurons) + command.enqueue() + } + + /// Get the weights in the GPU execution context. + func collectWeights() -> [IWeightBuffers] + { + return [_Ɣ] + } +} diff --git a/Sources/GrAIdient/Core/Layer/LayerUpdate.swift b/Sources/GrAIdient/Core/Layer/LayerUpdate.swift index 6c6c31d3..77afb017 100644 --- a/Sources/GrAIdient/Core/Layer/LayerUpdate.swift +++ b/Sources/GrAIdient/Core/Layer/LayerUpdate.swift @@ -6,6 +6,7 @@ // import Foundation +import Accelerate /// Error occuring in an output layer. public enum LossError: Error @@ -29,7 +30,7 @@ extension LossError: CustomStringConvertible /// Running phase of a model. public enum Phase { - case Training, Inference + case Training, InferenceBackward, Inference } /// API for a layer that have learning weights. @@ -73,15 +74,15 @@ public protocol IWeightBuffers var nbElems: Int { get } /// Weights buffer: the buffer to be update. - var w: MetalBuffer { get } + var w: FloatBuffer { get } /// Gradients buffer. - var g: MetalBuffer { get } + var g: FloatBuffer { get } /// Momentum buffer. - var m: MetalBuffer { get } + var m: FloatBuffer { get } /// Velocity buffer. - var v: MetalBuffer { get } + var v: FloatBuffer { get } /// Velocity normalized buffer. - var vHat: MetalBuffer { get } + var vHat: FloatBuffer { get } /// Clean the momentum..., preserving the weights. func reset() @@ -89,50 +90,35 @@ public protocol IWeightBuffers extension IWeightBuffers { - /// Get the weights as a private buffer. - var w_p: MetalPrivateBuffer? - { - get { - return w as? MetalPrivateBuffer - } - } - /// Get the weights as a shared buffer. - var w_s: MetalSharedBuffer? - { - get { - return w as? MetalSharedBuffer - } - } - - /// Get the gradient buffer as a private buffer. - var g_p: MetalPrivateBuffer? + /// GPU device where the buffers are sent. + public var deviceID: Int { get { - return g as? MetalPrivateBuffer + return w.deviceID } } - /// Get the gradient buffer as a shared buffer. - var g_s: MetalSharedBuffer? + /// Number of elements in the different buffers. 
+ public var nbElems: Int { get { - return g as? MetalSharedBuffer + return w.nbElems } } } /// GPU buffers needed to update the weights. -class WeightBuffers: IWeightBuffers +public class WeightBuffers: IWeightBuffers { - /// Number of elements in the different buffers. - let nbElems: Int - /// GPU device where the buffers are sent. - let deviceID: Int - - var _w: MetalBuffer! = nil - var _g: MetalBuffer! = nil - var _m: MetalBuffer! = nil - var _v: MetalBuffer! = nil - var _vHat: MetalBuffer! = nil + /// Weights buffer: the buffer to be update. + public let w: FloatBuffer + /// Gradients buffer. + public let g: FloatBuffer + /// Momentum buffer. + public let m: FloatBuffer + /// Velocity buffer. + public let v: FloatBuffer + /// Velocity normalized buffer. + public let vHat: FloatBuffer /// /// Create a container of buffers. @@ -143,78 +129,25 @@ class WeightBuffers: IWeightBuffers /// init(nbElems: Int, deviceID: Int) { - self.nbElems = nbElems - self.deviceID = deviceID - } - - /// Weights buffer: the buffer to be update. - var w: MetalBuffer - { - get { - if _w == nil - { - _w = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _w - } - } - - /// Gradients buffer. - var g: MetalBuffer - { - get { - if _g == nil - { - _g = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _g - } - } - - /// Momentum buffer. - var m: MetalBuffer - { - get { - if _m == nil - { - _m = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _m - } - } - - /// Velocity buffer. - var v: MetalBuffer - { - get { - if _v == nil - { - _v = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _v - } + w = FloatBuffer(nbElems: nbElems, deviceID: deviceID) + g = FloatBuffer(nbElems: nbElems, deviceID: deviceID) + m = FloatBuffer(nbElems: nbElems, deviceID: deviceID) + v = FloatBuffer( + nbElems: nbElems, deviceID: deviceID, forceFloat: true + ) + vHat = FloatBuffer( + nbElems: nbElems, deviceID: deviceID, forceFloat: true + ) } - /// Velocity normalized buffer. - var vHat: MetalBuffer + /// Clean the buffers. + public func reset() { - get { - if _vHat == nil - { - _vHat = MetalPrivateBuffer(nbElems, deviceID: deviceID) - } - return _vHat - } - } - - /// Clean the momentum..., preserving the weights. - func reset() - { - // do not touch _w - _g = nil - _m = nil - _v = nil - _vHat = nil + // do not touch w + g.reset() + m.reset() + v.reset() + vHat.reset() } } @@ -256,7 +189,11 @@ extension LayerWeightInit } } + /// /// Generate list of weights values. + /// + /// - Returns: The generated list of values. + /// public func generateWeightsList() -> [Float] { let nbElems = weightListSize @@ -288,6 +225,52 @@ extension LayerWeightInit return weightsList } + /// + /// Generate weights values. + /// + /// - Parameters: + /// - out: The output buffer. + /// - deviceID: GPU device. 
+ /// + public func generateWeightsList( + out: FloatBuffer, + deviceID: Int) + { + let nbElems = weightListSize + switch weightInitClass { + case .XavierUniform: + Self.XavierUniform( + nbElems: nbElems, + connectivityIO: connectivityIO, + out: out, + deviceID: deviceID + ) + case .XavierNormal: + Self.XavierNormal( + nbElems: nbElems, + connectivityIO: connectivityIO, + out: out, + deviceID: deviceID + ) + case .KaimingUniform: + Self.KaimingUniform( + nbElems: nbElems, + coeff: coeffInitWeights, + connectivityIO: connectivityIO, + out: out, + deviceID: deviceID + ) + case .KaimingNormal: + Self.KaimingNormal( + nbElems: nbElems, + coeff: coeffInitWeights, + connectivityIO: connectivityIO, + out: out, + deviceID: deviceID + ) + } + } + /// /// Xavier uniform initialization method. /// @@ -309,6 +292,50 @@ extension LayerWeightInit return values } + /// + /// Xavier uniform initialization method. + /// + /// - Parameters: + /// - nbElems: Number of weights to initialize. + /// - connectivityIO: Number of input and output connections. + /// - out: The output buffer. + /// - deviceID: GPU device. + /// + static func XavierUniform( + nbElems: Int, + connectivityIO: (Int, Int), + out: FloatBuffer, + deviceID: Int) + { + var array = [Float](repeating: 0.0, count: nbElems) + array.withUnsafeMutableBufferPointer + { + ptr in + + let bound = + sqrt(6) / sqrt(Float(connectivityIO.0 + connectivityIO.1)) + guard var arrayDescriptor = BNNSNDArrayDescriptor( + data: ptr, + shape: .vector(nbElems)), + let randomNumberGenerator = BNNSCreateRandomGenerator( + BNNSRandomGeneratorMethodAES_CTR, + nil) else + { + fatalError() + } + + BNNSRandomFillUniformFloat( + randomNumberGenerator, + &arrayDescriptor, + -bound, + bound + ) + + BNNSDestroyRandomGenerator(randomNumberGenerator) + } + out.initialize(array: &array) + } + /// /// Xavier normal initialization method. /// @@ -330,11 +357,55 @@ extension LayerWeightInit return values } + /// + /// Xavier normal initialization method. + /// + /// - Parameters: + /// - nbElems: Number of weights to initialize. + /// - connectivityIO: Number of input and output connections. + /// - out: The output buffer. + /// - deviceID: GPU device. + /// + static func XavierNormal( + nbElems: Int, + connectivityIO: (Int, Int), + out: FloatBuffer, + deviceID: Int) + { + var array = [Float](repeating: 0.0, count: nbElems) + array.withUnsafeMutableBufferPointer + { + ptr in + + let std = sqrt(2) / sqrt(Float(connectivityIO.0 + connectivityIO.1)) + guard var arrayDescriptor = BNNSNDArrayDescriptor( + data: ptr, + shape: .vector(nbElems)), + let randomNumberGenerator = BNNSCreateRandomGenerator( + BNNSRandomGeneratorMethodAES_CTR, + nil) else + { + fatalError() + } + + BNNSRandomFillNormalFloat( + randomNumberGenerator, + &arrayDescriptor, + 0.0, + std + ) + + BNNSDestroyRandomGenerator(randomNumberGenerator) + } + out.initialize(array: &array) + } + /// /// Kaiming uniform initialization method. /// /// - Parameters: /// - nbElems: Number of weights to initialize. + /// - coeff: Multiplicative coefficient. /// - connectivityIO: Number of input and output connections. /// - Returns: Weights values. /// @@ -352,11 +423,57 @@ extension LayerWeightInit return values } + /// + /// Kaiming uniform initialization method. + /// + /// - Parameters: + /// - nbElems: Number of weights to initialize. + /// - coeff: Multiplicative coefficient. + /// - connectivityIO: Number of input and output connections. + /// - out: The output buffer. + /// - deviceID: GPU device. 
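The four GPU initializers in this file differ only in the scale they derive from the layer connectivity before filling the output buffer through BNNS. A condensed sketch of those scales, taken from the bounds and standard deviations above and below (illustrative only; the enum and function names are ad hoc, not GrAIdient API):

```swift
import Foundation

/// Uniform methods draw weights from [-scale, scale];
/// normal methods use scale as the standard deviation.
enum InitSketch
{
    case xavierUniform, xavierNormal, kaimingUniform, kaimingNormal
}

func initScale(
    _ method: InitSketch, coeff: Float, fanIn: Int, fanOut: Int
) -> Float
{
    switch method
    {
    case .xavierUniform:
        return sqrt(6) / sqrt(Float(fanIn + fanOut))
    case .xavierNormal:
        return sqrt(2) / sqrt(Float(fanIn + fanOut))
    case .kaimingUniform:
        return sqrt(3) * coeff / sqrt(Float(fanIn))
    case .kaimingNormal:
        return coeff / sqrt(Float(fanIn))
    }
}
```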
+ /// + static func KaimingUniform( + nbElems: Int, + coeff: Float, + connectivityIO: (Int, Int), + out: FloatBuffer, + deviceID: Int) + { + var array = [Float](repeating: 0.0, count: nbElems) + array.withUnsafeMutableBufferPointer + { + ptr in + + let bound = sqrt(3) * coeff / sqrt(Float(connectivityIO.0)) + guard var arrayDescriptor = BNNSNDArrayDescriptor( + data: ptr, + shape: .vector(nbElems)), + let randomNumberGenerator = BNNSCreateRandomGenerator( + BNNSRandomGeneratorMethodAES_CTR, + nil) else + { + fatalError() + } + + BNNSRandomFillUniformFloat( + randomNumberGenerator, + &arrayDescriptor, + -bound, + bound + ) + + BNNSDestroyRandomGenerator(randomNumberGenerator) + } + out.initialize(array: &array) + } + /// /// Xavier normal initialization method. /// /// - Parameters: /// - nbElems: Number of weights to initialize. + /// - coeff: Multiplicative coefficient. /// - connectivityIO: Number of input and output connections. /// - Returns: Weights values. /// @@ -373,6 +490,51 @@ extension LayerWeightInit } return values } + + /// + /// Kaiming normal initialization method. + /// + /// - Parameters: + /// - nbElems: Number of weights to initialize. + /// - coeff: Multiplicative coefficient. + /// - connectivityIO: Number of input and output connections. + /// - out: The output buffer. + /// - deviceID: GPU device. + /// + static func KaimingNormal( + nbElems: Int, + coeff: Float, + connectivityIO: (Int, Int), + out: FloatBuffer, + deviceID: Int) + { + var array = [Float](repeating: 0.0, count: nbElems) + array.withUnsafeMutableBufferPointer + { + ptr in + + let std = coeff / sqrt(Float(connectivityIO.0)) + guard var arrayDescriptor = BNNSNDArrayDescriptor( + data: ptr, + shape: .vector(nbElems)), + let randomNumberGenerator = BNNSCreateRandomGenerator( + BNNSRandomGeneratorMethodAES_CTR, + nil) else + { + fatalError() + } + + BNNSRandomFillNormalFloat( + randomNumberGenerator, + &arrayDescriptor, + 0.0, + std + ) + + BNNSDestroyRandomGenerator(randomNumberGenerator) + } + out.initialize(array: &array) + } } /// diff --git a/Sources/GrAIdient/Core/Model/Model.swift b/Sources/GrAIdient/Core/Model/Model.swift index 0e603ac2..8e75510a 100644 --- a/Sources/GrAIdient/Core/Model/Model.swift +++ b/Sources/GrAIdient/Core/Model/Model.swift @@ -186,6 +186,43 @@ public class BaseModel: Codable newModel.layers = newLayers return newModel } + + /// + /// Update sequence of the model, creating a new one. + /// + /// - Parameters: + /// - mapping: Dictionary allowing to find the layer associated to some id. + /// This dictionary is particularly useful when the different layers cannot access + /// their `layerPrev`. + /// - inPlace: Whether hard resources should be copied as is. + /// - sequence: Length of the sequence. + /// + /// - Returns: A new model. When `inPlace` is false, `initKernel` is + /// necessary in order to recreate hard resources. + /// + func updateSeq( + mapping: inout Dictionary, + inPlace: Bool, + sequence: Int) -> BaseModel + { + let newModel = BaseModel(name: name) + var newLayers = [Layer]() + + for layer in layers + { + let newLayer = layer.copy(mapping: mapping, inPlace: inPlace) + newLayers.append(newLayer) + mapping[layer.id] = newLayer + + if let layerTmp = newLayer as? LayerSeq + { + layerTmp.sequence = sequence + } + } + + newModel.layers = newLayers + return newModel + } } /// @@ -606,7 +643,8 @@ public class Model: BaseModel public func initKernel(phase: Phase? = nil, deviceID: Int = 0) { self.phase = phase - if phase != nil && phase! 
== .Inference + if phase != nil && + (phase! == .Inference || phase! == .InferenceBackward) { self.computeDeltaWeights = false } @@ -682,6 +720,45 @@ public class Model: BaseModel } } + /// + /// Initialize state resources. + /// + /// We initialize the neurons' forward's state. + /// + public func initForward(batchSize: Int) throws + { + if GrAI.Opti.GPU + { + for layer in layers + { + try layer.checkStateForwardGPU(batchSize: batchSize) + } + } + else + { + for layer in layers + { + try layer.checkStateCPU(batchSize: batchSize) + } + } + } + + /// + /// Initialize state resources. + /// + /// We initialize the neurons' backward's state. + /// + public func initBackward(batchSize: Int) throws + { + if GrAI.Opti.GPU + { + for layer in layers + { + try layer.checkStateBackwardGPU(batchSize: batchSize) + } + } + } + /// /// Initialize hard resources and set the parameters for the optimizer. /// @@ -780,6 +857,39 @@ public class Model: BaseModel return newModels } + /// + /// Return a list of models, updating the sequence. + /// + /// - Parameters: + /// - models: The different models to resize. + /// - sequence: Length of the sequence. + /// - inPlace: Whether hard resources should be copied as is. + /// + /// - Returns: The list of created models. When `inPlace` is false, `initKernel` is + /// necessary in order to recreate hard resources. + /// + public static func updateSeq( + models: [BaseModel], + sequence: Int, + inPlace: Bool) -> [Model] + { + var mapping = Dictionary() + + var newModels = [Model]() + for model in models + { + let newBaseModel = model.updateSeq( + mapping: &mapping, + inPlace: inPlace, + sequence: sequence + ) + let newModel = Model(model: newBaseModel, modelsPrev: newModels) + newModels.append(newModel) + } + + return newModels + } + /// Notify optimizer that a step has been completed. public func incStep() { @@ -909,7 +1019,7 @@ public class Model: BaseModel if GrAI.Opti.GPU { let gNorm: Float? = gradientNorm != nil ? - Float(gradientNorm!) : nil + Float(gradientNorm!) : nil try _kernel.algo.udpateGPU(layers: myLayers, gradientNorm: gNorm) } diff --git a/Sources/GrAIdient/Core/Optimizer/OptimizerAlgorithm.swift b/Sources/GrAIdient/Core/Optimizer/OptimizerAlgorithm.swift index 31f11259..e85cf693 100644 --- a/Sources/GrAIdient/Core/Optimizer/OptimizerAlgorithm.swift +++ b/Sources/GrAIdient/Core/Optimizer/OptimizerAlgorithm.swift @@ -170,7 +170,7 @@ public class OptimizerAlgorithm try clipGradientGPU( layers: layers, gradientNorm: gNorm, - normThreshold: _optimizer.params.normThreshold + normThreshold: Float(_optimizer.params.normThreshold) ) } @@ -233,7 +233,7 @@ public class OptimizerAlgorithm let nbElems = buffers.g.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let pFactor: [Float] = [Float(factor)] + let pFactor: [Float] = [factor] let command = MetalKernel.get.createCommand( "multiplyGradients", deviceID: layer.deviceID @@ -303,22 +303,7 @@ public class OptimizerAlgorithm for buffers in layerUpdate.collectWeightsGPU() { - let buffer: UnsafeMutableBufferPointer - if let g_p = buffers.g_p - { - MetalKernel.get.download([g_p]) - buffer = g_p.shared.buffer - } - else if let g_s = buffers.g_s - { - MetalKernel.get.download([g_s]) - buffer = g_s.buffer - } - else - { - fatalError("Unreachable.") - } - + let buffer = buffers.g.download() for i in 0.. 
- if let g_p = buffers.g_p - { - MetalKernel.get.download([g_p]) - buffer = g_p.shared.buffer - } - else if let g_s = buffers.g_s - { - MetalKernel.get.download([g_s]) - buffer = g_s.buffer - } - else - { - fatalError("Unreachable.") - } - + let buffer = buffers.g.download() for i in 0.. Float(normThreshold) { + if gradientNorm > normThreshold { for layer in layers { if let layerUpdate = layer as? LayerUpdate, @@ -486,8 +456,8 @@ public class OptimizerAlgorithm let nbElems = buffers.g.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let pGradientNorm: [Float] = [Float(gradientNorm)] - let pNormThreshold: [Float] = [Float(normThreshold)] + let pGradientNorm: [Float] = [gradientNorm] + let pNormThreshold: [Float] = [normThreshold] let command = MetalKernel.get.createCommand( "clipGradients", deviceID: layer.deviceID diff --git a/Sources/GrAIdient/Core/Optimizer/OptimizerImpl.swift b/Sources/GrAIdient/Core/Optimizer/OptimizerImpl.swift index 1a9899d9..5e237d3c 100644 --- a/Sources/GrAIdient/Core/Optimizer/OptimizerImpl.swift +++ b/Sources/GrAIdient/Core/Optimizer/OptimizerImpl.swift @@ -294,12 +294,12 @@ class AdamOptimizer: OptimizerImpl override func stepGPU(_ weights: IWeightBuffers) { let nbElems = weights.nbElems - let t = Double(_kernel.params.t) + let t = Float(_kernel.params.t) let pNbElems: [UInt32] = [UInt32(nbElems)] let pAlpha: [Float] = [Float(alpha)] let pLambda: [Float] = [lambda != nil ? Float(lambda!) : 0.0] - let pT: [Float] = [Float(t)] + let pT: [Float] = [t] let command = MetalKernel.get.createCommand( "weightsAdam", deviceID: weights.deviceID @@ -366,12 +366,12 @@ class AMSGradOptimizer: OptimizerImpl override func stepGPU(_ weights: IWeightBuffers) { let nbElems = weights.nbElems - let t = Double(_kernel.params.t) + let t = Float(_kernel.params.t) let pNbElems: [UInt32] = [UInt32(nbElems)] let pAlpha: [Float] = [Float(alpha)] let pLambda: [Float] = [lambda != nil ? Float(lambda!) : 0.0] - let pT: [Float] = [Float(t)] + let pT: [Float] = [t] let command = MetalKernel.get.createCommand( "weightsAMSGrad", deviceID: weights.deviceID @@ -449,12 +449,12 @@ class AdamRectifiedOptimizer: OptimizerImpl override func stepGPU(_ weights: IWeightBuffers) { let nbElems = weights.nbElems - let t = Double(_kernel.params.t) + let t = Float(_kernel.params.t) let pNbElems: [UInt32] = [UInt32(nbElems)] let pAlpha: [Float] = [Float(alpha)] let pLambda: [Float] = [lambda != nil ? Float(lambda!) : 0.0] - let pT: [Float] = [Float(t)] + let pT: [Float] = [t] let command = MetalKernel.get.createCommand( "weightsAdamRectified", deviceID: weights.deviceID @@ -583,12 +583,12 @@ class AdaBoundOptimizer: BoundOptimizer override func stepGPU(_ weights: IWeightBuffers) { let nbElems = weights.nbElems - let t = Double(_kernel.params.t) + let t = Float(_kernel.params.t) let pNbElems: [UInt32] = [UInt32(nbElems)] let pAlpha: [Float] = [Float(alpha)] let pLambda: [Float] = [lambda != nil ? Float(lambda!) : 0.0] - let pT: [Float] = [Float(t)] + let pT: [Float] = [t] let pLowerBound: [Float] = [Float(lowerBound!)] let pUpperBound: [Float] = [Float(upperBound!)] @@ -667,12 +667,12 @@ class AMSBoundOptimizer: BoundOptimizer override func stepGPU(_ weights: IWeightBuffers) { let nbElems = weights.nbElems - let t = Double(_kernel.params.t) + let t = Float(_kernel.params.t) let pNbElems: [UInt32] = [UInt32(nbElems)] let pAlpha: [Float] = [Float(alpha)] let pLambda: [Float] = [lambda != nil ? Float(lambda!) 
: 0.0] - let pT: [Float] = [Float(t)] + let pT: [Float] = [t] let pLowerBound: [Float] = [Float(lowerBound!)] let pUpperBound: [Float] = [Float(upperBound!)] diff --git a/Sources/GrAIdient/Core/State/Weights.swift b/Sources/GrAIdient/Core/State/Weights.swift index 03e2b610..a45053dc 100644 --- a/Sources/GrAIdient/Core/State/Weights.swift +++ b/Sources/GrAIdient/Core/State/Weights.swift @@ -27,10 +27,10 @@ public protocol IWeightArrays } /// Arrays needed to update the weights. -class WeightArrays: IWeightArrays +public class WeightArrays: IWeightArrays { /// Number of elements in the different arrays. - let nbElems: Int + public let nbElems: Int var _w: [Double] = [] var _g: [Double] = [] @@ -49,7 +49,7 @@ class WeightArrays: IWeightArrays } /// Weights array: the array to update. - var w: [Double] + public var w: [Double] { get { if _w.count == 0 @@ -69,7 +69,7 @@ class WeightArrays: IWeightArrays } } /// Gradients array. - var g: [Double] + public var g: [Double] { get { if _g.count == 0 @@ -89,7 +89,7 @@ class WeightArrays: IWeightArrays } } /// Momentum array. - var m: [Double] + public var m: [Double] { get { if _m.count == 0 @@ -109,7 +109,7 @@ class WeightArrays: IWeightArrays } } /// Velocity array. - var v: [Double] + public var v: [Double] { get { if _v.count == 0 @@ -129,7 +129,7 @@ class WeightArrays: IWeightArrays } } /// Veclocity normalized array. - var vHat: [Double] + public var vHat: [Double] { get { if _vHat.count == 0 @@ -150,7 +150,7 @@ class WeightArrays: IWeightArrays } /// Clean the momentum..., preserving the weights. - func reset() + public func reset() { _g = [] _m = [] diff --git a/Sources/GrAIdient/GrAI.swift b/Sources/GrAIdient/GrAI.swift index 16db39a7..06f3ff31 100644 --- a/Sources/GrAIdient/GrAI.swift +++ b/Sources/GrAIdient/GrAI.swift @@ -70,6 +70,68 @@ public class GrAI } } + /// Namespace for precision settings. + public class Precision + { + /// Get/Set double precision. + public static var double: Bool + { + get { + return getCtx.precision == PrecisionType.Double + } + set { + if newValue && GrAI.Opti.CPU + { + getCtx.precision = PrecisionType.Double + } + else if newValue + { + fatalError( + "Cannot set double precision with GPU optimization." + ) + } + } + } + /// Get/Set float precision. + public static var float: Bool + { + get { + return getCtx.precision == PrecisionType.Float + } + set { + if newValue && GrAI.Opti.GPU + { + getCtx.precision = PrecisionType.Float + } + else if newValue + { + fatalError( + "Cannot set float precision with CPU optimization." + ) + } + } + } + /// Get/Set float16 precision. + public static var float16: Bool + { + get { + return getCtx.precision == PrecisionType.Float16 + } + set { + if newValue && GrAI.Opti.GPU + { + getCtx.precision = PrecisionType.Float16 + } + else if newValue + { + fatalError( + "Cannot set float precision with CPU optimization." + ) + } + } + } + } + /// Namespace for gradient settings. public class Gradient { @@ -346,6 +408,14 @@ public class GrAI } } +/// Precision mode. +public enum PrecisionType +{ + case Double + case Float + case Float16 +} + /// A global context with stored variables. fileprivate class GrAIContext { @@ -370,8 +440,15 @@ fileprivate class GrAIContext case GPU } + /// Used to select GPU device. var gpuNamedPriority = [String]() + //-------------------------------------------------------------------------- + // PRECISION + //-------------------------------------------------------------------------- + /// Precision type. 
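The new precision namespace ties the numeric format to the chosen backend: double is CPU-only, float is the default, float16 is GPU-only. A usage sketch, assuming the `GrAI.Opti` switches referenced in the setters above are the intended entry points:

```swift
// Illustrative usage of the precision settings (assumed entry points).
GrAI.Opti.GPU = true           // select the Metal backend
GrAI.Precision.float16 = true  // run GPU kernels with half precision

// Double precision is only meaningful on the CPU backend.
GrAI.Opti.CPU = true
GrAI.Precision.double = true
```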
+ var precision = PrecisionType.Float + //-------------------------------------------------------------------------- // GRADIENT //-------------------------------------------------------------------------- diff --git a/Sources/GrAIdient/Layer1D/Activation1D.swift b/Sources/GrAIdient/Layer1D/Activation1D.swift index c4e8c590..6ba5d9c8 100644 --- a/Sources/GrAIdient/Layer1D/Activation1D.swift +++ b/Sources/GrAIdient/Layer1D/Activation1D.swift @@ -16,7 +16,7 @@ public class Activation1D: Layer1D /// used in the GPU execution context. /// Shape ~ (batch, nbNeurons). /// - var _tmp: MetalPrivateBuffer! = nil + var tmp: FloatBuffer! = nil /// Get coefficient (depending on activation function) to apply during the weights initialization. public var coeffInitWeights: Float @@ -156,7 +156,7 @@ public class Activation1D: Layer1D public override func resetKernelGPU() { super.resetKernelGPU() - _tmp = nil + tmp = nil } /// @@ -250,14 +250,16 @@ public class Activation1D: Layer1D let nbElems = outs.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] + let kernel = nbElems % 4 == 0 ? "sum14" : "sum1" + let coeff = nbElems % 4 == 0 ? 4 : 1 let command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID + kernel, deviceID: deviceID ) command.setBuffer(layerPrev.outs.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(outs.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() _activation!.forwardGPU(self) @@ -308,24 +310,25 @@ public class Activation1D: Layer1D let nbElems = delta.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let command: MetalCommand + let kernel: String + let coeff = nbElems % 4 == 0 ? 4 : 1 if layerPrev.dirty { - command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum14" : "sum1" } else { - command = MetalKernel.get.createCommand( - "sum2", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum24" : "sum2" } + let command = MetalKernel.get.createCommand( + kernel, deviceID: deviceID + ) + command.setBuffer(delta.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(layerPrev.delta.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() propagateDirty() diff --git a/Sources/GrAIdient/Layer1D/BCE1D.swift b/Sources/GrAIdient/Layer1D/BCE1D.swift index da842382..8e3bdedc 100644 --- a/Sources/GrAIdient/Layer1D/BCE1D.swift +++ b/Sources/GrAIdient/Layer1D/BCE1D.swift @@ -207,7 +207,7 @@ public class BCE1D: LayerOutput1D /// - Returns: The loss value. /// public func getLossGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int, nbNeurons: Int) throws -> Float { @@ -233,9 +233,8 @@ public class BCE1D: LayerOutput1D command.dispatchThreads(batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbNeurons: Int) throws { diff --git a/Sources/GrAIdient/Layer1D/BCESigmoid1D.swift b/Sources/GrAIdient/Layer1D/BCESigmoid1D.swift index 237d3da3..79ff2e9d 100644 --- a/Sources/GrAIdient/Layer1D/BCESigmoid1D.swift +++ b/Sources/GrAIdient/Layer1D/BCESigmoid1D.swift @@ -230,7 +230,7 @@ public class BCESigmoid1D: LayerOutput1D /// - Returns: The loss value. 
/// public func getLossGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int, nbNeurons: Int) throws -> Float { @@ -256,9 +256,8 @@ public class BCESigmoid1D: LayerOutput1D command.dispatchThreads(batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbNeurons: Int) throws { diff --git a/Sources/GrAIdient/Layer1D/Base/Layer1D.swift b/Sources/GrAIdient/Layer1D/Base/Layer1D.swift index 4dcbffcb..ce2ab089 100644 --- a/Sources/GrAIdient/Layer1D/Base/Layer1D.swift +++ b/Sources/GrAIdient/Layer1D/Base/Layer1D.swift @@ -15,12 +15,12 @@ open class Layer1D: Layer /// Output buffer (result of the forward pass) used in the GPU execution context. /// Shape ~ (batch, nbNeurons). /// - public var outs: MetalPrivateBuffer! = nil + public var outs: FloatBuffer! = nil /// /// Gradient buffer (result of the backward pass) used in the GPU execution context. /// Shape ~ (batch, nbNeurons). /// - public var delta: MetalPrivateBuffer! = nil + public var delta: FloatBuffer! = nil /// Number of neurons. public let nbNeurons: Int @@ -113,7 +113,7 @@ open class Layer1D: Layer /// /// We initialize the neurons' state (forward and backward). /// - public func checkStateCPU(batchSize: Int) throws + public override func checkStateCPU(batchSize: Int) throws { if neurons.nbElems == 0 { @@ -134,12 +134,12 @@ open class Layer1D: Layer /// /// We initialize the neurons' forward state. /// - public func checkStateForwardGPU(batchSize: Int) throws + public override func checkStateForwardGPU(batchSize: Int) throws { if outs == nil { - outs = MetalPrivateBuffer( - batchSize * nbNeurons, deviceID: deviceID + outs = FloatBuffer( + nbElems: batchSize * nbNeurons, deviceID: deviceID ) } else if batchSize <= 0 || batchSize > outs.nbElems / nbNeurons @@ -153,17 +153,20 @@ open class Layer1D: Layer /// /// We initialize the neurons' backward state. /// - public func checkStateBackwardGPU(batchSize: Int) throws + public override func checkStateBackwardGPU(batchSize: Int) throws { - if delta == nil + if computeDelta { - delta = MetalPrivateBuffer( - batchSize * nbNeurons, deviceID: deviceID - ) - } - else if batchSize <= 0 || batchSize > delta.nbElems / nbNeurons - { - throw LayerError.BatchSize + if delta == nil + { + delta = FloatBuffer( + nbElems: batchSize * nbNeurons, deviceID: deviceID + ) + } + else if batchSize <= 0 || batchSize > delta.nbElems / nbNeurons + { + throw LayerError.BatchSize + } } } @@ -191,9 +194,8 @@ open class Layer1D: Layer public func getOutsGPU(elem: Int) -> [T] { var outs = [T]() - MetalKernel.get.download([self.outs]) + let outsPtr = self.outs.download() - let outsPtr = self.outs.shared.buffer for depth in 0.., + _ data: FloatBuffer, batchSize: Int, nbNeurons: Int) throws { diff --git a/Sources/GrAIdient/Layer1D/Base/LayerMerge1D.swift b/Sources/GrAIdient/Layer1D/Base/LayerMerge1D.swift index cc557d4e..fa1e4e1c 100644 --- a/Sources/GrAIdient/Layer1D/Base/LayerMerge1D.swift +++ b/Sources/GrAIdient/Layer1D/Base/LayerMerge1D.swift @@ -9,15 +9,15 @@ public class LayerMerge1D: Layer1D { /// List of links to the previous layers in the model. - var _layersPrev = [Layer]() + public var layersPrev = [Layer]() /// List of identifiers of the previous layers in the model. - let _idsPrev: [Int] + public let idsPrev: [Int] /// Whether backward pass should continue backward or not. 
public override var mustComputeBackward: Bool { get { - for layerPrev in _layersPrev + for layerPrev in layersPrev { if layerPrev.computeDelta { @@ -50,7 +50,7 @@ public class LayerMerge1D: Layer1D { idsPrev.append(layer.id) } - _idsPrev = idsPrev + self.idsPrev = idsPrev super.init(layerPrev: layersPrev[0], nbNeurons: nbNeurons, @@ -68,7 +68,7 @@ public class LayerMerge1D: Layer1D public required init(from decoder: Decoder) throws { let container = try decoder.container(keyedBy: Keys.self) - _idsPrev = try container.decode([Int].self, forKey: .idsPrev) + idsPrev = try container.decode([Int].self, forKey: .idsPrev) try super.init(from: decoder) } @@ -86,7 +86,7 @@ public class LayerMerge1D: Layer1D public override func encode(to encoder: Encoder) throws { var container = encoder.container(keyedBy: Keys.self) - try container.encode(_idsPrev, forKey: .idsPrev) + try container.encode(idsPrev, forKey: .idsPrev) try super.encode(to: encoder) } @@ -97,14 +97,14 @@ public class LayerMerge1D: Layer1D /// public override func initLinks(_ layers: [Layer]) { - _layersPrev = [Layer]() - for id in _idsPrev + layersPrev = [Layer]() + for id in idsPrev { for testLayer in layers { if testLayer.id == id { - _layersPrev.append(testLayer) + layersPrev.append(testLayer) break } } @@ -118,9 +118,9 @@ public class LayerMerge1D: Layer1D /// public override func propagateDirty(_ dirty: Bool = false) { - for num in 0..<_layersPrev.count + for num in 0.. ([Layer], [Int]) { var layersBranches = [Layer?]() - for layer in _layersPrev + for layer in layersPrev { layersBranches.append(layer) } @@ -234,7 +234,7 @@ public class LayerMerge1D: Layer1D var nbElems = [Int]() var nbLastElems = [Int](repeating: nbSameElems, - count: _layersPrev.count) + count: layersPrev.count) for (index, layer) in zip(layersIndex, layersMerged) { let nbElemsTmp = layer.nbGC diff --git a/Sources/GrAIdient/Layer1D/Base/LayerOutput1D.swift b/Sources/GrAIdient/Layer1D/Base/LayerOutput1D.swift index 22200116..2479d066 100644 --- a/Sources/GrAIdient/Layer1D/Base/LayerOutput1D.swift +++ b/Sources/GrAIdient/Layer1D/Base/LayerOutput1D.swift @@ -15,13 +15,13 @@ open class LayerOutput1D: Layer1D /// Ground truth buffer in the GPU execution context. /// Shape ~ (batch, nbNeurons). /// - public internal(set) var groundTruth: MetalSharedBuffer! = nil + public internal(set) var groundTruth: FloatBuffer! = nil /// /// Loss buffer in the GPU execution context. /// Shape ~ (batch,). /// - public internal(set) var loss: MetalSharedBuffer! = nil + public internal(set) var loss: FloatBuffer! 
= nil private enum Keys: String, CodingKey { @@ -147,9 +147,10 @@ open class LayerOutput1D: Layer1D if self.groundTruth == nil { - self.groundTruth = MetalSharedBuffer( - batchSize * nbNeurons, - deviceID: deviceID + self.groundTruth = FloatBuffer( + nbElems: batchSize * nbNeurons, + deviceID: deviceID, + shared: true ) } else if batchSize <= 0 || @@ -158,7 +159,7 @@ open class LayerOutput1D: Layer1D throw LayerError.BatchSize } - let bufferPtr = self.groundTruth.buffer + var buffer = [Float](repeating: 0.0, count: batchSize * nbNeurons) for (i, dataI) in groundTruth.enumerated() { if dataI.count != nbNeurons @@ -167,10 +168,10 @@ open class LayerOutput1D: Layer1D } for (j, dataIJ) in dataI.enumerated() { - bufferPtr[j + i * nbNeurons] = Float(dataIJ) + buffer[j + i * nbNeurons] = Float(dataIJ) } } - MetalKernel.get.upload([self.groundTruth]) + self.groundTruth.initialize(array: &buffer) } /// @@ -184,7 +185,7 @@ open class LayerOutput1D: Layer1D /// - nbNeurons: Number of neurons. /// public func checkGroundTruthGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int, nbNeurons: Int) throws { @@ -211,7 +212,9 @@ open class LayerOutput1D: Layer1D { if loss == nil { - loss = MetalSharedBuffer(batchSize, deviceID: deviceID) + loss = FloatBuffer( + nbElems: batchSize, deviceID: deviceID, shared: true + ) } else if batchSize > loss.nbElems { @@ -291,14 +294,16 @@ open class LayerOutput1D: Layer1D let nbElems = outs.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] + let kernel = nbElems % 4 == 0 ? "sum14" : "sum1" + let coeff = nbElems % 4 == 0 ? 4 : 1 let command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID + kernel, deviceID: deviceID ) command.setBuffer(layerPrev.outs.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(outs.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() } } @@ -346,24 +351,25 @@ open class LayerOutput1D: Layer1D let nbElems = delta.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let command: MetalCommand + let kernel: String + let coeff = nbElems % 4 == 0 ? 4 : 1 if layerPrev.dirty { - command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum14" : "sum1" } else { - command = MetalKernel.get.createCommand( - "sum2", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum24" : "sum2" } + let command = MetalKernel.get.createCommand( + kernel, deviceID: deviceID + ) + command.setBuffer(delta.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(layerPrev.delta.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() propagateDirty() diff --git a/Sources/GrAIdient/Layer1D/Concat1D.swift b/Sources/GrAIdient/Layer1D/Concat1D.swift index f163a8d5..bac58a5e 100644 --- a/Sources/GrAIdient/Layer1D/Concat1D.swift +++ b/Sources/GrAIdient/Layer1D/Concat1D.swift @@ -53,7 +53,7 @@ public class Concat1D: LayerMerge1D params.context.curID = id var layersPrev = [Layer1D]() - for idPrev in _idsPrev + for idPrev in idsPrev { layersPrev.append(mapping[idPrev] as! Layer1D) } @@ -87,9 +87,9 @@ public class Concat1D: LayerMerge1D for batch in 0..! = nil + var _wDeltaWeights: FloatBuffer! = nil /// Whether to compute weights' gradients or not. 
public var computeDeltaWeights: Bool = true @@ -64,12 +64,7 @@ public class Constant1D: Layer1D, LayerUpdate { return _weightsList } - - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - - return weightsTmp + return _wBuffers.w.download() } set { _weightsList = newValue @@ -258,24 +253,16 @@ public class Constant1D: Layer1D, LayerUpdate deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - if _weightsList.count == 0 + if _weightsList.count != 0 { - for depth in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * nbNeurons, deviceID: deviceID ) } @@ -353,8 +340,7 @@ public class Constant1D: Layer1D, LayerUpdate neurons.get(depth)!.initGC(batchSize: batchSize, nbGC: newGC) } - MetalKernel.get.download([_wBuffers.w_p!]) - let weightsPtr = _wBuffers.w_p!.shared.buffer + let weightsPtr = _wBuffers.w.download() for batch in 0..! = nil + + private enum Keys: String, CodingKey + { + case coeff + } + + /// + /// Create a layer with a 1D shape neural structure. + /// + /// - Parameters: + /// - layerPrev: Previous layer that has been queued to the model. + /// - coeff: Probability for each neuron to be zeroed. + /// - params: Contextual parameters linking to the model. + /// + public init(layerPrev: Layer1D, + coeff: Double, + params: GrAI.Model.Params) + { + self.coeff = coeff + super.init(layerPrev: layerPrev, + nbNeurons: layerPrev.nbNeurons, + params: params) + } + + /// + /// Decode from the disk. + /// + /// Throw an error if reading from the decoder fails, or + /// if the data read is corrupted or otherwise invalid. + /// + /// - Parameter decoder: The decoder to read data from. + /// + public required init(from decoder: Decoder) throws + { + let values = try decoder.container(keyedBy: Keys.self) + coeff = try values.decode(Double.self, forKey: Keys.coeff) + try super.init(from: decoder) + } + + /// + /// Encode to the disk. + /// + /// If the value fails to encode anything, `encoder` will encode an empty + /// keyed container in its place. + /// + /// Throw an error if any values are invalid for the given + /// encoder's format. + /// + /// - Parameter encoder: The encoder to write data to. + /// + public override func encode(to encoder: Encoder) throws + { + var container = encoder.container(keyedBy: Keys.self) + try container.encode(coeff, forKey: Keys.coeff) + try super.encode(to: encoder) + } + + /// + /// Create a layer with same values as this. + /// + /// - Parameters: + /// - mapping: Dictionary allowing to find the layer associated to some id. + /// This dictionary is particularly useful when the different layers cannot access + /// their `layerPrev`. + /// - inPlace: Whether hard resources should be copied as is. + /// + /// - Returns: A new layer. When `inPlace` is false, `initKernel` is + /// necessary in order to recreate hard resources. + /// + public override func copy( + mapping: Dictionary, + inPlace: Bool) -> Layer + { + let context = ModelContext(name: "", curID: 0) + let layerPrev = mapping[idPrev] as! Layer1D + + let params = GrAI.Model.Params(context: context) + params.context.curID = id + + let layer = Dropout1D( + layerPrev: layerPrev, + coeff: coeff, + params: params + ) + return layer + } + + /// + /// Clean state resources in the GPU execution context. + /// + /// We clean the neurons' state (forward and backward). 
+ /// + public override func resetKernelGPU() + { + super.resetKernelGPU() + _dropout = nil + } + + /// + /// Initialize state resources in the CPU execution context. + /// + /// We initialize the neurons' state (forward and backward). + /// + public override func checkStateCPU(batchSize: Int) throws + { + try super.checkStateCPU(batchSize: batchSize) + + if _dropout == nil + { + _dropout = MetalSharedBuffer( + batchSize * nbNeurons, + deviceID: deviceID + ) + } + } + + /// + /// Initialize state resources in the GPU execution context. + /// + /// We initialize the neurons' forward state. + /// + public override func checkStateForwardGPU(batchSize: Int) throws + { + try super.checkStateForwardGPU(batchSize: batchSize) + + if _dropout == nil + { + _dropout = MetalSharedBuffer( + batchSize * nbNeurons, + deviceID: deviceID + ) + } + } + + /// + /// Apply the forward pass of the Gradient Checking in CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardGCCPU() throws + { + if let layerPrev = self.layerPrev as? Layer1D + { + try checkStateCPU(batchSize: batchSize) + + let applyDropout = phase != nil && phase == .Training + let dropoutPtr = _dropout.buffer + + let nbGC = layerPrev.nbGC + for j in 0..! = nil + var _wDeltaWeights: FloatBuffer! = nil /// /// Buffer of gradients per sample for biases. /// Shape ~ (batch, nbNeurons). /// - var _bDeltaWeights: MetalPrivateBuffer! = nil + var _bDeltaWeights: FloatBuffer! = nil /// Whether to compute weights' gradients or not. public var computeDeltaWeights: Bool = true @@ -105,7 +105,7 @@ public class FullyConnected: Activation1D, LayerWithActivation, LayerWeightInit } /// Output buffer of previous layer. - var outsPrev: MetalPrivateBuffer + var outsPrev: FloatBuffer { get { if let layerPrev = self.layerPrev as? Layer1D @@ -124,7 +124,7 @@ public class FullyConnected: Activation1D, LayerWithActivation, LayerWeightInit } /// Gradient buffer of previous layer. - var deltaPrev: MetalPrivateBuffer? + var deltaPrev: FloatBuffer? { get { if let layerPrev = self.layerPrev as? 
Layer1D @@ -199,14 +199,10 @@ public class FullyConnected: Activation1D, LayerWithActivation, LayerWeightInit return _weightsList } - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - + var weightsTmp = _wBuffers.w.download() if _updateBiases { - MetalKernel.get.download([_bBuffers.w_p!]) - weightsTmp += _bBuffers.w_p!.shared.array + weightsTmp += _bBuffers.w.download() } return weightsTmp } @@ -567,12 +563,6 @@ public class FullyConnected: Activation1D, LayerWithActivation, LayerWeightInit /// public func initWeightsGPU() { - if _weightsList.count == 0 - { - _weightsList = generateWeightsList() - _weightsList += [Float](repeating: 0.0, count: weightHeight) - } - _wBuffers = WeightBuffers( nbElems: weightHeight * weightWidth, deviceID: deviceID @@ -582,34 +572,24 @@ public class FullyConnected: Activation1D, LayerWithActivation, LayerWeightInit deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer - - for elem in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * nbNeurons * weightWidth, deviceID: deviceID ) if _updateBiases { - _bDeltaWeights = MetalPrivateBuffer( + _bDeltaWeights = FloatBuffer(nbElems: batchSize * nbNeurons, deviceID: deviceID ) } @@ -776,11 +756,8 @@ public class FullyConnected: Activation1D, LayerWithActivation, LayerWeightInit neurons.get(depth)!.initGC(batchSize: batchSize, nbGC: newGC) } - MetalKernel.get.download([_wBuffers.w_p!, _bBuffers.w_p!]) - MetalKernel.get.download([outsPrev]) - - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer + let weightsPtr = _wBuffers.w.download() + let biasesPtr = _bBuffers.w.download() let neuronsPrev = self.neuronsPrev for batch in 0.. [IWeightArrays] { - var weights = [IWeightArrays]() + var weights = [WeightArrays]() weights.append(_wArrays) if _updateBiases { @@ -1253,8 +1230,7 @@ public class FullyConnected: Activation1D, LayerWithActivation, LayerWeightInit } var deltaWeights = [T]() - MetalKernel.get.download([_wDeltaWeights]) - var deltaWeightsPtr = _wDeltaWeights.shared.buffer + var deltaWeightsPtr = _wDeltaWeights.download() let offsetStart = elem * nbNeurons * weightWidth for depth in 0.., IWeightArrays /// GPU buffers needed to update the inputs of a layer. class InputBuffers1D: InputBuffers, IWeightBuffers -{ +{ /// Inputs buffer: the buffer to be update. - var w: MetalBuffer + var w: FloatBuffer { get { return _layer.outs @@ -71,7 +71,7 @@ class InputBuffers1D: InputBuffers, IWeightBuffers } /// Gradients buffer. - var g: MetalBuffer + var g: FloatBuffer { get { return _layer.delta @@ -304,7 +304,7 @@ public class Input1D: LayerInput1D, LayerUpdate /// - nbNeurons: Number of neurons. /// public func setDataGPU( - _ data: MetalPrivateBuffer, + _ data: FloatBuffer, batchSize: Int, nbNeurons: Int) throws { @@ -348,14 +348,16 @@ public class Input1D: LayerInput1D, LayerUpdate let nbElems = outs.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] + let kernel = nbElems % 4 == 0 ? "sum14" : "sum1" + let coeff = nbElems % 4 == 0 ? 
4 : 1 let command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID + kernel, deviceID: deviceID ) command.setBuffer(layerPrev.outs.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(outs.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() } } @@ -399,24 +401,25 @@ public class Input1D: LayerInput1D, LayerUpdate let nbElems = delta.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let command: MetalCommand + let kernel: String + let coeff = nbElems % 4 == 0 ? 4 : 1 if layerPrev.dirty { - command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum14" : "sum1" } else { - command = MetalKernel.get.createCommand( - "sum2", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum24" : "sum2" } + let command = MetalKernel.get.createCommand( + kernel, deviceID: deviceID + ) + command.setBuffer(delta.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(layerPrev.delta.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() propagateDirty() diff --git a/Sources/GrAIdient/Layer1D/LinearError1D.swift b/Sources/GrAIdient/Layer1D/LinearError1D.swift index 6549eeea..3ce12e28 100644 --- a/Sources/GrAIdient/Layer1D/LinearError1D.swift +++ b/Sources/GrAIdient/Layer1D/LinearError1D.swift @@ -201,7 +201,7 @@ public class LinearError1D: LayerOutput1D /// - Returns: The loss value. /// public func getLossGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int) throws -> Float { try checkLossGPU(batchSize: batchSize) @@ -225,9 +225,8 @@ public class LinearError1D: LayerOutput1D command.dispatchThreads(batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbNeurons: Int) throws -> Float { @@ -229,9 +229,8 @@ public class MSE1D: LayerOutput1D command.dispatchThreads(batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbNeurons: Int) throws { diff --git a/Sources/GrAIdient/Layer1D/Sum1D.swift b/Sources/GrAIdient/Layer1D/Sum1D.swift index e2daedf2..88894aa0 100644 --- a/Sources/GrAIdient/Layer1D/Sum1D.swift +++ b/Sources/GrAIdient/Layer1D/Sum1D.swift @@ -70,7 +70,7 @@ public class Sum1D: LayerMerge1D params.context.curID = id var layersPrev = [Layer1D]() - for idPrev in _idsPrev + for idPrev in idsPrev { layersPrev.append(mapping[idPrev] as! Layer1D) } @@ -106,9 +106,9 @@ public class Sum1D: LayerMerge1D for depth in 0..! = nil + var tmp: FloatBuffer! = nil /// Get coefficient (depending on activation function) to apply during the weights initialization. public var coeffInitWeights: Float @@ -163,7 +163,7 @@ public class Activation2D: Layer2D public override func resetKernelGPU() { super.resetKernelGPU() - _tmp = nil + tmp = nil } /// @@ -261,14 +261,16 @@ public class Activation2D: Layer2D let nbElems = outs.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] + let kernel = nbElems % 4 == 0 ? "sum14" : "sum1" + let coeff = nbElems % 4 == 0 ? 
4 : 1 let command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID + kernel, deviceID: deviceID ) command.setBuffer(layerPrev.outs.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(outs.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() _activation!.forwardGPU(self) @@ -321,25 +323,25 @@ public class Activation2D: Layer2D let nbElems = delta.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let command: MetalCommand + let kernel: String + let coeff = nbElems % 4 == 0 ? 4 : 1 if layerPrev.dirty { - command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum14" : "sum1" } else { - command = MetalKernel.get.createCommand( - "sum2", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum24" : "sum2" } + let command = MetalKernel.get.createCommand( + kernel, deviceID: deviceID + ) command.setBuffer(delta.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(layerPrev.delta.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() propagateDirty() diff --git a/Sources/GrAIdient/Layer2D/AdaIN.swift b/Sources/GrAIdient/Layer2D/AdaIN.swift index 2fd50d6c..12b2f6cd 100644 --- a/Sources/GrAIdient/Layer2D/AdaIN.swift +++ b/Sources/GrAIdient/Layer2D/AdaIN.swift @@ -34,7 +34,7 @@ public class AdaIN: LayerMerge2D var computeDeltaMain: Bool { get { - let layerFirst = _layersPrev.first as! Layer2D + let layerFirst = layersPrev.first as! Layer2D return layerFirst.computeDelta } } @@ -42,7 +42,7 @@ public class AdaIN: LayerMerge2D var computeDeltaStyle: Bool { get { - let layerLast = _layersPrev.last as! Layer1D + let layerLast = layersPrev.last as! Layer1D return layerLast.computeDelta } } @@ -143,7 +143,7 @@ public class AdaIN: LayerMerge2D params.context.curID = id var layersPrev = [Layer]() - for idPrev in _idsPrev + for idPrev in idsPrev { layersPrev.append(mapping[idPrev]!) } @@ -293,7 +293,7 @@ public class AdaIN: LayerMerge2D for batch in 0.. [Double] { - let layerFirst = _layersPrev.first as! Layer2D + let layerFirst = layersPrev.first as! Layer2D var outs = [Double](repeating: 0.0, count: height * width) for i in 0.. Double { - let layerLast = _layersPrev.last as! Layer1D + let layerLast = layersPrev.last as! Layer1D return layerLast.neurons.get(depth)!.gc[batch][elem].out } @@ -607,7 +606,7 @@ public class AdaIN: LayerMerge2D /// func getOutsPrev(depth: Int, batch: Int) -> [Double] { - let layerFirst = _layersPrev.first as! Layer2D + let layerFirst = layersPrev.first as! Layer2D var outs = [Double](repeating: 0.0, count: height * width) for i in 0.. Double { - let layerLast = _layersPrev.last as! Layer1D + let layerLast = layersPrev.last as! Layer1D return layerLast.neurons.get(depth)!.v[batch].out } @@ -663,7 +662,7 @@ public class AdaIN: LayerMerge2D /// - Returns: The outputs. /// func getOutsPrev( - buffer: UnsafeMutableBufferPointer, + buffer: [Float], depth: Int, batch: Int) -> [Double] { @@ -692,11 +691,11 @@ public class AdaIN: LayerMerge2D /// - Returns: The output. /// func getOutStyle( - buffer: UnsafeMutableBufferPointer, + buffer: [Float], depth: Int, batch: Int) -> Double { - let layerLast = _layersPrev.last as! Layer1D + let layerLast = layersPrev.last as! Layer1D let offset = depth + layerLast.nbNeurons * batch return Double(buffer[offset]) } @@ -737,7 +736,7 @@ public class AdaIN: LayerMerge2D return } - let layerFirst = _layersPrev.first as! 
Layer2D + let layerFirst = layersPrev.first as! Layer2D for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws -> Float { @@ -300,9 +300,8 @@ public class BCE2D: LayerOutput2D command.dispatchThreads(batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { diff --git a/Sources/GrAIdient/Layer2D/BCESigmoid2D.swift b/Sources/GrAIdient/Layer2D/BCESigmoid2D.swift index d1104542..6c5396c0 100644 --- a/Sources/GrAIdient/Layer2D/BCESigmoid2D.swift +++ b/Sources/GrAIdient/Layer2D/BCESigmoid2D.swift @@ -315,7 +315,7 @@ public class BCESigmoid2D: LayerOutput2D /// - Returns: The loss value. /// public func getLossGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws -> Float { @@ -343,9 +343,8 @@ public class BCESigmoid2D: LayerOutput2D command.dispatchThreads(batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { diff --git a/Sources/GrAIdient/Layer2D/BN2D.swift b/Sources/GrAIdient/Layer2D/BN2D.swift index 17254239..5847ccb7 100644 --- a/Sources/GrAIdient/Layer2D/BN2D.swift +++ b/Sources/GrAIdient/Layer2D/BN2D.swift @@ -533,8 +533,7 @@ public class BN2D: Activation2D, LayerUpdate, LayerWithActivation }}} }} - MetalKernel.get.download([layerPrev.outs]) - let outsPrevPtr = layerPrev.outs.shared.buffer + let outsPrevPtr = layerPrev.outs.download() // Prepare GC for norm weights: Ɣ and β. for batch in 0.. [IWeightArrays] { - var weights = [IWeightArrays]() + var weights = [WeightArrays]() if let norm = self.norm { weights += norm.collectWeights() diff --git a/Sources/GrAIdient/Layer2D/Base/Layer2D.swift b/Sources/GrAIdient/Layer2D/Base/Layer2D.swift index 573ae357..3fe4fe55 100644 --- a/Sources/GrAIdient/Layer2D/Base/Layer2D.swift +++ b/Sources/GrAIdient/Layer2D/Base/Layer2D.swift @@ -5,6 +5,31 @@ // Created by Jean-François Reboud on 09/10/2022. // +/// A layer that needs image size information. +public protocol LayerResize: Layer +{ + /// + /// Resize this layer. + /// + /// - Parameters: + /// - imageWidth: New size width. + /// - imageHeight: New size height. + /// - mapping: Dictionary allowing to find the layer associated to some id. + /// This dictionary is particularly useful when the different layers cannot access + /// their `layerPrev`. + /// - inPlace: Whether hard resources should be copied as is. + /// + /// - Returns: A new layer. When `inPlace` is false, `initKernel` is + /// necessary in order to recreate hard resources. + /// + func resize( + imageWidth: Int, + imageHeight: Int, + mapping: Dictionary, + inPlace: Bool + ) -> Layer +} + /// Layer with a 2D shape neural structure. open class Layer2D: Layer { @@ -15,12 +40,12 @@ open class Layer2D: Layer /// Output buffer (result of the forward pass) used in the GPU execution context. /// Shape ~ (batch, nbChannels, height, width). /// - public var outs: MetalPrivateBuffer! = nil + public var outs: FloatBuffer! = nil /// /// Gradient buffer (result of the backward pass) used in the GPU execution context. /// Shape ~ (batch, nbChannels, height, width). /// - public var delta: MetalPrivateBuffer! 
= nil + public var delta: FloatBuffer! = nil /// Number of channels. public let nbChannels: Int @@ -162,7 +187,7 @@ open class Layer2D: Layer /// /// We initialize the neurons' state (forward and backward). /// - public func checkStateCPU(batchSize: Int) throws + public override func checkStateCPU(batchSize: Int) throws { if neurons.count == 0 { @@ -188,12 +213,13 @@ open class Layer2D: Layer /// /// We initialize the neurons' forward state. /// - public func checkStateForwardGPU(batchSize: Int) throws + public override func checkStateForwardGPU(batchSize: Int) throws { if outs == nil { - outs = MetalPrivateBuffer( - batchSize * nbChannels * width * height, deviceID: deviceID + outs = FloatBuffer( + nbElems: batchSize * nbChannels * width * height, + deviceID: deviceID ) } else if batchSize <= 0 || @@ -208,18 +234,22 @@ open class Layer2D: Layer /// /// We initialize the neurons' backward state. /// - public func checkStateBackwardGPU(batchSize: Int) throws + public override func checkStateBackwardGPU(batchSize: Int) throws { - if delta == nil + if computeDelta { - delta = MetalPrivateBuffer( - batchSize * nbChannels * width * height, deviceID: deviceID - ) - } - else if batchSize <= 0 || - batchSize > delta.nbElems / (nbChannels * width * height) - { - throw LayerError.BatchSize + if delta == nil + { + delta = FloatBuffer( + nbElems: batchSize * nbChannels * width * height, + deviceID: deviceID + ) + } + else if batchSize <= 0 || + batchSize > delta.nbElems / (nbChannels * width * height) + { + throw LayerError.BatchSize + } } } @@ -248,9 +278,8 @@ open class Layer2D: Layer public func getOutsGPU(elem: Int) -> [T] { var outs = [T]() - MetalKernel.get.download([self.outs]) + let outsPtr = self.outs.download() - let outsPtr = self.outs.shared.buffer for depth in 0.., + _ data: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { diff --git a/Sources/GrAIdient/Layer2D/Base/LayerMerge2D.swift b/Sources/GrAIdient/Layer2D/Base/LayerMerge2D.swift index 8078609c..70759271 100644 --- a/Sources/GrAIdient/Layer2D/Base/LayerMerge2D.swift +++ b/Sources/GrAIdient/Layer2D/Base/LayerMerge2D.swift @@ -9,15 +9,15 @@ open class LayerMerge2D: Layer2D { /// List of links to the previous layers in the model. - var _layersPrev = [Layer]() + public var layersPrev = [Layer]() /// List of identifiers of the previous layers in the model. - let _idsPrev: [Int] + public let idsPrev: [Int] /// Whether backward pass should continue backward or not. public override var mustComputeBackward: Bool { get { - for layerPrev in _layersPrev + for layerPrev in layersPrev { if layerPrev.computeDelta { @@ -37,7 +37,7 @@ open class LayerMerge2D: Layer2D } var valueFirst: Double! = nil - for layerPrev in _layersPrev + for layerPrev in layersPrev { if let layerPrevTmp = layerPrev as? Layer2D { @@ -66,7 +66,7 @@ open class LayerMerge2D: Layer2D } var valueMax: Int! = nil - for layerPrev in _layersPrev + for layerPrev in layersPrev { if let layerPrevTmp = layerPrev as? 
Layer2D { @@ -106,7 +106,7 @@ open class LayerMerge2D: Layer2D { idsPrev.append(layer.id) } - _idsPrev = idsPrev + self.idsPrev = idsPrev super.init(layerPrev: layersPrev[0], nbChannels: nbChannels, @@ -126,7 +126,7 @@ open class LayerMerge2D: Layer2D public required init(from decoder: Decoder) throws { let container = try decoder.container(keyedBy: Keys.self) - _idsPrev = try container.decode([Int].self, forKey: .idsPrev) + idsPrev = try container.decode([Int].self, forKey: .idsPrev) try super.init(from: decoder) } @@ -144,7 +144,7 @@ open class LayerMerge2D: Layer2D public override func encode(to encoder: Encoder) throws { var container = encoder.container(keyedBy: Keys.self) - try container.encode(_idsPrev, forKey: .idsPrev) + try container.encode(idsPrev, forKey: .idsPrev) try super.encode(to: encoder) } @@ -155,14 +155,14 @@ open class LayerMerge2D: Layer2D /// public override func initLinks(_ layers: [Layer]) { - _layersPrev = [Layer]() - for id in _idsPrev + layersPrev = [Layer]() + for id in idsPrev { for testLayer in layers { if testLayer.id == id { - _layersPrev.append(testLayer) + layersPrev.append(testLayer) break } } @@ -176,9 +176,9 @@ open class LayerMerge2D: Layer2D /// public override func propagateDirty(_ dirty: Bool = false) { - for num in 0..<_layersPrev.count + for num in 0.. ([Layer], [Int]) { var layersBranches = [Layer?]() - for layer in _layersPrev + for layer in layersPrev { layersBranches.append(layer) } @@ -292,7 +292,7 @@ open class LayerMerge2D: Layer2D var nbElems = [Int]() var nbLastElems = [Int](repeating: nbSameElems, - count: _layersPrev.count) + count: layersPrev.count) for (index, layer) in zip(layersIndex, layersMerged) { let nbElemsTmp = layer.nbGC diff --git a/Sources/GrAIdient/Layer2D/Base/LayerOutput2D.swift b/Sources/GrAIdient/Layer2D/Base/LayerOutput2D.swift index 3e1cf343..fcd11e8e 100644 --- a/Sources/GrAIdient/Layer2D/Base/LayerOutput2D.swift +++ b/Sources/GrAIdient/Layer2D/Base/LayerOutput2D.swift @@ -15,13 +15,13 @@ open class LayerOutput2D: Layer2D /// Ground truth buffer in the GPU execution context. /// Shape ~ (batch, nbChannels, height, width). /// - public internal(set) var groundTruth: MetalSharedBuffer! = nil + public internal(set) var groundTruth: FloatBuffer! = nil /// /// Loss buffer in the GPU execution context. /// Shape ~ (batch,). /// - public internal(set) var loss: MetalSharedBuffer! = nil + public internal(set) var loss: FloatBuffer! 
= nil private enum Keys: String, CodingKey { @@ -157,9 +157,10 @@ open class LayerOutput2D: Layer2D if self.groundTruth == nil { - self.groundTruth = MetalSharedBuffer( - batchSize * nbChannels * height * width, - deviceID: deviceID + self.groundTruth = FloatBuffer( + nbElems: batchSize * nbChannels * height * width, + deviceID: deviceID, + shared: true ) } else if batchSize <= 0 || @@ -168,7 +169,10 @@ open class LayerOutput2D: Layer2D throw LayerError.BatchSize } - let bufferPtr = self.groundTruth.buffer + var buffer = [Float]( + repeating: 0.0, count: batchSize * nbChannels * height * width + ) + switch format { case .RGB: @@ -184,7 +188,7 @@ open class LayerOutput2D: Layer2D let offsetSet = j + (offsetStart + i) * width let gt = groundTruth[nbChannels * offsetGet + depth] - bufferPtr[offsetSet] = Float(gt) + buffer[offsetSet] = Float(gt) }} }} case .Neuron: @@ -199,11 +203,11 @@ open class LayerOutput2D: Layer2D let offset = j + (offsetStart + i) * width let gt = groundTruth[offset] - bufferPtr[offset] = Float(gt) + buffer[offset] = Float(gt) }} }} } - MetalKernel.get.upload([self.groundTruth]) + self.groundTruth.initialize(array: &buffer) } /// @@ -219,7 +223,7 @@ open class LayerOutput2D: Layer2D /// - width: Width of each channel. /// public func checkGroundTruthGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { @@ -248,7 +252,9 @@ open class LayerOutput2D: Layer2D { if loss == nil { - loss = MetalSharedBuffer(batchSize, deviceID: deviceID) + loss = FloatBuffer( + nbElems: batchSize, deviceID: deviceID, shared: true + ) } else if batchSize <= 0 || batchSize > loss.nbElems { @@ -344,14 +350,16 @@ open class LayerOutput2D: Layer2D let nbElems = outs.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] + let kernel = nbElems % 4 == 0 ? "sum14" : "sum1" + let coeff = nbElems % 4 == 0 ? 4 : 1 let command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID + kernel, deviceID: deviceID ) command.setBuffer(layerPrev.outs.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(outs.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() } } @@ -403,24 +411,25 @@ open class LayerOutput2D: Layer2D let nbElems = delta.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let command: MetalCommand + let kernel: String + let coeff = nbElems % 4 == 0 ? 4 : 1 if layerPrev.dirty { - command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum14" : "sum1" } else { - command = MetalKernel.get.createCommand( - "sum2", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum24" : "sum2" } + let command = MetalKernel.get.createCommand( + kernel, deviceID: deviceID + ) + command.setBuffer(delta.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(layerPrev.delta.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() propagateDirty() diff --git a/Sources/GrAIdient/Layer2D/Concat2D.swift b/Sources/GrAIdient/Layer2D/Concat2D.swift index 4a9a0e6c..0667c5bb 100644 --- a/Sources/GrAIdient/Layer2D/Concat2D.swift +++ b/Sources/GrAIdient/Layer2D/Concat2D.swift @@ -63,7 +63,7 @@ public class Concat2D: LayerMerge2D params.context.curID = id var layersPrev = [Layer2D]() - for idPrev in _idsPrev + for idPrev in idsPrev { layersPrev.append(mapping[idPrev] as! 
Layer2D) } @@ -104,9 +104,9 @@ public class Concat2D: LayerMerge2D for batch in 0..! = nil + var _wDeltaWeights: FloatBuffer! = nil /// Whether to compute weights' gradients or not. public var computeDeltaWeights: Bool = true @@ -64,12 +64,7 @@ public class Constant2D: Layer2D, LayerResize, LayerUpdate { return _weightsList } - - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - - return weightsTmp + return _wBuffers.w.download() } set { _weightsList = newValue @@ -210,6 +205,7 @@ public class Constant2D: Layer2D, LayerResize, LayerUpdate /// - mapping: Dictionary allowing to find the layer associated to some id. /// This dictionary is particularly useful when the different layers cannot access /// their `layerPrev`. + /// - inPlace: Whether hard resources should be copied as is. /// /// - Returns: A new instance of `Layer`. When `inPlace` is false, `initKernel` is /// necessary in order to recreate hard resources. @@ -315,24 +311,16 @@ public class Constant2D: Layer2D, LayerResize, LayerUpdate deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - if _weightsList.count == 0 + if _weightsList.count != 0 { - for depth in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * nbChannels, deviceID: deviceID ) } @@ -416,8 +404,7 @@ public class Constant2D: Layer2D, LayerResize, LayerUpdate neurons[depth].get(i, j)!.initGC(batchSize: batchSize, nbGC: newGC) }}} - MetalKernel.get.download([_wBuffers.w_p!]) - let weightsPtr = _wBuffers.w_p!.shared.buffer + let weightsPtr = _wBuffers.w.download() for batch in 0..! = nil + var _wDeltaWeights: FloatBuffer! = nil /// /// Buffer of gradients per sample for biases. /// Shape ~ (batch, nbChannels). /// - var _bDeltaWeights: MetalPrivateBuffer! = nil + var _bDeltaWeights: FloatBuffer! = nil /// Number of weight kernels. 
public let nbWeights: Int @@ -184,14 +184,10 @@ public class Convolution2D: BN2D, LayerWeightInit return _weightsList } - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - + var weightsTmp = _wBuffers.w.download() if _updateBiases { - MetalKernel.get.download([_bBuffers.w_p!]) - weightsTmp += _bBuffers.w_p!.shared.array + weightsTmp += _bBuffers.w.download() } return weightsTmp } @@ -771,12 +767,6 @@ public class Convolution2D: BN2D, LayerWeightInit /// public override func initWeightsGPU() { - if _weightsList.count == 0 - { - _weightsList = generateWeightsList() - _weightsList += [Float](repeating: 0.0, count: nbChannels) - } - super.initWeightsGPU() _wBuffers = WeightBuffers( @@ -788,34 +778,24 @@ public class Convolution2D: BN2D, LayerWeightInit deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer - - for elem in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * nbWeights * weightWidth * weightHeight, deviceID: deviceID ) if _updateBiases { - _bDeltaWeights = MetalPrivateBuffer( + _bDeltaWeights = FloatBuffer(nbElems: batchSize * nbChannels, deviceID: deviceID ) } @@ -1076,11 +1056,8 @@ public class Convolution2D: BN2D, LayerWeightInit }} } - MetalKernel.get.download([_wBuffers.w_p!, _bBuffers.w_p!]) - MetalKernel.get.download([layerPrev.outs]) - - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer + let weightsPtr = _wBuffers.w.download() + let biasesPtr = _bBuffers.w.download() let neuronsPrev = layerPrev.neurons let widthPrev = layerPrev.width @@ -1120,7 +1097,7 @@ public class Convolution2D: BN2D, LayerWeightInit }} }}} - let outsPrevPtr = layerPrev.outs.shared.buffer + let outsPrevPtr = layerPrev.outs.download() for batch in 0.. [IWeightArrays] { - var weights = [IWeightArrays]() + var weights = [WeightArrays]() weights += _wArrays if _updateBiases { @@ -1776,8 +1808,7 @@ public class Convolution2D: BN2D, LayerWeightInit } var deltaWeights = [T]() - MetalKernel.get.download([_wDeltaWeights]) - var deltaWeightsPtr = _wDeltaWeights.shared.buffer + var deltaWeightsPtr = _wDeltaWeights.download() let nbChannelsPrev = (self.layerPrev as! Layer2D).nbChannels let offsetStartGrid = @@ -1803,8 +1834,7 @@ public class Convolution2D: BN2D, LayerWeightInit if _updateBiases { - MetalKernel.get.download([_bDeltaWeights]) - deltaWeightsPtr = _bDeltaWeights.shared.buffer + deltaWeightsPtr = _bDeltaWeights.download() for depth in 0.., IWeightArrays class InputBuffers2D: InputBuffers, IWeightBuffers { /// Inputs buffer: the buffer to be update. - var w: MetalBuffer + var w: FloatBuffer { get { return _layer.outs @@ -90,7 +90,7 @@ class InputBuffers2D: InputBuffers, IWeightBuffers } /// Gradients buffer. - var g: MetalBuffer + var g: FloatBuffer { get { return _layer.delta @@ -230,6 +230,7 @@ public class Input2D: LayerInput2D, LayerResize, LayerUpdate /// - mapping: Dictionary allowing to find the layer associated to some id. /// This dictionary is particularly useful when the different layers cannot access /// their `layerPrev`. + /// - inPlace: Whether hard resources should be copied as is. /// /// - Returns: A new instance of `Layer`. When `inPlace` is false, `initKernel` is /// necessary in order to recreate hard resources. @@ -397,7 +398,7 @@ public class Input2D: LayerInput2D, LayerResize, LayerUpdate /// - width: Width of each channel. 
/// public func setDataGPU( - _ data: MetalPrivateBuffer, + _ data: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { @@ -449,14 +450,16 @@ public class Input2D: LayerInput2D, LayerResize, LayerUpdate let nbElems = outs.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] + let kernel = nbElems % 4 == 0 ? "sum14" : "sum1" + let coeff = nbElems % 4 == 0 ? 4 : 1 let command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID + kernel, deviceID: deviceID ) command.setBuffer(layerPrev.outs.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(outs.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() } } @@ -504,25 +507,25 @@ public class Input2D: LayerInput2D, LayerResize, LayerUpdate let nbElems = delta.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let command: MetalCommand + let kernel: String + let coeff = nbElems % 4 == 0 ? 4 : 1 if layerPrev.dirty { - command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum14" : "sum1" } else { - command = MetalKernel.get.createCommand( - "sum2", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum24" : "sum2" } + let command = MetalKernel.get.createCommand( + kernel, deviceID: deviceID + ) command.setBuffer(delta.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(layerPrev.delta.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() propagateDirty() diff --git a/Sources/GrAIdient/Layer2D/InstanceNorm2D.swift b/Sources/GrAIdient/Layer2D/InstanceNorm2D.swift index ce159f7e..1585cdb6 100644 --- a/Sources/GrAIdient/Layer2D/InstanceNorm2D.swift +++ b/Sources/GrAIdient/Layer2D/InstanceNorm2D.swift @@ -457,8 +457,7 @@ public class InstanceNorm2D: Activation2D, LayerUpdate, LayerWithActivation }}} }} - MetalKernel.get.download([layerPrev.outs]) - let outsPrevPtr = layerPrev.outs.shared.buffer + let outsPrevPtr = layerPrev.outs.download() // Prepare GC for norm weights: Ɣ and β. for batch in 0.. [IWeightArrays] { - var weights = [IWeightArrays]() + var weights = [WeightArrays]() if let norm = self.norm { weights += norm.collectWeights() diff --git a/Sources/GrAIdient/Layer2D/LayerCAM2D.swift b/Sources/GrAIdient/Layer2D/LayerCAM2D.swift new file mode 100644 index 00000000..3784df5f --- /dev/null +++ b/Sources/GrAIdient/Layer2D/LayerCAM2D.swift @@ -0,0 +1,217 @@ +// +// LayerCAM2D.swift +// GrAIdient +// +// Created by Jean-François Reboud on 10/02/2024. +// + +/// +/// Layer with a 2D shape neural structure. +/// +/// This layer creates a map of maximal activations with respect to the loss. +/// +public class LayerCAM2D: Layer2D +{ + /// Whether to take positive or negative part of gradients. + public var keepPositive: Bool = true + + private enum Keys: String, CodingKey + { + case keepPositive + } + + /// + /// Create a layer with a 2D shape neural structure. + /// + /// - Parameters: + /// - layerPrev: Previous layer that has been queued to the model. + /// - params: Contextual parameters linking to the model. + /// + public init(layerPrev: Layer2D, params: GrAI.Model.Params) throws + { + super.init(layerPrev: layerPrev, + nbChannels: 1, + height: layerPrev.height, + width: layerPrev.width, + params: params) + } + + /// + /// Decode from the disk. + /// + /// Throw an error if reading from the decoder fails, or + /// if the data read is corrupted or otherwise invalid. 
+ /// + /// - Parameter decoder: The decoder to read data from. + /// + public required init(from decoder: Decoder) throws + { + let container = try decoder.container(keyedBy: Keys.self) + let keepPositive = try container.decode( + Bool.self, forKey: .keepPositive + ) + self.keepPositive = keepPositive + try super.init(from: decoder) + } + + /// + /// Encode to the disk. + /// + /// If the value fails to encode anything, `encoder` will encode an empty + /// keyed container in its place. + /// + /// Throw an error if any values are invalid for the given + /// encoder's format. + /// + /// - Parameter encoder: The encoder to write data to. + /// + public override func encode(to encoder: Encoder) throws + { + var container = encoder.container(keyedBy: Keys.self) + try container.encode(keepPositive, forKey: .keepPositive) + try super.encode(to: encoder) + } + + /// + /// Create a layer with same values as this. + /// + /// - Parameters: + /// - mapping: Dictionary allowing to find the layer associated to some id. + /// This dictionary is particularly useful when the different layers cannot access + /// their `layerPrev`. + /// - inPlace: Whether hard resources should be copied as is. + /// + /// - Returns: A new layer. When `inPlace` is false, `initKernel` is + /// necessary in order to recreate hard resources. + /// + public override func copy( + mapping: Dictionary, + inPlace: Bool) -> Layer + { + let context = ModelContext(name: "", curID: 0) + let layerPrev = mapping[idPrev] as! Layer2D + + let params = GrAI.Model.Params(context: context) + params.context.curID = id + + let layer = try! LayerCAM2D( + layerPrev: layerPrev, + params: params + ) + return layer + } + + /// + /// Apply the forward pass of the Gradient Checking in CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardGCCPU() throws + { + fatalError("Not implemented.") + } + + /// + /// Apply the forward pass of the Gradient Checking in GPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardGCGPU() throws + { + try forwardGCCPU() + } + + /// + /// Apply the forward pass in the CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardCPU() throws + { + if let layerPrev = self.layerPrev as? Layer2D + { + try checkStateCPU(batchSize: batchSize) + + let neuronsPrev = layerPrev.neurons + let nbChannelsPrev = layerPrev.nbChannels + + for elem in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws -> Float { @@ -296,9 +296,8 @@ public class MSE2D: LayerOutput2D command.dispatchThreads(batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for i in 0.., + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { diff --git a/Sources/GrAIdient/Layer2D/Multiply2D.swift b/Sources/GrAIdient/Layer2D/Multiply2D.swift index d5d879ec..ca6c5448 100644 --- a/Sources/GrAIdient/Layer2D/Multiply2D.swift +++ b/Sources/GrAIdient/Layer2D/Multiply2D.swift @@ -14,10 +14,15 @@ public class Multiply2D: LayerMerge2D { /// - /// List of output buffers. + /// List of output buffers for CPU usage. /// Shape ~ (batch, nbChannels, height, width). 
/// - var _otherOuts: [MetalBuffer] = [] + var _otherOuts1: [[Double]] = [] + /// + /// List of output buffers for GPU usage. + /// Shape ~ (batch, nbChannels, height, width). + /// + var _otherOuts2: [FloatBuffer] = [] /// /// Create a layer with a 2D shape neural structure. @@ -80,7 +85,7 @@ public class Multiply2D: LayerMerge2D params.context.curID = id var layersPrev = [Layer2D]() - for idPrev in _idsPrev + for idPrev in idsPrev { layersPrev.append(mapping[idPrev] as! Layer2D) } @@ -97,7 +102,7 @@ public class Multiply2D: LayerMerge2D public override func resetKernelCPU() { super.resetKernelCPU() - _otherOuts = [] + _otherOuts1 = [] } /// @@ -108,7 +113,7 @@ public class Multiply2D: LayerMerge2D public override func resetKernelGPU() { super.resetKernelGPU() - _otherOuts = [] + _otherOuts2 = [] } /// @@ -120,17 +125,17 @@ public class Multiply2D: LayerMerge2D { try super.checkStateCPU(batchSize: batchSize) - if _otherOuts.count == 0 + if phase != nil && (phase == .Training || phase == .InferenceBackward) { + if _otherOuts1.count == 0 { - for _ in 0..<_layersPrev.count + for _ in 0..( - batchSize * nbChannels * height * width, - deviceID: deviceID - ) - _otherOuts.append(buffer) + _otherOuts1.append([Double]( + repeating: 0.0, + count: batchSize * nbChannels * height * width + )) } - } + }} } /// @@ -142,17 +147,18 @@ public class Multiply2D: LayerMerge2D { try super.checkStateForwardGPU(batchSize: batchSize) - if _otherOuts.count == 0 + if phase != nil && (phase == .Training || phase == .InferenceBackward) { + if _otherOuts2.count == 0 { - for _ in 0..<_layersPrev.count + for _ in 0..( - batchSize * nbChannels * height * width, + let buffer = FloatBuffer( + nbElems: batchSize * nbChannels * height * width, deviceID: deviceID ) - _otherOuts.append(buffer) + _otherOuts2.append(buffer) } - } + }} } /// @@ -192,9 +198,9 @@ public class Multiply2D: LayerMerge2D for j in 0..! = nil + private var _squaredNorm: FloatBuffer! = nil /// /// Temporary delta buffer used in the GPU execution context. /// Shape ~ (batch, nbThreadgroups). /// - private var _deltaTmp: MetalPrivateBuffer! = nil + private var _deltaTmp: FloatBuffer! = nil /// Number of thread groups in the GPU execution context. var nbThreadgroups: Int @@ -404,7 +404,7 @@ public class Normalize122D: Layer2D { if _squaredNorm == nil { - _squaredNorm = MetalPrivateBuffer( + _squaredNorm = FloatBuffer(nbElems: batchSize * nbThreadgroups, deviceID: deviceID ) } @@ -418,13 +418,16 @@ public class Normalize122D: Layer2D /// public override func checkStateBackwardGPU(batchSize: Int) throws { - if _deltaTmp == nil + if computeDelta { - _deltaTmp = MetalPrivateBuffer( - batchSize * nbThreadgroups, deviceID: deviceID - ) + if _deltaTmp == nil + { + _deltaTmp = FloatBuffer(nbElems: + batchSize * nbThreadgroups, deviceID: deviceID + ) + } + try super.checkStateBackwardGPU(batchSize: batchSize) } - try super.checkStateBackwardGPU(batchSize: batchSize) } /// @@ -570,7 +573,7 @@ public class Normalize122D: Layer2D command.enqueue() // Continue the reduction in a more generic way. - reduce( + reduceSum( inBuffer: _squaredNorm.metal, outBuffer: _squaredNorm.metal, dim1: nbThreadgroups, dim2: batchSize, @@ -725,7 +728,7 @@ public class Normalize122D: Layer2D command.enqueue() // Continue the reduction in a more generic way. 
- reduce( + reduceSum( inBuffer: _deltaTmp.metal, outBuffer: _deltaTmp.metal, dim1: nbThreadgroups, dim2: batchSize, diff --git a/Sources/GrAIdient/Layer2D/SelectNeurons2D.swift b/Sources/GrAIdient/Layer2D/SelectNeurons2D.swift index 58cf4a18..46846dbc 100644 --- a/Sources/GrAIdient/Layer2D/SelectNeurons2D.swift +++ b/Sources/GrAIdient/Layer2D/SelectNeurons2D.swift @@ -157,6 +157,7 @@ public class SelectNeurons2D: Layer1D, LayerResize /// - mapping: Dictionary allowing to find the layer associated to some id. /// This dictionary is particularly useful when the different layers cannot access /// their `layerPrev`. + /// - inPlace: Whether hard resources should be copied as is. /// /// - Returns: A new instance of `Layer`. When `inPlace` is false, `initKernel` is /// necessary in order to recreate hard resources. diff --git a/Sources/GrAIdient/Layer2D/SimilarityBatchError2D.swift b/Sources/GrAIdient/Layer2D/SimilarityBatchError2D.swift index f341e429..a93b2c9e 100644 --- a/Sources/GrAIdient/Layer2D/SimilarityBatchError2D.swift +++ b/Sources/GrAIdient/Layer2D/SimilarityBatchError2D.swift @@ -126,7 +126,7 @@ public class SimilarityBatchError2D: LayerOutput2D /// - width: Width of each channel. /// public override func checkGroundTruthGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { @@ -144,9 +144,10 @@ public class SimilarityBatchError2D: LayerOutput2D { if loss == nil { - loss = MetalSharedBuffer( - batchSize * batchSize, - deviceID: deviceID + loss = FloatBuffer( + nbElems: batchSize * batchSize, + deviceID: deviceID, + shared: true ) } else if batchSize <= 0 || batchSize * batchSize > loss.nbElems @@ -259,9 +260,8 @@ public class SimilarityBatchError2D: LayerOutput2D command.dispatchThreads(width: batchSize, height: batchSize) command.enqueue() - MetalKernel.get.download([loss]) var loss: Float = 0.0 - let lossPtr = self.loss.buffer + let lossPtr = self.loss.download() for elem1 in 0..! = nil + public internal(set) var loss: FloatBuffer! = nil /// Batch size sum in the previous layers. public var mergedBatchSize: Int { get { var sum = 0 - for layerPrev in _layersPrev + for layerPrev in layersPrev { sum += layerPrev.batchSize } @@ -127,7 +127,7 @@ public class SimilarityError2D: LayerMerge2D params.context.curID = id var layersPrev = [Layer2D]() - for idPrev in _idsPrev + for idPrev in idsPrev { layersPrev.append(mapping[idPrev] as! Layer2D) } @@ -151,9 +151,10 @@ public class SimilarityError2D: LayerMerge2D { if loss == nil { - loss = MetalSharedBuffer( - batchSize * batchSize, - deviceID: deviceID + loss = FloatBuffer( + nbElems: batchSize * batchSize, + deviceID: deviceID, + shared: true ) } else if batchSize <= 0 || batchSize * batchSize > loss.nbElems @@ -189,10 +190,10 @@ public class SimilarityError2D: LayerMerge2D }} var curElem = 0 - for num in 0..<_layersPrev.count + for num in 0..! = nil + var _wDeltaWeights: FloatBuffer! = nil /// Whether to compute weights' gradients or not. 
public var computeDeltaWeights: Bool = true @@ -102,12 +103,7 @@ public class VQ2D: LayerOutput2D, LayerWeightInit { return _weightsList } - - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - - return weightsTmp + return _wBuffers.w.download() } set { _weightsList = newValue @@ -308,24 +304,21 @@ public class VQ2D: LayerOutput2D, LayerWeightInit /// public func initWeightsGPU() { - if _weightsList.count == 0 - { - _weightsList = generateWeightsList() - } - _wBuffers = WeightBuffers( nbElems: K * nbChannels, deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - for elem in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * K * nbChannels, deviceID: deviceID ) } @@ -429,7 +422,7 @@ public class VQ2D: LayerOutput2D, LayerWeightInit /// - width: Width of each channel. /// public override func checkGroundTruthGPU( - _ groundTruth: MetalBuffer, + _ groundTruth: FloatBuffer, batchSize: Int, nbChannels: Int, height: Int, width: Int) throws { @@ -552,7 +545,7 @@ public class VQ2D: LayerOutput2D, LayerWeightInit _backwardWeightsCPU() } - private func _backwardCPU() + fileprivate func _backwardCPU() { if let layerPrev = self.layerPrev as? Layer2D, mustComputeBackward { @@ -564,6 +557,7 @@ public class VQ2D: LayerOutput2D, LayerWeightInit for j in 0..= 0 { for depth in 0..= 0 { for depth in 0..).buffer for elem in 0..= 0 { - let outPrev = neuronsPrev[depth].get(i, j)!.v[elem].out - let vq = neurons[depth].get(i, j)!.v[elem].out - value += pow(outPrev - vq, 2.0) + var value: Double = 0.0 + for depth in 0.., + inPlace: Bool) -> Layer + { + let context = ModelContext(name: "", curID: 0) + let layerPrev = mapping[idPrev] as! Layer2D + + let params = GrAI.Model.Params(context: context) + params.context.curID = id + + let layer = VQGrad2D( + layerPrev: layerPrev, K: K, params: params + ) + layer.magnitudeCoeff = magnitudeCoeff + layer.coeff = coeff + layer.beta = beta + + if inPlace + { + layer._wArrays = _wArrays + layer._wBuffers = _wBuffers + } + else + { + if GrAI.Opti.GPU + { + layer.weightsGPU = weightsGPU + } + else + { + layer.weightsCPU = weightsCPU + } + } + return layer + } + + /// + /// Find the `layerPrev` associated to the layer's `idPrev`. + /// + /// - Parameter layers: The potential layers where to find the layer's `idPrev`. + /// + public override func initLinks(_ layers: [Layer]) + { + super.initLinks(layers) + _layerCAM.initLinks(layers) + } + + /// + /// Clean state resources in the GPU execution context. + /// + /// We first clean the neurons' state (forward and backward). + /// We do not clean weights and biases but must reset their delta (dependent on batch size) and + /// momentum state. + /// + public override func resetKernelGPU() + { + super.resetKernelGPU() + _layerCAM.resetKernelGPU() + _camMax = nil + } + + /// + /// Initialize state resources in the CPU execution context. + /// + /// We initialize the neurons' state (forward and backward). + /// + public override func checkStateCPU(batchSize: Int) throws + { + try super.checkStateCPU(batchSize: batchSize) + try _layerCAM.checkStateCPU(batchSize: batchSize) + } + + /// + /// Initialize state resources in the GPU execution context. + /// + /// We initialize the neurons' forward state. + /// We initialize the weights and biases' delta. 
+ /// + public override func checkStateForwardGPU(batchSize: Int) throws + { + try super.checkStateForwardGPU(batchSize: batchSize) + try _layerCAM.checkStateForwardGPU(batchSize: batchSize) + + if _camMax == nil + { + _camMax = FloatBuffer(nbElems: + batchSize * nbThreadgroups, + deviceID: deviceID + ) + } + } + + /// + /// Initialize state resources in the GPU execution context. + /// + /// We initialize the neurons' backward state. + /// + public override func checkStateBackwardGPU(batchSize: Int) throws + { + try super.checkStateBackwardGPU(batchSize: batchSize) + try _layerCAM.checkStateBackwardGPU(batchSize: batchSize) + } + + /// + /// Apply the forward pass in the CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardCPU() throws + { + if let layerPrev = self.layerPrev as? Layer2D + { + if layerPrev.dirty + { + throw UpdateError.Dirty + } + + try _layerCAM.forwardCPU() + let neuronsCAM = _layerCAM.neurons + + try checkStateCPU(batchSize: batchSize) + + let neuronsPrev = layerPrev.neurons + let indicesPtr = (indices as! MetalSharedBuffer).buffer + + for elem in 0..= magnitudeCoeff + { + var minIndex = -1 + var minValue: Double? = nil + + for k in 0..! = nil + var tmp: FloatBuffer! = nil /// Get coefficient (depending on activation function) to apply during the weights initialization. public var coeffInitWeights: Float @@ -160,7 +160,7 @@ public class ActivationSeq: LayerSeq public override func resetKernelGPU() { super.resetKernelGPU() - _tmp = nil + tmp = nil } /// @@ -259,14 +259,16 @@ public class ActivationSeq: LayerSeq let nbElems = outs.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] + let kernel = nbElems % 4 == 0 ? "sum14" : "sum1" + let coeff = nbElems % 4 == 0 ? 4 : 1 let command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID + kernel, deviceID: deviceID ) command.setBuffer(layerPrev.outs.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(outs.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() _activation!.forwardGPU(self) @@ -318,24 +320,25 @@ public class ActivationSeq: LayerSeq let nbElems = delta.nbElems let pNbElems: [UInt32] = [UInt32(nbElems)] - let command: MetalCommand + let kernel: String + let coeff = nbElems % 4 == 0 ? 4 : 1 if layerPrev.dirty { - command = MetalKernel.get.createCommand( - "sum1", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum14" : "sum1" } else { - command = MetalKernel.get.createCommand( - "sum2", deviceID: deviceID - ) + kernel = nbElems % 4 == 0 ? "sum24" : "sum2" } + let command = MetalKernel.get.createCommand( + kernel, deviceID: deviceID + ) + command.setBuffer(delta.metal, atIndex: 0) command.setBytes(pNbElems, atIndex: 1) command.setBuffer(layerPrev.delta.metal, atIndex: 2) - command.dispatchThreads(nbElems) + command.dispatchThreads(nbElems / coeff) command.enqueue() propagateDirty() diff --git a/Sources/GrAIdient/LayerSeq/Base/LayerMergeSeq.swift b/Sources/GrAIdient/LayerSeq/Base/LayerMergeSeq.swift index 26a5d95f..5bc591c8 100644 --- a/Sources/GrAIdient/LayerSeq/Base/LayerMergeSeq.swift +++ b/Sources/GrAIdient/LayerSeq/Base/LayerMergeSeq.swift @@ -9,15 +9,15 @@ public class LayerMergeSeq: LayerSeq { /// List of links to the previous layers in the model. - var _layersPrev = [Layer]() + public var layersPrev = [Layer]() /// List of identifiers of the previous layers in the model. 
- let _idsPrev: [Int] + public let idsPrev: [Int] /// Whether backward pass should continue backward or not. public override var mustComputeBackward: Bool { get { - for layerPrev in _layersPrev + for layerPrev in layersPrev { if layerPrev.computeDelta { @@ -52,7 +52,7 @@ public class LayerMergeSeq: LayerSeq { idsPrev.append(layer.id) } - _idsPrev = idsPrev + self.idsPrev = idsPrev super.init(layerPrev: layersPrev[0], sequence: sequence, @@ -71,7 +71,7 @@ public class LayerMergeSeq: LayerSeq public required init(from decoder: Decoder) throws { let container = try decoder.container(keyedBy: Keys.self) - _idsPrev = try container.decode([Int].self, forKey: .idsPrev) + idsPrev = try container.decode([Int].self, forKey: .idsPrev) try super.init(from: decoder) } @@ -89,7 +89,7 @@ public class LayerMergeSeq: LayerSeq public override func encode(to encoder: Encoder) throws { var container = encoder.container(keyedBy: Keys.self) - try container.encode(_idsPrev, forKey: .idsPrev) + try container.encode(idsPrev, forKey: .idsPrev) try super.encode(to: encoder) } @@ -100,14 +100,14 @@ public class LayerMergeSeq: LayerSeq /// public override func initLinks(_ layers: [Layer]) { - _layersPrev = [Layer]() - for id in _idsPrev + self.layersPrev = [Layer]() + for id in idsPrev { for testLayer in layers { if testLayer.id == id { - _layersPrev.append(testLayer) + self.layersPrev.append(testLayer) break } } @@ -121,9 +121,9 @@ public class LayerMergeSeq: LayerSeq /// public override func propagateDirty(_ dirty: Bool = false) { - for num in 0..<_layersPrev.count + for num in 0.. ([Layer], [Int]) { var layersBranches = [Layer?]() - for layer in _layersPrev + for layer in layersPrev { layersBranches.append(layer) } @@ -237,7 +237,7 @@ public class LayerMergeSeq: LayerSeq var nbElems = [Int]() var nbLastElems = [Int](repeating: nbSameElems, - count: _layersPrev.count) + count: layersPrev.count) for (index, layer) in zip(layersIndex, layersMerged) { let nbElemsTmp = layer.nbGC diff --git a/Sources/GrAIdient/LayerSeq/Base/LayerSeq.swift b/Sources/GrAIdient/LayerSeq/Base/LayerSeq.swift index 0a79d55d..07487763 100644 --- a/Sources/GrAIdient/LayerSeq/Base/LayerSeq.swift +++ b/Sources/GrAIdient/LayerSeq/Base/LayerSeq.swift @@ -15,15 +15,15 @@ open class LayerSeq: Layer /// Output buffer (result of the forward pass) used in the GPU execution context. /// Shape ~ (batch, seq, nbNeurons). /// - public var outs: MetalPrivateBuffer! = nil + public var outs: FloatBuffer! = nil /// /// Gradient buffer (result of the backward pass) used in the GPU execution context. /// Shape ~ (batch, seq, nbNeurons). /// - public var delta: MetalPrivateBuffer! = nil + public var delta: FloatBuffer! = nil /// Length of the sequence. - public let sequence: Int + public internal(set) var sequence: Int /// Number of neurons. public let nbNeurons: Int @@ -123,7 +123,7 @@ open class LayerSeq: Layer /// /// We initialize the neurons' state (forward and backward). /// - public func checkStateCPU(batchSize: Int) throws + public override func checkStateCPU(batchSize: Int) throws { if neurons == nil { @@ -144,12 +144,13 @@ open class LayerSeq: Layer /// /// We initialize the neurons' forward state. 
/// - public func checkStateForwardGPU(batchSize: Int) throws + public override func checkStateForwardGPU(batchSize: Int) throws { if outs == nil { - outs = MetalPrivateBuffer( - batchSize * sequence * nbNeurons, deviceID: deviceID + outs = FloatBuffer( + nbElems: batchSize * sequence * nbNeurons, + deviceID: deviceID ) } else if batchSize <= 0 || batchSize > outs.nbElems / nbNeurons @@ -163,18 +164,87 @@ open class LayerSeq: Layer /// /// We initialize the neurons' backward state. /// - public func checkStateBackwardGPU(batchSize: Int) throws + public override func checkStateBackwardGPU(batchSize: Int) throws { - if delta == nil + if computeDelta { - delta = MetalPrivateBuffer( - batchSize * sequence * nbNeurons, deviceID: deviceID - ) + if delta == nil + { + delta = FloatBuffer( + nbElems: batchSize * sequence * nbNeurons, + deviceID: deviceID + ) + } + else if batchSize <= 0 || + batchSize > delta.nbElems / (sequence * nbNeurons) + { + throw LayerError.BatchSize + } } - else if batchSize <= 0 || - batchSize > delta.nbElems / (sequence * nbNeurons) + } + + /// Get the outputs of this layer in the CPU execution context. + public func getOutsCPU() -> [T] + { + var outs = [T]() + for elem in 0..() -> [T] + { + return outs.download().map + { + T($0) + } + } + + /// + /// Get the delta of this layer in the CPU execution context. + /// + /// Throw an error when layer has not been updated through backward pass. + /// + public func getDeltaCPU() throws -> [T] + { + if dirty + { + throw UpdateError.Dirty + } + + var delta = [T]() + for elem in 0..() throws -> [T] + { + if dirty + { + throw UpdateError.Dirty + } + + return delta.download().map + { + T($0) } } } diff --git a/Sources/GrAIdient/LayerSeq/ConcatSeq.swift b/Sources/GrAIdient/LayerSeq/ConcatSeq.swift index fae570e4..f9720356 100644 --- a/Sources/GrAIdient/LayerSeq/ConcatSeq.swift +++ b/Sources/GrAIdient/LayerSeq/ConcatSeq.swift @@ -65,7 +65,7 @@ public class Concat1Seq: LayerMergeSeq params.context.curID = id var layersPrev = [LayerSeq]() - for idPrev in _idsPrev + for idPrev in idsPrev { layersPrev.append(mapping[idPrev] as! LayerSeq) } @@ -101,9 +101,9 @@ public class Concat1Seq: LayerMergeSeq for depth in 0..! = nil + var _wDeltaWeights: FloatBuffer! = nil /// Whether to compute weights' gradients or not. public var computeDeltaWeights: Bool = true @@ -557,12 +548,7 @@ public class Constant2Seq: LayerSeq, LayerUpdate { return _weightsList } - - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - - return weightsTmp + return _wBuffers.w.download() } set { _weightsList = newValue @@ -754,24 +740,16 @@ public class Constant2Seq: LayerSeq, LayerUpdate deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - if _weightsList.count == 0 + if _weightsList.count != 0 { - for depth in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * sequence * nbNeurons, deviceID: deviceID ) } @@ -856,8 +834,7 @@ public class Constant2Seq: LayerSeq, LayerUpdate ) }} - MetalKernel.get.download([_wBuffers.w_p!]) - let weightsPtr = _wBuffers.w_p!.shared.buffer + let weightsPtr = _wBuffers.w.download() for batch in 0..! = nil + + /// + /// Grid of weights. + /// Shape ~ (vocabularySize, nbNeurons). + /// + var _wArrays: WeightGrids! = nil + + /// + /// Buffer of weights. + /// Shape ~ (vocabularySize, nbNeurons). + /// + var _wBuffers: IWeightBuffers! = nil + + /// + /// Buffer of gradients per sample. + /// Shape ~ (batch, vocabularySize, nbNeurons). 
+ /// + var _wDeltaWeights: FloatBuffer! = nil + + /// Whether to compute weights' gradients or not. + public var computeDeltaWeights: Bool = true + + /// Whether gradients of weights must be accumulated or not. + public var accumulateDeltaWeights: Bool = false + + /// Cache for weights before calling `initKernel` API. + var _weightsList = [Float]() + + /// Weights in the CPU execution context. + public var weightsCPU: [Float] + { + get { + if _wArrays == nil + { + return _weightsList + } + + var weightsTmp = [Float]() + for index in 0.., + inPlace: Bool) -> Layer + { + if idPrev > -1 + { + fatalError("EmbeddingSeq must be the first layer.") + } + + let context = ModelContext(name: "", curID: 0) + let params = GrAI.Model.Params(context: context) + params.context.curID = id + + let layer = EmbeddingSeq( + sequence: sequence, + vocabularySize: vocabularySize, + nbNeurons: nbNeurons, + params: params + ) + + if inPlace + { + layer._wArrays = _wArrays + layer._wBuffers = _wBuffers + } + else + { + if GrAI.Opti.GPU + { + layer.weightsGPU = weightsGPU + } + else + { + layer.weightsCPU = weightsCPU + } + } + return layer + } + + /// + /// Clean state resources in the CPU execution context. + /// + /// We first clean the neurons' state (forward and backward). + /// We do not clean weights and biases but must reset their delta (dependent on batch size) and + /// momentum state. + /// + public override func resetKernelCPU() + { + super.resetKernelCPU() + _wArrays?.reset() + ins = nil + } + + /// + /// Clean state resources in the GPU execution context. + /// + /// We first clean the neurons' state (forward and backward). + /// We do not clean weights and biases but must reset their delta (dependent on batch size) and + /// momentum state. + /// + public override func resetKernelGPU() + { + super.resetKernelGPU() + + ins = nil + _wDeltaWeights = nil + _wBuffers?.reset() + } + + /// + /// Initialize weights in the CPU execution context. + /// + /// Their momentum and delta state are also reset. + /// + public func initWeightsCPU() + { + if _weightsList.count == 0 + { + _weightsList = generateWeightsList() + } + + _wArrays = WeightGrids(width: nbNeurons, height: vocabularySize) + + for index in 0..( + batchSize * sequence, deviceID: deviceID + ) + } + else if batchSize <= 0 || batchSize > ins.nbElems / sequence + { + throw LayerError.BatchSize + } + + var dataFlat = data.flatMap { $0.map { Int32($0)} } + let ins_s = ins as! MetalSharedBuffer + copyArrayToBuffer( + array: &dataFlat, + buffer: ins_s.buffer, + start: 0, + nbElems: batchSize * sequence + ) + } + + /// + /// Check and setup input in the GPU execution context. + /// + /// Throw an error if data size is not coherent. + /// + /// - Parameters: + /// - data: The input data. + /// - batchSize: The batch size of data. + /// - sequence: Length of the sequence. + /// + public func checkInputGPU( + _ data: [[Int]], + batchSize: Int, + sequence: Int) throws + { + if data.count != batchSize || data.first!.count != sequence + { + throw LayerError.DataSize + } + + if ins == nil + { + ins = MetalPrivateBuffer( + batchSize * sequence, deviceID: deviceID + ) + } + else if batchSize <= 0 || batchSize > ins.nbElems / sequence + { + throw LayerError.BatchSize + } + + // Wait for previous loop to end to avoid race condition. + _ = ins.download() + + var dataFlat = data.flatMap { $0.map { Int32($0)} } + let ins_s = ins as! 
MetalPrivateBuffer + copyArrayToBuffer( + array: &dataFlat, + buffer: ins_s.shared.buffer, + start: 0, + nbElems: batchSize * sequence + ) + ins.upload() + } + + /// + /// API to set data in the CPU execution context. + /// + /// Throw an error if data size is not coherent. + /// + /// - Parameters: + /// - data: The data to set. + /// - batchSize: The batch size of data. + /// - sequence: Length of the sequence. + /// + public func setDataCPU( + _ data: [[Int]], + batchSize: Int, + sequence: Int) throws + { + try checkInputCPU( + data, + batchSize: batchSize, + sequence: sequence + ) + } + + /// + /// API to set data in the GPU execution context. + /// + /// Throw an error if data size is not coherent. + /// + /// - Parameters: + /// - data: The data to set. + /// - batchSize: The batch size of data. + /// - sequence: Length of the sequence. + /// + public func setDataGPU( + _ data: [[Int]], + batchSize: Int, + sequence: Int) throws + { + try checkInputGPU( + data, + batchSize: batchSize, + sequence: sequence + ) + } + + /// + /// Initialize state resources in the GPU execution context. + /// + /// We initialize the neurons' forward state. + /// We initialize the weights and biases' delta. + /// + public override func checkStateForwardGPU(batchSize: Int) throws + { + try super.checkStateForwardGPU(batchSize: batchSize) + + if computeDeltaWeights && + GrAI.Gradient.sample && _wDeltaWeights == nil + { + _wDeltaWeights = FloatBuffer(nbElems: + batchSize * vocabularySize * nbNeurons, deviceID: deviceID + ) + } + } + + /// + /// Apply the forward pass of the Gradient Checking in CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardGCCPU() throws + { + try checkStateCPU(batchSize: batchSize) + + let newGC = 2 * nbLearnedGC + for seq in 0..).buffer + + for batch in 0..).buffer + + for elem in 0..).buffer + + if !accumulateDeltaWeights + { + for index in 0..= vocabularySize + { + fatalError("Index \(index) is out of range.") + } + for depth in 0.. [IWeightArrays] + { + return [_wArrays] + } + + /// Get the weights in the GPU execution context. + public func collectWeightsGPU() -> [IWeightBuffers] + { + return [_wBuffers] + } +} diff --git a/Sources/GrAIdient/LayerSeq/FullyConnectedPatch.swift b/Sources/GrAIdient/LayerSeq/FullyConnectedPatch.swift index 9ed2b6ce..c9bf8ba5 100644 --- a/Sources/GrAIdient/LayerSeq/FullyConnectedPatch.swift +++ b/Sources/GrAIdient/LayerSeq/FullyConnectedPatch.swift @@ -45,12 +45,12 @@ public class FullyConnectedPatch: ActivationSeq, /// Buffer of gradients per sample for weights. /// Shape ~ (batch, nbNeurons, nbNeuronsPrev x patch x patch). /// - var _wDeltaWeights: MetalPrivateBuffer! = nil + var _wDeltaWeights: FloatBuffer! = nil /// - /// Buffer of gradients per sample for biases. + /// Buffer of gradients per sample. /// Shape ~ (batch, nbNeurons). /// - var _bDeltaWeights: MetalPrivateBuffer! = nil + var _bDeltaWeights: FloatBuffer! = nil /// Whether to compute weights' gradients or not. 
public var computeDeltaWeights: Bool = true @@ -106,14 +106,10 @@ public class FullyConnectedPatch: ActivationSeq, return _weightsList } - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - + var weightsTmp = _wBuffers.w.download() if _updateBiases { - MetalKernel.get.download([_bBuffers.w_p!]) - weightsTmp += _bBuffers.w_p!.shared.array + weightsTmp += _bBuffers.w.download() } return weightsTmp } @@ -458,12 +454,6 @@ public class FullyConnectedPatch: ActivationSeq, /// public func initWeightsGPU() { - if _weightsList.count == 0 - { - _weightsList = generateWeightsList() - _weightsList += [Float](repeating: 0.0, count: weightHeight) - } - _wBuffers = WeightBuffers( nbElems: weightHeight * weightWidth, deviceID: deviceID @@ -473,33 +463,24 @@ public class FullyConnectedPatch: ActivationSeq, deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer - - for elem in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * sequence * nbNeurons * weightWidth, deviceID: deviceID ) if _updateBiases { - _bDeltaWeights = MetalPrivateBuffer( + _bDeltaWeights = FloatBuffer(nbElems: batchSize * sequence * nbNeurons, deviceID: deviceID ) } @@ -720,11 +701,8 @@ public class FullyConnectedPatch: ActivationSeq, ) }} - MetalKernel.get.download([_wBuffers.w_p!, _bBuffers.w_p!]) - MetalKernel.get.download([layerPrev.outs]) - - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer + let weightsPtr = _wBuffers.w.download() + let biasesPtr = _bBuffers.w.download() let nbSeqPerCol = layerPrev.width / _patch let neuronsPrev = layerPrev.neurons @@ -762,7 +740,7 @@ public class FullyConnectedPatch: ActivationSeq, } }}} - let outsPrevPtr = layerPrev.outs.shared.buffer + let outsPrevPtr = layerPrev.outs.download() for batch in 0.. [IWeightArrays] { - var weights = [IWeightArrays]() + var weights = [WeightArrays]() weights.append(_wArrays) if _updateBiases { @@ -1327,8 +1308,7 @@ public class FullyConnectedPatch: ActivationSeq, } var deltaWeights = [T]() - MetalKernel.get.download([_wDeltaWeights]) - var deltaWeightsPtr = _wDeltaWeights.shared.buffer + var deltaWeightsPtr = _wDeltaWeights.download() let offsetStart = elem * nbNeurons * weightWidth for depth in 0..! = nil + var _wDeltaWeights: FloatBuffer! = nil /// - /// Buffer of gradients per sample for biases. + /// Buffer of gradients per sample. /// Shape ~ (batch, nbNeurons). /// - var _bDeltaWeights: MetalPrivateBuffer! = nil + var _bDeltaWeights: FloatBuffer! = nil /// Whether to compute weights' gradients or not. 
public var computeDeltaWeights: Bool = true @@ -98,14 +98,10 @@ public class FullyConnectedSeq: ActivationSeq, return _weightsList } - var weightsTmp = [Float]() - MetalKernel.get.download([_wBuffers.w_p!]) - weightsTmp += _wBuffers.w_p!.shared.array - + var weightsTmp = _wBuffers.w.download() if _updateBiases { - MetalKernel.get.download([_bBuffers.w_p!]) - weightsTmp += _bBuffers.w_p!.shared.array + weightsTmp += _bBuffers.w.download() } return weightsTmp } @@ -433,12 +429,6 @@ public class FullyConnectedSeq: ActivationSeq, /// public func initWeightsGPU() { - if _weightsList.count == 0 - { - _weightsList = generateWeightsList() - _weightsList += [Float](repeating: 0.0, count: weightHeight) - } - _wBuffers = WeightBuffers( nbElems: weightHeight * weightWidth, deviceID: deviceID @@ -448,34 +438,24 @@ public class FullyConnectedSeq: ActivationSeq, deviceID: deviceID ) - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer - - for elem in 0..( + _wDeltaWeights = FloatBuffer(nbElems: batchSize * sequence * nbNeurons * weightWidth, deviceID: deviceID ) if _updateBiases { - _bDeltaWeights = MetalPrivateBuffer( + _bDeltaWeights = FloatBuffer(nbElems: batchSize * sequence * nbNeurons, deviceID: deviceID ) } @@ -661,11 +641,8 @@ public class FullyConnectedSeq: ActivationSeq, ) }} - MetalKernel.get.download([_wBuffers.w_p!, _bBuffers.w_p!]) - MetalKernel.get.download([layerPrev.outs]) - - let weightsPtr = _wBuffers.w_p!.shared.buffer - let biasesPtr = _bBuffers.w_p!.shared.buffer + let weightsPtr = _wBuffers.w.download() + let biasesPtr = _bBuffers.w.download() let neuronsPrev = layerPrev.neurons! let nbNeuronsPrev = layerPrev.nbNeurons @@ -690,7 +667,7 @@ public class FullyConnectedSeq: ActivationSeq, } }}} - let outsPrevPtr = layerPrev.outs.shared.buffer + let outsPrevPtr = layerPrev.outs.download() for batch in 0.. [IWeightArrays] { - var weights = [IWeightArrays]() + var weights = [WeightArrays]() weights.append(_wArrays) if _updateBiases { @@ -1170,8 +1192,7 @@ public class FullyConnectedSeq: ActivationSeq, } var deltaWeights = [T]() - MetalKernel.get.download([_wDeltaWeights]) - var deltaWeightsPtr = _wDeltaWeights.shared.buffer + var deltaWeightsPtr = _wDeltaWeights.download() let offsetStart = elem * nbNeurons * weightWidth for depth in 0.., + inPlace: Bool) -> Layer + { + let context = ModelContext(name: "", curID: 0) + let layerPrev = mapping[idPrev] as! LayerSeq + + let params = GrAI.Model.Params(context: context) + params.context.curID = id + + let layer = try! LayerCAMSeq( + layerPrev: layerPrev, + params: params + ) + return layer + } + + /// + /// Apply the forward pass of the Gradient Checking in CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardGCCPU() throws + { + fatalError("Not implemented.") + } + + /// + /// Apply the forward pass of the Gradient Checking in GPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardGCGPU() throws + { + try forwardGCCPU() + } + + /// + /// Apply the forward pass in the CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardCPU() throws + { + if let layerPrev = self.layerPrev as? LayerSeq + { + try checkStateCPU(batchSize: batchSize) + + let neuronsPrev = layerPrev.neurons! 
+ let nbNeuronsPrev = layerPrev.nbNeurons + + for elem in 0.., + inPlace: Bool) -> Layer + { + let context = ModelContext(name: "", curID: 0) + let params = GrAI.Model.Params(context: context) + params.context.curID = id + + var layersPrev = [LayerSeq]() + for idPrev in idsPrev + { + layersPrev.append(mapping[idPrev] as! LayerSeq) + } + + let layer = try! MultiplySeq(layersPrev: layersPrev, params: params) + return layer + } + + /// + /// Clean state resources in the CPU execution context. + /// + /// We clean the neurons' state (forward and backward). + /// + public override func resetKernelCPU() + { + super.resetKernelCPU() + _otherOuts1 = [] + } + + /// + /// Clean state resources in the GPU execution context. + /// + /// We clean the neurons' state (forward and backward). + /// + public override func resetKernelGPU() + { + super.resetKernelGPU() + _otherOuts2 = [] + } + + /// + /// Initialize state resources in the CPU execution context. + /// + /// We initialize the neurons' state (forward and backward). + /// + public override func checkStateCPU(batchSize: Int) throws + { + try super.checkStateCPU(batchSize: batchSize) + + if phase != nil && (phase == .Training || phase == .InferenceBackward) { + if _otherOuts1.count == 0 + { + for _ in 0.., + inPlace: Bool) -> Layer + { + let context = ModelContext(name: "", curID: 0) + let layerPrev = mapping[idPrev] as! LayerSeq + + let params = GrAI.Model.Params(context: context) + params.context.curID = id + + let layer = try! QuerySelfSeq( + layerPrev: layerPrev, + query: _queryOffset, + key: _keyOffset, + nbBlocksPrev: _nbBlocksPrev, + nbHeads: _nbHeads, + params: params + ) + return layer + } + + /// + /// Apply the forward pass of the Gradient Checking in CPU execution context. + /// + /// Throw an error if batch size is greater than the first batch size. + /// + public override func forwardGCCPU() throws + { + if let layerPrev = self.layerPrev as? LayerSeq + { + try checkStateCPU(batchSize: batchSize) + + let nbGC = layerPrev.nbGC + for seqQ in 0..
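
Most of the hunks above migrate `MetalPrivateBuffer` / `MetalSharedBuffer` fields to the new `FloatBuffer` type and drop the explicit `MetalKernel.get.download(...)` calls. Below is a minimal usage sketch of that read/write pattern, assuming only the `FloatBuffer` members that appear in this diff (`nbElems`, `download()`, `initialize(array:)`, and the `nbElems:deviceID:shared:` initializer); the buffer size and device id are illustrative.

```swift
import GrAIdient

// Allocate a buffer of 1024 floats on device 0; `shared: true` corresponds
// to what was previously a MetalSharedBuffer in the hunks above.
let buffer = FloatBuffer(nbElems: 1024, deviceID: 0, shared: true)

// Write: fill a host array, then initialize the buffer with it.
var values = [Float](repeating: 0.0, count: 1024)
for i in 0..<values.count
{
    values[i] = Float(i)
}
buffer.initialize(array: &values)

// Read: download() returns the content as [Float], replacing the old
// MetalKernel.get.download([...]) + .shared.buffer access pattern.
let result: [Float] = buffer.download()
print(result[0], result[1023])
```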
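
Several forward and backward GPU passes above now pick between the scalar `sum1` / `sum2` kernels and the vectorized `sum14` / `sum24` variants, dispatching a quarter of the threads when the element count is divisible by 4. The sketch below condenses that selection logic using the command API shown in the hunks; `sumInto`, its parameters, and the overwrite-versus-accumulate reading of the branch are illustrative assumptions, not library API.

```swift
import GrAIdient

/// Hypothetical helper: copy or accumulate `inBuffer` into `outBuffer`,
/// preferring the float4 kernels when the element count allows it.
func sumInto(
    outBuffer: FloatBuffer, inBuffer: FloatBuffer,
    deviceID: Int, overwrite: Bool)
{
    let nbElems = outBuffer.nbElems
    let pNbElems: [UInt32] = [UInt32(nbElems)]

    // "sum14" / "sum24" process 4 floats per thread; fall back to the
    // scalar kernels when nbElems is not a multiple of 4.
    let kernel: String
    let coeff = nbElems % 4 == 0 ? 4 : 1
    if overwrite
    {
        kernel = nbElems % 4 == 0 ? "sum14" : "sum1"
    }
    else
    {
        kernel = nbElems % 4 == 0 ? "sum24" : "sum2"
    }

    let command = MetalKernel.get.createCommand(kernel, deviceID: deviceID)
    command.setBuffer(inBuffer.metal, atIndex: 0)
    command.setBytes(pNbElems, atIndex: 1)
    command.setBuffer(outBuffer.metal, atIndex: 2)

    // One thread per element, or per group of 4 with the vectorized kernels.
    command.dispatchThreads(nbElems / coeff)
    command.enqueue()
}
```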
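
`EmbeddingSeq` is fed token indices rather than float buffers. The sketch below creates the layer and passes it a batch of sequences, using only the constructor and `setDataGPU(_:batchSize:sequence:)` signatures visible above; the vocabulary size, sequence length, and token values are made up, and this is not a complete model or training setup.

```swift
import GrAIdient

// Target the GPU data path (the embedding inputs below live in GPU buffers).
GrAI.Opti.GPU = true

let context = ModelContext(name: "Embedding", curID: 0)
let params = GrAI.Model.Params(context: context)

// Hypothetical sizes: vocabulary of 32 tokens, sequences of 8 tokens,
// each token embedded into 16 neurons.
let embedding = EmbeddingSeq(
    sequence: 8,
    vocabularySize: 32,
    nbNeurons: 16,
    params: params
)

// Token indices come in as [[Int]] with shape ~ (batch, sequence).
let tokens: [[Int]] = [
    [0, 3, 7, 1, 2, 5, 4, 6],
    [9, 8, 1, 0, 0, 2, 3, 4]
]
do
{
    try embedding.setDataGPU(tokens, batchSize: 2, sequence: 8)
}
catch
{
    print(error)
}
```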