This project combines speech recognition, using the Whisper model with Unity Sentis, and LLM inference, using LLMUnity with the Google Gemma 2 2B model, all running on-device in Unity.
- Unity 2022.3.39 (recommended, but not required)
- Git LFS
- Do not download as a ZIP file. Instead, use Git to clone the repository:

  ```
  git clone https://github.com/ali7919/Talk-With-LLM-In-Unity.git
  ```
- Open the project in Unity (preferably version 2022.3.39).
- Open the `Scenes/scene` file.
- Click on "Import TMP Essentials" if prompted.
- Download the Gemma 2 2B-it model (8-bit quantized version) or any other LLM in `.gguf` format.
- Place the downloaded LLM file in the `StreamingAssets` folder.
- In the Unity scene, select the `LLM` GameObject and ensure the correct model is selected (a runtime sketch of this component follows the setup steps below).
- Download `LogMelSepctro.onnx`, `AudioEncoder_Tiny.onnx`, and `AudioDecoder_Tiny.onnx` from here, and `vocab.json` from here.
- Place `vocab.json` in the `StreamingAssets` folder.
- Place the ONNX models in the `SentisModels` folder.
- In the Unity scene, select the `Sentis-Whisper` GameObject and assign each model as required (see the Sentis sketch below).
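For reference, here is a minimal sketch of how a script might send a message through the `LLM` GameObject. It follows LLMUnity's documented `LLMCharacter.Chat` API, but the class and field names are illustrative assumptions, not this project's actual code:

```csharp
using UnityEngine;
using LLMUnity;

// A hypothetical sketch, not this project's actual code: sending a user
// message through LLMUnity's LLMCharacter component and logging the reply.
public class ChatExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assign in the Inspector, linked to the LLM GameObject

    // Call with the user's message, e.g. from the input field's submit event.
    public void SendPrompt(string message)
    {
        // Chat streams partial replies into the first callback and invokes
        // the second callback once generation has finished.
        _ = llmCharacter.Chat(message, HandleReply, ReplyCompleted);
    }

    void HandleReply(string reply) => Debug.Log(reply);

    void ReplyCompleted() => Debug.Log("Reply completed.");
}
```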
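And a minimal sketch of how one of these ONNX models might be loaded and run, assuming the Sentis 1.x API (`ModelLoader`, `WorkerFactory`); again, the field and class names are placeholders rather than this project's actual code:

```csharp
using UnityEngine;
using Unity.Sentis;

// A hypothetical sketch, not this project's actual code: loading one of the
// Whisper ONNX models and running it with a Sentis worker (Sentis 1.x API).
public class WhisperEncoderExample : MonoBehaviour
{
    public ModelAsset encoderAsset;  // e.g. AudioEncoder_Tiny.onnx, assigned in the Inspector
    IWorker worker;

    void Start()
    {
        Model model = ModelLoader.Load(encoderAsset);
        worker = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    // Feeds a log-mel spectrogram tensor (from LogMelSepctro.onnx) to the
    // encoder and returns the resulting audio features.
    public TensorFloat Encode(TensorFloat logMelSpectrogram)
    {
        worker.Execute(logMelSpectrogram);
        return worker.PeekOutput() as TensorFloat;
    }

    void OnDestroy() => worker?.Dispose();
}
```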
Run the game. You have two options for input:

- Type your message in the input field and press Enter to send it.
- Click on the microphone icon to start recording, speak your message, then click the microphone icon again when you're done speaking to send it.

The application will process your input (either text or speech) and generate a response using the LLM.
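For the microphone option, here is a minimal sketch of how recording could be toggled with Unity's built-in `Microphone` API; the details are illustrative assumptions rather than this project's actual capture code:

```csharp
using UnityEngine;

// A hypothetical sketch of the microphone toggle, not this project's actual
// capture code. Whisper expects 16 kHz mono audio in chunks of up to 30 s.
public class MicToggleExample : MonoBehaviour
{
    const int SampleRate = 16000;  // Whisper models are trained on 16 kHz audio
    const int MaxSeconds = 30;     // Whisper processes at most 30 s per pass
    AudioClip clip;
    bool recording;

    // Hook this up to the microphone icon's click event.
    public void ToggleRecording()
    {
        if (!recording)
        {
            clip = Microphone.Start(null, false, MaxSeconds, SampleRate);  // default device
            recording = true;
        }
        else
        {
            int recordedSamples = Microphone.GetPosition(null);
            Microphone.End(null);
            recording = false;
            if (recordedSamples <= 0) return;

            float[] samples = new float[recordedSamples];
            clip.GetData(samples, 0);
            // From here the samples would be converted to a log-mel spectrogram
            // and run through the Whisper encoder/decoder to get the transcript.
        }
    }
}
```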
This project was developed with the help of the following resources:
- Voice Recognition: The implementation is partially based on the tutorial by Thomas Simonini: Building AI-Driven Voice Recognition
- LLM Inference: The inference part of this project is based on a sample provided by LLMUnity: LLMUnity ChatBot Sample
I recommend checking out these resources for a more in-depth understanding of the underlying technologies and implementations.