
# TalkWithLLM

Speech Recognition and LLM Inference for Unity

This project combines speech recognition, using the Whisper model and Unity Sentis, with LLM inference, using LLMUnity and the Google Gemma 2 2B model, all running on-device in Unity.

## Project Setup

### Prerequisites

- Unity 2022.3.39 (recommended, but not required)
- Git LFS

### Installation

1. Do not download the repository as a ZIP file. Instead, use Git to clone it:

       git clone https://github.com/ali7919/Talk-With-LLM-In-Unity.git

2. Open the project in Unity (preferably version 2022.3.39).

3. Open the Scenes/scene file.

4. Click "Import TMP Essentials" if prompted.
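The clone in step 1 can be combined with the Git LFS setup from the Prerequisites. A minimal sketch, assuming git-lfs is already installed on your machine:

```shell
# Enable Git LFS once per machine so large tracked files (models, assets)
# download as real binaries rather than small pointer stubs.
git lfs install

# Clone the repository (do not use the "Download ZIP" button).
git clone https://github.com/ali7919/Talk-With-LLM-In-Unity.git
cd Talk-With-LLM-In-Unity

# Fetch any LFS-tracked files the initial clone may have skipped.
git lfs pull
```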

## Configuration

1. Download Gemma 2 2B-it (the 8-bit quantized version) or any other LLM in .gguf format.
2. Place the downloaded model file in the StreamingAssets folder.
3. In the Unity scene, select the LLM GameObject and make sure the correct model is selected.
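A common failure mode with manually downloaded models is a truncated file, or a Git LFS pointer stub in place of the real binary. Every valid .gguf file begins with the ASCII magic bytes `GGUF`, so a quick sanity check from the project root looks like this (the file name below is an example; substitute the model you actually downloaded):

```shell
# Print the first four bytes of the model file; a valid GGUF model prints "GGUF".
head -c 4 "Assets/StreamingAssets/gemma-2-2b-it-Q8_0.gguf"; echo
```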


1. Download LogMelSpectro.onnx, AudioEncoder_Tiny.onnx, and AudioDecoder_Tiny.onnx from here, and vocab.json from here.
2. Place vocab.json in the StreamingAssets folder.
3. Place the ONNX models in the SentisModels folder.
4. In the Unity scene, select the Sentis-Whisper GameObject and assign each model to its corresponding field.
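If the Whisper side fails to load, a frequent culprit is a vocab.json that was saved as an HTML error page instead of real JSON. A minimal check from the project root, assuming Python 3 is available on the PATH as `python3`:

```shell
# Exits non-zero (and prints an error) if vocab.json is not valid JSON.
python3 -m json.tool "Assets/StreamingAssets/vocab.json" > /dev/null && echo "vocab.json OK"
```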


## Running the Project

Run the game. You have two options for input:

**Text Input:** type your message in the input field, then press Enter to send it.

**Voice Input:** click the microphone icon to start recording, speak your message, then click the icon again to stop recording and send it.

The application will process your input (either text or speech) and generate a response using the LLM.


## Credits and References

This project was developed with the help of the following resources:

1. Voice recognition: the implementation is partially based on the tutorial by Thomas Simonini, Building AI-Driven Voice Recognition.

2. LLM inference: the inference portion of this project is based on a sample provided by LLMUnity, the LLMUnity ChatBot Sample.

I recommend checking out these resources for a more in-depth understanding of the underlying technologies and implementations.