Speech-to-Text in Go. This is an early development version.
- cmd contains an OpenAI-API compatible service
- pkg contains the whisper service and client
- sys contains the whisper bindings to the whisper.cpp library
- third_party is a submodule for the whisper.cpp source
You can run the whisper service either as a CLI command or in a docker container. There are docker images for arm64 and amd64 (Intel). The arm64 image is built specifically for Jetson GPU support, but it will also run on Raspberry Pi devices.
In order to utilize an NVIDIA GPU, you'll need to install the NVIDIA Container Toolkit first.
A docker volume called "whisper" should be created for storing the Whisper language models. You can see which models are available to download locally here.
The following command will run the server on port 8080 for an NVIDIA GPU:
docker run \
--name whisper-server --rm \
--runtime nvidia --gpus all \
-v whisper:/data -p 8080:80 \
ghcr.io/mutablelogic/go-whisper:latest
The API is then available at http://localhost:8080/v1 and it generally conforms to the OpenAI API spec.
In order to download a model, you can use the following command (for example):
curl -X POST -H "Content-Type: application/json" -d '{"Path" : "ggml-medium-q5_0.bin" }' localhost:8080/v1/models\?stream=true
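The same download request can be built from Go with the standard library. This is a minimal sketch, not the project's own client: the helper name `newDownloadRequest` and the `downloadRequest` type are hypothetical, and the base URL assumes the server is running locally as in the docker example. Only the `Path` field and the `/models?stream=true` endpoint come from the curl command above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// downloadRequest mirrors the JSON body used in the curl example.
// The type name is hypothetical; only the "Path" key comes from the README.
type downloadRequest struct {
	Path string `json:"Path"`
}

// newDownloadRequest builds the POST request for /models?stream=true.
// baseURL (for example "http://localhost:8080/v1") is an assumption about
// where the server is reachable.
func newDownloadRequest(baseURL, model string) (*http.Request, error) {
	body, err := json.Marshal(downloadRequest{Path: model})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/models?stream=true", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newDownloadRequest("http://localhost:8080/v1", "ggml-medium-q5_0.bin")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the request line instead of sending it, so the sketch runs
	// without a server. To send it, pass req to http.DefaultClient.Do.
	fmt.Println(req.Method, req.URL.String())
}
```

Building the request separately from sending it keeps the sketch runnable offline and makes it easy to add a context or timeout before calling `http.DefaultClient.Do`.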
To list the models available, you can use the following command:
curl -X GET localhost:8080/v1/models
To delete a model, you can use the following command:
curl -X DELETE localhost:8080/v1/models/ggml-medium-q5_0
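Listing and deleting models follow the same pattern from Go. The sketch below only constructs the requests; the helper names (`modelsURL`, `newListModelsRequest`, `newDeleteModelRequest`) are hypothetical, and the response format is deliberately not decoded because it is not described here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// modelsURL joins the base URL with the models endpoint and an optional
// model id. The id is path-escaped in case it contains reserved characters.
func modelsURL(baseURL, id string) string {
	u := baseURL + "/models"
	if id != "" {
		u += "/" + url.PathEscape(id)
	}
	return u
}

// newListModelsRequest corresponds to: curl -X GET <base>/models
func newListModelsRequest(baseURL string) (*http.Request, error) {
	return http.NewRequest(http.MethodGet, modelsURL(baseURL, ""), nil)
}

// newDeleteModelRequest corresponds to: curl -X DELETE <base>/models/<id>
func newDeleteModelRequest(baseURL, id string) (*http.Request, error) {
	return http.NewRequest(http.MethodDelete, modelsURL(baseURL, id), nil)
}

func main() {
	base := "http://localhost:8080/v1" // assumed local server address

	list, err := newListModelsRequest(base)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	del, err := newDeleteModelRequest(base, "ggml-medium-q5_0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the request lines; send them with http.DefaultClient.Do
	// when a server is running.
	fmt.Println(list.Method, list.URL.String())
	fmt.Println(del.Method, del.URL.String())
}
```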
To transcribe a media file into its original language, you can use the following command:
curl -F model=ggml-medium-q5_0 -F file=@samples/jfk.wav localhost:8080/v1/audio/transcriptions\?stream=true
To translate a media file into a different language, you can use the following command:
curl -F model=ggml-medium-q5_0 -F file=@samples/de-podcast.wav -F language=en localhost:8080/v1/audio/translations\?stream=true
There's more information on the API here.
If you are building a docker image, you just need make and docker installed:
DOCKER_REGISTRY=docker.io/user make docker
- builds a docker image containing the server binary, tagged for the given registry
If you want to build the server yourself for your specific combination of hardware (for example, on macOS), you can use the Makefile in the root directory, provided the following dependencies are met:
- Go 1.22
- C++ compiler
- FFmpeg 6.1 libraries (see here for more information)
- For CUDA, you'll need the CUDA toolkit installed, including the nvcc compiler
The following Makefile targets can be used:
make server
- creates the server binary, and places it in the build directory. Should link to Metal on macOS
GGML_CUDA=1 make server
- creates the server binary linked to CUDA, and places it in the build directory. Should work for amd64 and arm64 (Jetson) platforms
See all the other targets in the Makefile for more information.
TODO
Still in development. See this issue for remaining tasks to be completed.
This module is currently in development and subject to change.
Please do file feature requests and bugs here. The license is Apache 2 so feel free to redistribute. Redistributions in either source code or binary form must reproduce the copyright notice, and please link back to this repository for more information:
go-whisper
https://github.com/mutablelogic/go-whisper/
Copyright (c) 2023-2024 David Thorpe, All rights reserved.
whisper.cpp
https://github.com/ggerganov/whisper.cpp
Copyright (c) 2023-2024 The ggml authors
This software links to static libraries of whisper.cpp licensed under the MIT License.