whisper.m

Automatic speech recognition in MATLAB/Octave based on the excellent whisper.cpp from Georgi Gerganov and models from OpenAI's Whisper.

Installation

First, clone the repository with submodules:

git clone --recurse-submodules https://github.com/gllmflndn/whisper.m.git

MATLAB

Then compile the MEX file using make in a Terminal:

make

On macOS, the Accelerate and Metal frameworks are used. On Windows, compile with MSYS2 and MinGW-w64 (see MATLAB Support).

GNU Octave

To compile for Octave, run the following from a Terminal instead:

make MEXBIN="mkoctfile --mex" MEXEXT=mex MEXOPT=""

Usage

To run whisper.m on a pre-recorded audio file (mono, 16 kHz) called input.wav:

w = whisper('small');
[segments,tokens] = w.transcribe('input.wav',...
                                 'print_realtime', true,...
                                 'print_progress', false);
whisper.display_tokens(tokens);

Pre-trained models will be downloaded automatically from Hugging Face when needed and stored in a models directory. Model options are tiny, tiny.en, base, base.en, small, small.en, medium, medium.en and large.
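As an illustration of choosing a model, the sketch below checks a name against the list above before constructing the object. The helper name `load_whisper_model` is hypothetical and not part of whisper.m; only the `whisper(name)` constructor call is taken from the usage example. In OpenAI's Whisper, the `.en` variants are the English-only models.

```matlab
function w = load_whisper_model(name)
% LOAD_WHISPER_MODEL  Hypothetical helper, not part of whisper.m.
% Validates NAME against the model names listed in this README and
% then constructs the whisper object as in the usage example.
    models = {'tiny', 'tiny.en', 'base', 'base.en', ...
              'small', 'small.en', 'medium', 'medium.en', 'large'};
    if ~ischar(name) || ~ismember(name, models)
        error('load_whisper_model:unknownModel', ...
              'Unknown model name. Expected one of: %s', ...
              strjoin(models, ', '));
    end
    % Per this README, the model is downloaded automatically when needed.
    w = whisper(name);
end
```

Saved as load_whisper_model.m on the path, it could be called as `w = load_whisper_model('base.en');`.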

To record audio from an input device and transcribe it with whisper.m:

Fs = 16000;
nbits = 16;
nchannels = 1;
id = 1; % see audiodevinfo to select the audio device
rec = audiorecorder(Fs, nbits, nchannels, id);

recDuration = 10;
disp('Begin speaking.')
recordblocking(rec, recDuration);
disp('End of recording.')
y = getaudiodata(rec);

w = whisper('small');
% getaudiodata returns a column vector; it is transposed to a row vector here
[segments,tokens] = w.transcribe(y', 'print_progress', false);
whisper.display_tokens(tokens);

To extract the audio track from a video as 16 kHz mono, you can use ffmpeg:

ffmpeg -i video.mp4 -f wav -ar 16000 -ac 1 -vn audio.wav
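Audio that is already loaded in MATLAB/Octave can also be converted to mono 16 kHz without ffmpeg. The following is a minimal sketch, assuming the `resample` function is available (Signal Processing Toolbox in MATLAB, signal package in Octave). The helper name `to_mono_16k` is hypothetical and not part of whisper.m.

```matlab
function y16 = to_mono_16k(y, Fs)
% TO_MONO_16K  Hypothetical helper, not part of whisper.m.
% Converts audio samples Y (nsamples x nchannels, as returned by
% audioread) sampled at FS Hz into a mono column vector at 16 kHz.
    targetFs = 16000;
    y16 = mean(y, 2);                  % average the channels to get mono
    if Fs ~= targetFs
        % resample applies an anti-aliasing filter while changing the
        % rate by the ratio targetFs/Fs (both must be positive integers)
        y16 = resample(y16, targetFs, Fs);
    end
end
```

With `[y, Fs] = audioread('audio.wav')`, the result can be passed to `w.transcribe(to_mono_16k(y, Fs)', ...)`, transposed in the same way as in the recording example.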

There is also a demo that uses an audio file shipped with whisper.cpp:

>> whisper.demo()
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB
whisper_init_state: compute buffer (conv)   =   11.17 MB
whisper_init_state: compute buffer (encode) =   61.76 MB
whisper_init_state: compute buffer (cross)  =    3.67 MB
whisper_init_state: compute buffer (decode) =   18.82 MB

And so my fellow Americans ask not what your country can do for you ask what you can do for your country
