This is a fork of:
https://huggingface.co/spaces/aadnk/whisper-webui (Code and Demo)
https://gitlab.com/aadnk/whisper-webui/-/blob/main/README.md (Readme on GitLab)
Note: The original commits are from the GitLab repo https://gitlab.com/aadnk/whisper-webui/-/commits/main
Found via openai/whisper#397
Use this OpenAI Whisper fork on low-VRAM setups, for example to run the large model on an 8GB GPU.
whisper-for-low-vram
This is tested with Docker and works fine with an 8GB GPU and the large Whisper model.
It works very well with non-English languages (tested with Serbian).
If you want to use the latest Whisper, use the original repo.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
To run this program locally, first install Python 3.9+ and Git. Then install PyTorch 1.10.1+ and all the other dependencies:
pip install -r requirements.txt
You can find detailed instructions for how to install this on Windows 10/11 here (PDF).
Finally, run the full version (no audio length restrictions) of the app with parallel CPU/GPU enabled:
python app.py --input_audio_max_duration -1 --server_name 127.0.0.1 --auto_parallel True
You can also run the CLI interface, which is similar to Whisper's own CLI but also supports the following additional arguments:
python cli.py \
[--vad {none,silero-vad,silero-vad-skip-gaps,silero-vad-expand-into-gaps,periodic-vad}] \
[--vad_merge_window VAD_MERGE_WINDOW] \
[--vad_max_merge_size VAD_MAX_MERGE_SIZE] \
[--vad_padding VAD_PADDING] \
[--vad_prompt_window VAD_PROMPT_WINDOW] \
[--vad_cpu_cores NUMBER_OF_CORES] \
[--vad_parallel_devices COMMA_DELIMITED_DEVICES] \
[--auto_parallel BOOLEAN]
You may also use URLs in addition to file paths as input:
python cli.py --model large --vad silero-vad --language Japanese "https://www.youtube.com/watch?v=4cICErqqRSM"
Rather than supplying arguments to app.py or cli.py, you can also use the configuration file config.json5. See that file for more information.
If you want to use a different configuration file, you can use the WHISPER_WEBUI_CONFIG environment variable to specify the path to another file.
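The lookup order described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the helper name `resolve_config_path` is hypothetical, and treating an empty variable the same as an unset one is an assumption.

```python
import os
from pathlib import Path

DEFAULT_CONFIG = "config.json5"  # default file name mentioned in this README


def resolve_config_path(env=None):
    """Return the config file path, preferring WHISPER_WEBUI_CONFIG if set.

    Hypothetical helper: the real application may resolve the path differently.
    """
    env = os.environ if env is None else env
    # Assumption: an empty value falls back to the default file.
    return Path(env.get("WHISPER_WEBUI_CONFIG") or DEFAULT_CONFIG)


# Example: an explicit override versus the default.
override = resolve_config_path({"WHISPER_WEBUI_CONFIG": "configs/low-vram.json5"})
default = resolve_config_path({})
```

Passing the environment as a plain dictionary keeps the sketch easy to try without changing the real process environment.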
You can also run this Web UI directly on Google Colab, if you haven't got a GPU powerful enough to run the larger models.
See the colab documentation for more information.
You can also run either the Web UI or the CLI on multiple GPUs in parallel, using the vad_parallel_devices option. This takes a comma-delimited list of device IDs (0, 1, etc.) that Whisper should be distributed to and run on concurrently:
python cli.py --model large --vad silero-vad --language Japanese \
--vad_parallel_devices 0,1 "https://www.youtube.com/watch?v=4cICErqqRSM"
Note that this requires a VAD to function properly; otherwise only the first GPU will be used. You can use periodic-vad to avoid the cost of running Silero VAD, at a slight cost to accuracy.
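To see why a VAD is needed for multi-GPU use, consider this minimal sketch. It assumes the audio is split into speech segments that are then handed out to devices; the round-robin policy and the names `parse_devices` and `assign_segments` are hypothetical and may differ from what the project actually does.

```python
def parse_devices(value):
    """Split a comma-delimited device list such as "0,1" into ["0", "1"]."""
    return [part.strip() for part in value.split(",") if part.strip()]


def assign_segments(segments, devices):
    """Assign (start, end) segments to devices round-robin.

    Illustrative policy only; the real distribution strategy is an assumption.
    """
    plan = {device: [] for device in devices}
    for index, segment in enumerate(segments):
        plan[devices[index % len(devices)]].append(segment)
    return plan


# With a VAD, the audio is split into several segments that can be shared.
segments = [(0.0, 12.5), (14.0, 30.2), (31.0, 45.8)]
plan = assign_segments(segments, parse_devices("0,1"))

# Without a VAD there is a single segment, so only the first device gets work.
no_vad_plan = assign_segments([(0.0, 45.8)], parse_devices("0,1"))
```

In the second case the plan for device "1" is empty, which matches the behaviour described above.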
This is achieved by creating N child processes (where N is the number of selected devices), where Whisper is run concurrently. In app.py, you can also set the vad_process_timeout option. This configures the number of seconds until a process is killed due to inactivity, freeing RAM and video memory. The default value is 30 minutes (1800 seconds).
python app.py --input_audio_max_duration -1 --vad_parallel_devices 0,1 --vad_process_timeout 3600
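The inactivity rule behind vad_process_timeout can be sketched as a small idle tracker. This is an illustration under stated assumptions, not the project's implementation: the class name `IdleTimeout` and the injectable clock are hypothetical, and the actual process management in app.py is not shown.

```python
import time

DEFAULT_TIMEOUT_SECONDS = 30 * 60  # the 30-minute default stated above


class IdleTimeout:
    """Track when a worker was last used and report whether it has idled out."""

    def __init__(self, timeout_seconds=DEFAULT_TIMEOUT_SECONDS, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the behaviour can be checked quickly
        self._last_used = clock()

    def touch(self):
        """Record activity, for example when a new transcription starts."""
        self._last_used = self._clock()

    def expired(self):
        """Return True once the worker has been idle for the full timeout."""
        return self._clock() - self._last_used >= self.timeout_seconds


# Example with a fake clock: one hour, as in the command above.
now = [0.0]
tracker = IdleTimeout(timeout_seconds=3600, clock=lambda: now[0])
now[0] = 3599.0
still_alive = not tracker.expired()
now[0] = 3600.0
should_stop = tracker.expired()
```

A supervising loop would check `expired()` periodically and terminate the child process when it returns True, which is what frees RAM and video memory.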
To execute the Silero VAD itself in parallel, use the vad_cpu_cores option:
python app.py --input_audio_max_duration -1 --vad_parallel_devices 0,1 --vad_process_timeout 3600 --vad_cpu_cores 4
You may also use vad_process_timeout with a single device (--vad_parallel_devices 0), if you prefer to always free video memory after a period of time.
You can also set auto_parallel to True. This will set vad_parallel_devices to use all the GPU devices on the system, and vad_cpu_cores to be equal to the number of cores (up to 8):
python app.py --input_audio_max_duration -1 --auto_parallel True
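The values that auto_parallel derives can be sketched as follows. This is a hedged illustration: the function name `resolve_auto_parallel` is hypothetical, and the GPU count is passed in as a plain number (in practice it could come from something like `torch.cuda.device_count()`, which is an assumption about the project).

```python
import os

MAX_VAD_CORES = 8  # upper bound stated in this README


def resolve_auto_parallel(gpu_count, cpu_count=None):
    """Return (vad_parallel_devices, vad_cpu_cores) for the given hardware.

    Hypothetical helper mirroring the rule described above.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Use every GPU: device IDs 0..gpu_count-1 as a comma-delimited list.
    devices = ",".join(str(index) for index in range(gpu_count))
    # Use all CPU cores for the VAD, capped at eight.
    cores = min(cpu_count, MAX_VAD_CORES)
    return devices, cores


# Example: two GPUs and sixteen cores.
devices, cores = resolve_auto_parallel(gpu_count=2, cpu_count=16)
```

For this example the result corresponds to passing --vad_parallel_devices 0,1 and --vad_cpu_cores 8 by hand.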
You can upload multiple files either through the "Upload files" option, or as a playlist on YouTube. Each audio file will then be processed in turn, and the resulting SRT/VTT/Transcript will be made available in the "Download" section. When more than one file is processed, the UI will also generate an "All_Output" zip file containing all the text output files.
To run it in Docker, first install Docker and optionally the NVIDIA Container Toolkit in order to use the GPU. Then either use the GitLab hosted container below (for the latest Whisper), or check out this repository and build an image with low VRAM GPU support:
docker build -t whisper-webui:1 .
You can then start the WebUI with GPU support like so:
docker run -d --gpus=all -p 7860:7860 whisper-webui:1
Leave out "--gpus=all" if you don't have access to a GPU with enough memory, and are fine with running it on the CPU only:
docker run -d -p 7860:7860 whisper-webui:1
A Docker container built with the latest Whisper is also hosted on GitLab (do not use it if you want to run the low-VRAM version):
docker run -d --gpus=all -p 7860:7860 registry.gitlab.com/aadnk/whisper-webui:latest
You can also pass custom arguments to app.py in the Docker container, for instance to be able to use all the GPUs in parallel:
docker run -d --gpus all -p 7860:7860 \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
--restart=on-failure:15 whisper-webui:1 \
app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True \
--default_vad silero-vad --default_model_name large
You can also call cli.py the same way:
docker run --gpus all \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
--mount type=bind,source=${PWD},target=/app/data \
registry.gitlab.com/aadnk/whisper-webui:latest \
cli.py --model large --auto_parallel True --vad silero-vad \
--output_dir /app/data /app/data/YOUR-FILE-HERE.mp4
Note that the models themselves are currently not included in the Docker images, and will be downloaded on demand. To avoid this, bind the directory /root/.cache/whisper to some directory on the host (for instance /home/administrator/.cache/whisper), where you can (optionally) prepopulate the directory with the different Whisper models:
docker run -d --gpus=all -p 7860:7860 \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
whisper-webui:1
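Before starting the container, it can help to check which models are already present in the host directory you bind. The sketch below is only an illustration: the helper `cached_models` is hypothetical, and it assumes that model checkpoints are stored as `<name>.pt` files directly inside the cache directory.

```python
from pathlib import Path


def cached_models(cache_dir):
    """List model names found as *.pt files in the given cache directory.

    Assumption: checkpoints are stored as <name>.pt directly in the directory.
    """
    directory = Path(cache_dir).expanduser()
    if not directory.is_dir():
        return []
    return sorted(path.stem for path in directory.glob("*.pt"))


# Example: inspect the host directory used in the commands above.
available = cached_models("/home/administrator/.cache/whisper")
```

If the list is empty, the container will download the requested model on first use and store it in the bound directory.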
The app with the Web UI is published under the Apache 2.0 license.
Whisper-for-low-VRAM is published under the MIT license.