Inference of SunoAI's bark model in pure C/C++.
With bark.cpp, our goal is to bring real-time, realistic, multilingual text-to-speech generation to the community.
- Plain C/C++ implementation without dependencies
- AVX, AVX2 and AVX512 for x86 architectures
- CPU and GPU compatible backends
- Mixed F16 / F32 precision
- 4-bit, 5-bit and 8-bit integer quantization
- Metal and CUDA backends
Models supported
Models we want to implement! Please open a PR :)
Demo on Google Colab (#95)
Here is a typical run using bark.cpp:
./main -p "This is an audio generated by bark.cpp"
__ __
/ /_ ____ ______/ /__ _________ ____
/ __ \/ __ `/ ___/ //_/ / ___/ __ \/ __ \
/ /_/ / /_/ / / / ,< _ / /__/ /_/ / /_/ /
/_.___/\__,_/_/ /_/|_| (_) \___/ .___/ .___/
/_/ /_/
bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222
Generating semantic tokens: 17%
bark_print_statistics: sample time = 10.98 ms / 138 tokens
bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
bark_print_statistics: total time = 633.54 ms
Generating coarse tokens: 100%
bark_print_statistics: sample time = 3.75 ms / 410 tokens
bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
bark_print_statistics: total time = 3274.00 ms
Generating fine tokens: 100%
bark_print_statistics: sample time = 38.82 ms / 6144 tokens
bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
bark_print_statistics: total time = 4772.92 ms
write_wav_on_disk: Number of frames written = 65600.
main: load time = 324.14 ms
main: eval time = 8806.57 ms
main: total time = 9131.68 ms
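To relate the log above to the real-time goal, the number of frames written can be converted to an audio duration and compared with the evaluation time. The snippet below is a minimal sketch: it assumes a 24 kHz mono output (Bark's usual sample rate, which the log itself does not state), and the helper names `audio_seconds` and `real_time_factor` are hypothetical, not part of bark.cpp.

```python
# Assumption: bark.cpp writes 24 kHz mono audio; the log does not state the rate.
SAMPLE_RATE_HZ = 24_000


def audio_seconds(num_frames, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of the generated audio, in seconds."""
    return num_frames / sample_rate_hz


def real_time_factor(num_frames, elapsed_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Compute time divided by audio duration; values below 1.0 mean faster than real time."""
    return (elapsed_ms / 1000.0) / audio_seconds(num_frames, sample_rate_hz)


if __name__ == "__main__":
    frames = 65600     # "Number of frames written" in the log above
    eval_ms = 8806.57  # "main: eval time" in the log above
    print(f"audio length: {audio_seconds(frames):.2f} s")                 # 2.73 s
    print(f"real-time factor: {real_time_factor(frames, eval_ms):.2f}")   # 3.22
```

Under that assumption, this particular run takes roughly three times longer to generate the audio than to play it back.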
Here is a video of Bark running on the iPhone:
ouput.mp4
Here are the steps to use bark.cpp:
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive
To build bark.cpp, you must use CMake:
mkdir build
cd build
# To enable NVIDIA GPU support, use the following option
# cmake -DGGML_CUBLAS=ON ..
cmake ..
cmake --build . --config Release
# Install Python dependencies
python3 -m pip install -r requirements.txt
# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark
# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16
# Run inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
Weights can be quantized using one of the following strategies: q4_0, q4_1, q5_0, q5_1, q8_0.
Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.
./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
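To give an intuition for what a strategy such as q4_0 does to the weights, here is a simplified Python sketch of block-wise 4-bit quantization. It is an illustration in the spirit of ggml's q4_0, not the code used by bark.cpp: it assumes blocks of 32 weights sharing one scale, keeps the scale as a Python float rather than a 16-bit float, and does not pack two 4-bit codes per byte. The function names are hypothetical.

```python
import math

BLOCK_SIZE = 32  # assumption: 32 weights share one scale, as in ggml's q4_0 blocks


def quantize_block(values):
    """Map one block of floats to 4-bit codes in [0, 15] plus a single scale."""
    # Take the value with the largest magnitude, keeping its sign,
    # so that it maps exactly onto code 0 (i.e. -8 after the offset).
    extreme = max(values, key=abs)
    scale = extreme / -8.0
    inv = 1.0 / scale if scale != 0.0 else 0.0
    # Shift into [0.5, 16.5], truncate, and clamp to the 4-bit range.
    codes = [min(15, int(v * inv + 8.5)) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the 4-bit codes and the block scale."""
    return [(c - 8) * scale for c in codes]


def bits_per_weight(code_bits=4, scale_bits=16, block_size=BLOCK_SIZE):
    """Storage cost per weight, counting the per-block scale."""
    return (block_size * code_bits + scale_bits) / block_size


if __name__ == "__main__":
    weights = [math.sin(i) for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block(weights)
    restored = dequantize_block(scale, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"scale={scale:.4f}, worst error={worst:.4f}")
    print(f"bits per weight: {bits_per_weight()}")
```

With these assumptions each weight costs 4.5 bits instead of 16 or 32, and the reconstruction error within a block is bounded by the magnitude of the scale. The other strategies follow the same idea with more bits per code or an additional per-block offset.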
- Bark
- Encodec
- GPT-3
bark.cpp is a continuous endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be:
- bug report: you may encounter a bug while using bark.cpp. Don't hesitate to report it in the issue section.
- feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.
- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures