VAD: Voice Activity Detector

The VAD receives 8kHz, 16-bit, mono, signed-linear PCM data over a TCP socket, running it through the Silero VAD ML model, and returns for each processed block a one-byte unsigned integer (00-FF) for the confidence that voice is present.

Generally, we seem to be able to use fairly high confidences as a threshold: >97% (> 0xF7).

Usage note: the model is intended as an aggregate, so whenever the client side determines that it has sufficient data to indicate a voice, it should terminate the connection, dialing again when it wants to detect voice again, even if that is during the same telephone conversation.

Testing

Samples may be tested with ffmpeg and nc, like so:

$ ffmpeg -hide_banner -loglevel error -re -i test.wav -f s16le - | nc localhost 3030 |hexdump

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
vad		vad
.dockerignore		.dockerignore
.envrc		.envrc
.gitignore		.gitignore
.gitmodules		.gitmodules
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
shell.nix		shell.nix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VAD: Voice Activity Detector

Testing

About

Releases

Packages

Languages

License

CyCoreSystems/vad

Folders and files

Latest commit

History

Repository files navigation

VAD: Voice Activity Detector

Testing

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages