I'm trying to run inference with this pretrained ConvTasNet single-source enhancement model from Hugging Face, and I'm getting notably poor output.
I tried passing an ~18.5 s, 16 kHz clean speech clip mixed with -40 dB white Gaussian noise. The output seemed to have about the same SNR as the input, and its scaling ballooned well past +/-1 (max sample value around 1500). Additionally, the speech itself sounds slightly distorted.
I should note that I also tried passing just the clean speech to the model and got similar results, as far as added distortion goes.
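For anyone wanting to reproduce the measurements described above, here is a minimal sketch of how the noisy input, the SNR, and the output scale could be checked. It uses only the standard library and a synthetic tone as a stand-in for the speech clip. The helper names (`add_white_noise`, `measure_snr_db`, `match_peak`) are hypothetical, and it assumes "-40 dB" means the noise sits 40 dB below the speech power, which the post does not state explicitly.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def add_white_noise(clean, snr_db, seed=0):
    """Add white Gaussian noise at `snr_db` dB below the clean signal's RMS level.

    Assumption: the noise level is defined relative to the clean signal.
    """
    rng = random.Random(seed)
    noise_rms = rms(clean) / (10 ** (snr_db / 20))
    return [c + rng.gauss(0.0, noise_rms) for c in clean]

def measure_snr_db(reference, estimate):
    """SNR of `estimate` against `reference`, in dB (sensitive to scale)."""
    err = [e - r for r, e in zip(reference, estimate)]
    return 20 * math.log10(rms(reference) / rms(err))

def match_peak(estimate, reference):
    """Rescale `estimate` so its peak absolute value equals that of `reference`."""
    peak_est = max(abs(v) for v in estimate)
    peak_ref = max(abs(v) for v in reference)
    return [v * peak_ref / peak_est for v in estimate]

# Stand-in for the clean clip: one second of a 220 Hz tone at 16 kHz.
sr = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
noisy = add_white_noise(clean, snr_db=40.0)

# Simulate an output whose scale has ballooned, then bring it back.
ballooned = [1500.0 * v for v in noisy]
rescaled = match_peak(ballooned, noisy)
```

Rescaling the model output to the input's peak before comparing separates the scaling problem from the distortion problem: it removes the former but cannot fix the latter.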
I'm trying to figure out whether I've configured everything correctly for inference with LambdaOverlapAdd. I mostly used the Process large audio files notebook as a reference. Here's my code.
Here, noisy_audio is the 1-D noisy speech signal, and window_size and hop_size were inferred from the config provided on the model's Hugging Face page.
Is there something I'm missing or doing wrong here?
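For context on what window_size and hop_size control, the following is a generic sketch of windowed overlap-add processing in plain Python. It is not Asteroid's LambdaOverlapAdd implementation; the function names (`hann`, `overlap_add`) and the normalization by the summed window weights are assumptions chosen for illustration. The signal is cut into overlapping chunks, each chunk is passed through a function, and the windowed results are summed back at their original positions.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def overlap_add(signal, fn, window_size, hop_size):
    """Apply `fn` to overlapping chunks of `signal` and recombine them.

    `fn` takes a list of `window_size` samples and returns a list of the
    same length. Each processed chunk is weighted by a Hann window and the
    sum is divided by the accumulated window weights.
    """
    if not 0 < hop_size < window_size:
        raise ValueError("hop_size must satisfy 0 < hop_size < window_size")
    win = hann(window_size)
    n = len(signal)
    pad = window_size - hop_size  # front padding so the first samples get weight
    padded = [0.0] * pad + list(signal) + [0.0] * window_size
    out = [0.0] * len(padded)
    norm = [0.0] * len(padded)
    for start in range(0, n + pad, hop_size):
        processed = fn(padded[start:start + window_size])
        for i in range(window_size):
            out[start + i] += processed[i] * win[i]
            norm[start + i] += win[i]
    # Drop the padding and undo the window weighting.
    return [out[pad + k] / norm[pad + k] for k in range(n)]

# With an identity function the input should come back unchanged.
tone = [math.sin(2 * math.pi * 5 * k / 1000) for k in range(1000)]
restored = overlap_add(tone, lambda chunk: list(chunk), window_size=64, hop_size=32)
```

A sanity check of this kind, swapping the model for an identity function, can help tell apart problems in the chunking configuration from problems in the model itself.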