Training GigaSpeech problem in Kaldi #1620

Open
YangangCao opened this issue Aug 14, 2024 · 3 comments

@YangangCao

YangangCao commented Aug 14, 2024

Hi dear author,

I only want to train a small acoustic model using GigaSpeech, but I ran into a problem when running the GigaSpeech recipe in Kaldi.

if [ $stage -le 2 ]; then
  echo "======Train lm START | current time : `date +%Y-%m-%d-%T`=============="
  mkdir -p $lm_dir || exit 1;
  sed 's|\t| |' data/$train_combined/text | \
    cut -d " " -f 2- > $lm_dir/corpus.txt || exit 1;
  echo "break point1"
  local/lm/train_lm.sh \
    --cmd "$train_cmd" --lm-order $lm_order \
    $lm_dir/corpus.txt $lm_dir || exit 1;
  echo "break point2"
  echo "======Train lm END | current time : `date +%Y-%m-%d-%T`================"
fi

This step requires me to install SRILM and train a language model (when I trained the LibriSpeech recipe, I did neither). Is it necessary? I only want to train an acoustic model and don't need to compute WER. For now, I have skipped this step.

Thanks very much!

@nshmyrev
Collaborator

You can skip this step.
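One way to skip the language-model stage without deleting it is to guard it with a flag. The snippet below is only a minimal sketch: `skip_lm` and `decide_lm_stage` are hypothetical names that do not exist in the GigaSpeech recipe, and the real `local/lm/train_lm.sh` call would go where the "train" branch is.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: skip_lm and decide_lm_stage are NOT part of the
# real recipe; they only illustrate guarding stage 2 with a flag.
stage=0
skip_lm=true   # assumed flag; set to false to train the language model

decide_lm_stage() {
  # Print "train" if the LM stage should run, otherwise "skip".
  if [ "$stage" -le 2 ] && [ "$skip_lm" != true ]; then
    # In the real run.sh, the local/lm/train_lm.sh call would go here.
    echo "train"
  else
    echo "skip"
  fi
}

lm_decision=$(decide_lm_stage)
echo "stage 2 (language model): $lm_decision"
```

With `skip_lm=true`, the acoustic-model stages can run without SRILM installed; any later stage that needs the language model (for example decoding) would have to be skipped as well.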

Still, it is recommended to install SRILM and evaluate the model; evaluation is an important part of accuracy testing.

Also, you probably want to use a more modern model instead of the GigaSpeech recipe. There are many of them, and the right choice depends on your requirements. They are going to be much more accurate.

@YangangCao
Author

Hi dear author, thanks for your reply. My goal is to train a text-limited ASR model. As far as I know, only the chain model supports that. Is there a more accurate method?

@nshmyrev
Collaborator

A modern RNN-T or Conformer CTC model should be more accurate.
