
Finetuning transphone g2p #13

Open
pragvr opened this issue Jun 29, 2023 · 3 comments

@pragvr

pragvr commented Jun 29, 2023

Thank you for sharing your work!
I was wondering if it's possible to finetune the transphone G2P model with proprietary lexicons. If yes, could you please share some instructions on how to achieve this?

@xinjli
Owner

xinjli commented Jun 29, 2023

hi, thanks for your question!

It currently does not support training/fine-tuning with your own lexicon, but it should not be very difficult to modify the code to achieve this. To do this,
you can first implement your own dataset in transphone/model/dataset.py,
and then switch to loading your dataset in transphone/bin/train_g2p.py and use it to train with your own data (a rough sketch is below).

Then it should work, I think.
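
For reference, a minimal sketch of what such a lexicon dataset could look like. The file format (one `word<TAB>phone1 phone2 ...` entry per line) and the `LexiconDataset` name are assumptions for illustration; the real class would need to match whatever interface the existing datasets in transphone/model/dataset.py expose to transphone/bin/train_g2p.py.

```python
# Hypothetical custom lexicon dataset; names and interface are illustrative only.
from torch.utils.data import Dataset

class LexiconDataset(Dataset):
    """Reads a lexicon file with one 'word<TAB>phone1 phone2 ...' entry per line."""

    def __init__(self, lexicon_path):
        self.entries = []
        with open(lexicon_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                word, phones = line.split("\t", maxsplit=1)
                # graphemes as source tokens, phonemes as target tokens
                self.entries.append((list(word), phones.split()))

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, idx):
        return self.entries[idx]
```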

@pragvr
Author

pragvr commented Jul 5, 2023

Hi again,
I incorporated the changes you suggested and got training from scratch working. However, I still have trouble with the fine-tuning part, mainly because the src and tgt token embedding sizes differ between the pre-trained model and my data (which mostly adds stress markers for English that are not in the existing data). Would you have any suggestions on how to train this?

@xinjli
Owner

xinjli commented Jul 5, 2023

Yes, you can do it in two steps:

  1. There are vocab files in the pretrained model's directory. First append your new stress symbols to the end of the vocab.tgt file.
  2. Then modify the model loader so that the existing rows of the embedding are initialized from the pretrained embedding, while the newly appended vocab entries keep their random initialization (see the sketch below).
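
A rough sketch of step 2, assuming the model is a standard PyTorch module with an nn.Embedding over the target vocab; the attribute name `tgt_embed` and the state-dict key are hypothetical, not transphone's actual ones.

```python
import torch
import torch.nn as nn

def expand_embedding(old_weight: torch.Tensor, new_vocab_size: int) -> nn.Embedding:
    """Copy pretrained rows into a larger embedding; appended rows stay randomly initialized."""
    old_vocab_size, dim = old_weight.shape
    new_embed = nn.Embedding(new_vocab_size, dim)  # all rows randomly initialized
    with torch.no_grad():
        new_embed.weight[:old_vocab_size] = old_weight  # overwrite existing rows with pretrained weights
    return new_embed

# Hypothetical usage inside the model loader:
# state = torch.load("model.pt", map_location="cpu")
# model.tgt_embed = expand_embedding(state["tgt_embed.weight"], len(new_tgt_vocab))
```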
