Mandarin results sharing and training support #210
Replies: 18 comments
-
In styletts1: In styletts2:
-
Was this trained on aishell3? The prosody of the Chinese synthesis feels quite poor, with basically no pauses.
-
@zhouyong64 aishell is bad in general. It is just like VCTK: no emotion and flat prosody.
-
Hi liuhuang, thanks for sharing. I have a question about how you generated the 48kHz audio in ref_gen.zip. When I generate 48kHz audio, it sounds very high-pitched, but your 48kHz output sounds great. Thanks in advance for your help :P
-
@blldd Hi, the ref_gen.zip file was generated by the styletts1 model, which is an acoustic model that generates 24k mel. A super-resolution hifigan vocoder then converts the 24k_mel to 48k wav. The styletts1 model and the vocoder use the same mel extraction params.
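The sample-rate arithmetic behind such a pipeline can be sketched as follows. The hop length of 300 at 24 kHz is an assumption for illustration, not a value confirmed in this thread, and the helper names are hypothetical. The point is that a vocoder producing 48 kHz audio from 24 kHz mel frames has to emit twice as many samples per frame. Writing 24 kHz samples with a 48 kHz header instead plays the audio at double speed and one octave higher, which may explain the high-pitched result described above.

```python
# Assumed mel extraction settings (illustrative, not confirmed in this thread).
MEL_SAMPLE_RATE = 24_000     # rate of the audio the mel was extracted from
MEL_HOP_LENGTH = 300         # input samples covered by one mel frame (assumption)
OUTPUT_SAMPLE_RATE = 48_000  # rate of the wav the vocoder should produce


def vocoder_upsample_factor(mel_sr: int, hop_length: int, out_sr: int) -> int:
    """Number of output samples the vocoder must generate per mel frame."""
    if (hop_length * out_sr) % mel_sr != 0:
        raise ValueError("output rate does not give an integer upsample factor")
    return hop_length * out_sr // mel_sr


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Playback duration of num_samples at the given sample rate."""
    return num_samples / sample_rate


num_frames = 400  # 400 frames * 300 samples / 24000 Hz = 5 seconds of speech

# Super-resolution vocoder: each frame expands to 600 samples at 48 kHz.
factor = vocoder_upsample_factor(MEL_SAMPLE_RATE, MEL_HOP_LENGTH, OUTPUT_SAMPLE_RATE)
correct = duration_seconds(num_frames * factor, OUTPUT_SAMPLE_RATE)

# Mislabelled case: 24 kHz samples saved with a 48 kHz header.
mislabelled = duration_seconds(num_frames * MEL_HOP_LENGTH, OUTPUT_SAMPLE_RATE)

print(factor, correct, mislabelled)
```

If the second duration is half the first, the audio was generated at 24 kHz and only labelled as 48 kHz.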
-
Great! Thanks for your help! I am also curious about the multi-language capability. I tried StyleTTS2 trained on LibriTTS and found that the model does not work on French text: the generated audio is spoken with English pronunciation.
-
@blldd Hi, blldd. First, I retrained the ASR model using Chinese phonemes. Second, since no Chinese pl-bert exists, I removed the pl-bert module. Then I used Chinese data to train the styletts2_removed_pl-bert_retrain_ASR model.
-
Which SLM model did you use for Chinese? I guess it's not microsoft/wavlm-base-plus.
-
@zhouyong64 Hi, for now I am still using the English-only microsoft/wavlm-base-plus. Switching to another model may require changes to the model structure, so I have left it unchanged.
-
Hi, when you removed the pl_module, did you replace it with the text encoder in the second training stage?
-
@mayfool Hi, yes, I simply replaced it with the text_encoder.
-
Thanks for the reply. Here are a few questions: 1. Did you use the text encoder pretrained in the 1st stage, or a new text encoder without pretraining? 2. Will this modification affect the zero-shot ability?
-
@mayfool hi,
-
@liuhuang31 Thanks a lot!
-
@mayfool I use the chinese_hubert_large model.
-
@liuhuang31, hello. Did you run into negative CTC loss when training the ASR model? During my training I got a negative CTC loss, as well as a negative loss overall. How did you process the data?
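Two general properties of CTC are worth checking in this situation. This is a sketch of common checks, not the fix used in this thread, and the helper names are hypothetical. CTC loss is a negative log-likelihood, so it cannot go below zero when the model outputs are proper log-probabilities. An utterance also only has a valid alignment if the number of input frames is at least the target length plus the number of adjacent repeated tokens.

```python
from typing import List, Sequence, Tuple


def min_ctc_input_length(target: Sequence[int]) -> int:
    """Fewest input frames that can align to `target` under CTC.

    Every token needs one frame, and every pair of identical adjacent
    tokens needs one extra blank frame between them.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def filter_alignable(
    samples: List[Tuple[int, List[int]]]
) -> List[Tuple[int, List[int]]]:
    """Keep (num_frames, phoneme_ids) pairs that CTC is able to align.

    num_frames is the sequence length the ASR model actually outputs,
    i.e. after any downsampling inside the model.
    """
    return [(n, t) for n, t in samples if n >= min_ctc_input_length(t)]


# Made-up examples: (output frames, phoneme ids).
samples = [
    (10, [1, 2, 3]),  # fine: 10 frames for 3 tokens
    (3, [4, 4, 5]),   # dropped: the repeated 4 needs a blank, so 4 frames
    (2, [6, 7, 8]),   # dropped: fewer frames than tokens
]
kept = filter_alignable(samples)
print(kept)
```

In PyTorch, `torch.nn.CTCLoss` expects log-probabilities as input (for example the output of `log_softmax`), so a negative value often points to raw logits or plain softmax outputs being passed instead.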
-
@yijingshihenxiule Hello,
-
@liuhuang31, thank you for your reply. I will give it a try.
-
Share your Chinese synthesis results or Mandarin model training questions.