Mandarin results sharing and training support #210

liuhuang31 · 2023-12-08T07:43:15Z

liuhuang31
Dec 8, 2023

Share your Chinese synthesis results or questions about training Mandarin models.

liuhuang31 · 2023-12-08T07:57:06Z

liuhuang31
Dec 8, 2023
Author

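In StyleTTS1:
Reference audio is attached: ref.zip
@GuangChen2016 Hi, I used the ref_audio you provided to generate the StyleTTS results. If that makes you uncomfortable, I will take it down immediately.
Reference text: 杭州亚运会即将在9月开幕，这是继北京冬奥会之后，我国再次承办的一项国际大型体育赛事。然而，在这场盛会上，我们将看不到来自俄罗斯和白俄罗斯的运动员的身影。他们被国际奥委会以“技术原因”为由拒之门外，无缘参加杭州亚运会。这一决定引起了我国的不满和反对。我国一直主张欢迎符合条件的俄罗斯和白俄罗斯运动员参加杭州亚运会，而不是对他们进行歧视和限制。我国认为，运动员是否参赛应该由他们自己的体育表现决定，而不是其他因素，包括战争等。我国还表示，愿意为他们搭建一个良好的参赛平台，让他们以中立身份参赛，并且不会影响奖牌的分配。
(English: "The Hangzhou Asian Games will open in September. It is another major international sporting event hosted by our country after the Beijing Winter Olympics. However, at this event we will not see athletes from Russia and Belarus. They have been shut out by the International Olympic Committee for 'technical reasons' and will miss the Hangzhou Asian Games. This decision has drawn dissatisfaction and opposition from our country. Our country has always advocated welcoming eligible Russian and Belarusian athletes to the Hangzhou Asian Games rather than discriminating against or restricting them. Our country holds that whether athletes compete should be decided by their own athletic performance, not by other factors, including war. Our country has also said it is willing to provide a good platform for them to compete as neutrals, without affecting the allocation of medals.")
Generated results: ref_gen.zip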

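In StyleTTS2:
Reference audio: same as for StyleTTS1.
Reference text:
00000001 Stay cool he added with a smile.
00000002 你从楼梯跑上了二楼的走廊，深邃，阴暗，空气中散发着沉闷的味道。你拄着膝盖大口的喘着粗气。
(English: "You run up the stairs into the second-floor corridor: deep, dark, with a stale smell hanging in the air. You brace your hands on your knees and gasp for breath.")
00000003 一股深入毛孔的恐惧感，围绕着你，似乎有什么可怕的东西，正在从楼梯向上爬。幽暗的走廊里，唯一的光源，是一个昏暗的灯泡，就在你前方的不远处。
(English: "A fear that seeps into your pores surrounds you, as if something terrifying were climbing up the stairs. In the dim corridor, the only source of light is a dull bulb not far ahead of you.")
Generated results: ref_styletts2_gen.zip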

zhouyong64 · 2023-12-09T13:26:38Z

zhouyong64
Dec 9, 2023

Was this trained on AISHELL-3? The prosody of the Chinese synthesis sounds quite poor, with almost no pauses.

yl4579 · 2023-12-09T16:44:30Z

yl4579
Dec 9, 2023
Maintainer

@zhouyong64 AISHELL is poor in general. Like VCTK, it has no emotion and flat prosody.

blldd · 2023-12-10T09:32:07Z

blldd
Dec 10, 2023

Hi liuhuang, thanks for sharing. I have a question: how did you generate the 48 kHz audio in ref_gen.zip? When I generate 48 kHz audio, it sounds very high-pitched, but yours sounds great at 48 kHz. Thanks in advance for your help :P

liuhuang31 · 2023-12-11T02:50:46Z

liuhuang31
Dec 11, 2023
Author

@blldd Hi, the files in ref_gen.zip were generated by the StyleTTS1 model, which acts as an acoustic model that produces 24 kHz mel spectrograms. A super-resolution HiFi-GAN vocoder then converts the 24 kHz mel to a 48 kHz waveform. StyleTTS1 and the vocoder use the same mel extraction parameters.
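
A small sketch of the arithmetic behind this kind of setup. The function names are hypothetical and the parameter values (such as a hop length of 300) are illustrative assumptions, not values confirmed in this thread. It also shows one possible cause of high-pitched output, which is only a guess here: samples generated at 24 kHz but saved with a 48 kHz header play back twice as fast.

```python
MEL_KEYS = ("sample_rate", "n_fft", "win_length", "hop_length", "n_mels")


def mel_param_mismatches(acoustic_cfg, vocoder_cfg, keys=MEL_KEYS):
    """Return {key: (acoustic_value, vocoder_value)} for every differing mel setting."""
    return {
        key: (acoustic_cfg.get(key), vocoder_cfg.get(key))
        for key in keys
        if acoustic_cfg.get(key) != vocoder_cfg.get(key)
    }


def vocoder_samples_per_frame(hop_length, mel_sr, out_sr):
    """Output samples the vocoder must emit per mel frame to preserve duration."""
    scaled = hop_length * out_sr
    if scaled % mel_sr != 0:
        raise ValueError("hop_length * out_sr must be divisible by mel_sr")
    return scaled // mel_sr


def playback_speed_ratio(true_sr, declared_sr):
    """Speed (and pitch) factor when audio at true_sr is played as declared_sr."""
    return declared_sr / true_sr


# Illustrative (assumed) configs: both sides must describe the same mel features.
acoustic = {"sample_rate": 24000, "n_fft": 2048, "win_length": 1200,
            "hop_length": 300, "n_mels": 80}
vocoder_input = dict(acoustic)

print(mel_param_mismatches(acoustic, vocoder_input))   # empty dict when they agree
print(vocoder_samples_per_frame(300, 24000, 48000))    # samples per frame at 48 kHz
print(playback_speed_ratio(24000, 48000))              # factor if the header is wrong
```

If the vocoder was trained to upsample by the factor in the second line, writing the result at 48 kHz keeps the original duration and pitch. Writing the output of a plain 24 kHz vocoder at 48 kHz does not.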

blldd · 2023-12-11T03:22:03Z

blldd
Dec 11, 2023

Great! Thanks for your help! I am also curious about the multilingual capability. I tried the StyleTTS2 model trained on LibriTTS and found that it does not work for French text: the generated audio is spoken with English pronunciation.
So how did you get the model to speak Chinese well?

liuhuang31 · 2023-12-11T03:49:51Z

liuhuang31
Dec 11, 2023
Author

@blldd Hi, blldd. First, I retrained the ASR model using Chinese phonemes. Second, since no Chinese PL-BERT exists, I removed the PL-BERT module. I then used Chinese data to train the resulting model (StyleTTS2 with PL-BERT removed and the ASR retrained).

zhouyong64 · 2023-12-14T10:46:51Z

zhouyong64
Dec 14, 2023

Which SLM model did you use for Chinese? I guess it's not microsoft/wavlm-base-plus.

liuhuang31 · 2023-12-14T10:53:34Z

liuhuang31
Dec 14, 2023
Author

@zhouyong64 Hi, for now I am still using the English-only microsoft/wavlm-base-plus. Switching to another model may require changes to the model structure, so I have left it unchanged.

mayfool · 2023-12-16T07:26:27Z

mayfool
Dec 16, 2023

Hi, when you removed the PL-BERT module, did you replace it with the text encoder in the second training stage?

liuhuang31 · 2023-12-16T07:32:23Z

liuhuang31
Dec 16, 2023
Author

@mayfool Hi, yes, I simply replaced it with the text_encoder.

mayfool · 2023-12-16T08:04:36Z

mayfool
Dec 16, 2023

Thanks for the reply. I have a few questions:
1. Did you use the text encoder pretrained in the first stage, or a new text encoder without pretraining?
2. Will this modification affect the zero-shot ability?

liuhuang31 · 2023-12-16T08:28:44Z

liuhuang31
Dec 16, 2023
Author

@mayfool Hi,

1. I used the text encoder pretrained in the first stage. Originally the flow was pl_bert_output -> diffusion. With PL-BERT removed, we changed it to text_encoder_output -> diffusion.
2. In my view, PL-BERT relates to the text, so it may help with prosody or naturalness. Removing it should not have much impact on zero-shot ability.
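
The rerouting described above can be sketched roughly as follows. This is a toy illustration, not the author's code and not the StyleTTS2 API: `select_diffusion_condition` and both stub encoders are hypothetical names, and the real modules operate on tensors rather than nested lists.

```python
from typing import Callable, List, Optional, Sequence

Features = List[List[float]]
Encoder = Callable[[Sequence[int]], Features]


def select_diffusion_condition(
    tokens: Sequence[int],
    text_encoder: Encoder,
    pl_bert: Optional[Encoder] = None,
) -> Features:
    """Return the text features used to condition the diffusion step.

    With a PL-BERT available, its output is used (the original flow).
    Without one, the first-stage text encoder output is used instead.
    """
    if pl_bert is not None:
        return pl_bert(tokens)
    return text_encoder(tokens)


# Stand-in encoders (hypothetical) that only tag where the features came from.
def stub_text_encoder(tokens: Sequence[int]) -> Features:
    return [[float(t), 0.0] for t in tokens]


def stub_pl_bert(tokens: Sequence[int]) -> Features:
    return [[float(t), 1.0] for t in tokens]


phoneme_ids = [3, 7, 7]
original_flow = select_diffusion_condition(phoneme_ids, stub_text_encoder, stub_pl_bert)
modified_flow = select_diffusion_condition(phoneme_ids, stub_text_encoder)
print(original_flow)
print(modified_flow)
```

In a real model the two feature sources may have different dimensions, so the layer that consumes them would likely need adjusting as well. The thread does not say how that was handled.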

mayfool · 2023-12-16T08:33:50Z

mayfool
Dec 16, 2023

@liuhuang31 Thanks a lot!

Moonmore · 2023-12-18T12:41:33Z

Moonmore
Dec 18, 2023

Quoting zhouyong64: "Which SLM model did you use for Chinese? I guess it's not microsoft/wavlm-base-plus."

@mayfool I use the chinese_hubert_large model.

yijingshihenxiule · 2024-05-06T05:42:13Z

yijingshihenxiule
May 6, 2024

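@liuhuang31 Hello, did you run into a negative CTC loss when training the ASR model? During my training I got a negative CTC loss as well as a negative overall loss. How did you process your data?
Here is how I process mine. For example, “今天天气不错。” ("The weather is nice today.") --> j in1 t ian1 t ian1 q i4 b u2 c uo2 . The sequence includes spaces and punctuation. I also tried removing the tones: “今天天气不错。” --> j in t ian t ian q i b u c uo . The negative loss still appears, and I don't know why. Do you have any experience with this?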
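
For readers who want to reproduce this token format, here is a minimal sketch that splits already-romanized pinyin syllables into initial and final tokens. It assumes the input is space-separated pinyin with tone digits (converting Chinese characters to pinyin needs a separate tool), treats y and w as initials, and uses hypothetical function names. It is not the poster's actual preprocessing script.

```python
# Pinyin initials, with two-letter ones first so "zh" is tried before "z".
# Treating "y" and "w" as initials is an assumption of this sketch.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable, keep_tone=True):
    """Split one pinyin syllable such as 'tian1' into ('t', 'ian1')."""
    tone = ""
    if syllable and syllable[-1].isdigit():
        syllable, tone = syllable[:-1], syllable[-1]
    if not keep_tone:
        tone = ""
    for initial in INITIALS:
        # Require a non-empty final so bare "n" or "m" stay whole.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):] + tone
    return "", syllable + tone  # zero-initial syllable such as "an1"


def to_tokens(pinyin_text, keep_tone=True):
    """Turn 'jin1 tian1 .' into a space-separated initial/final sequence."""
    tokens = []
    for item in pinyin_text.split():
        if not item[0].isalpha():
            tokens.append(item)  # punctuation passes through unchanged
            continue
        initial, final = split_syllable(item.lower(), keep_tone)
        if initial:
            tokens.append(initial)
        tokens.append(final)
    return " ".join(tokens)


print(to_tokens("jin1 tian1 tian1 qi4 ."))
print(to_tokens("jin1 tian1 tian1 qi4 .", keep_tone=False))
```

Whatever scheme is used, every token it can emit, including punctuation, needs an entry in the ASR model's symbol table.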

liuhuang31 · 2024-05-06T06:13:04Z

liuhuang31
May 6, 2024
Author

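@yijingshihenxiule Hello,
(1) I don't remember whether the CTC loss in my TensorBoard also went negative. I do recall that either the ASR model or the StyleTTS model had a loss with negative values, but I'm not sure whether it was the CTC loss.
(2) My personal suggestion: if you haven't changed the ASR code and only changed some of the phoneme inputs, don't worry about it for now and move on to the next step.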
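
As general background, CTC loss is a negative log-likelihood, so it cannot be negative when the per-frame inputs are properly normalized log-probabilities. One way to get a negative value, which is not confirmed as the cause in this thread, is to feed raw unnormalized scores instead. The pure-Python sketch below implements the standard CTC forward recursion to show the difference. `ctc_nll` and the helpers are illustrative names, not part of StyleTTS or its ASR code.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_softmax(row):
    """Normalize one frame of scores into log-probabilities."""
    total = logsumexp(row)
    return [v - total for v in row]


def ctc_nll(frame_scores, target, blank=0):
    """CTC negative log-likelihood of `target` given per-frame class scores."""
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for symbol in target:
        ext += [symbol, blank]
    size = len(ext)

    # alpha[s] is the log score of all partial alignments ending at ext[s].
    alpha = [-math.inf] * size
    alpha[0] = frame_scores[0][ext[0]]
    if size > 1:
        alpha[1] = frame_scores[0][ext[1]]

    for frame in frame_scores[1:]:
        new_alpha = [-math.inf] * size
        for s in range(size):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + frame[ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last symbol or on the trailing blank.
    return -logsumexp(alpha[-2:])


raw_scores = [[2.0, 1.0, 0.5], [0.1, 3.0, 0.2], [1.5, 0.3, 2.5]]  # 3 frames, 3 classes
target = [1, 2]

normalized = [log_softmax(row) for row in raw_scores]
print(ctc_nll(normalized, target))   # non-negative
print(ctc_nll(raw_scores, target))   # negative: the inputs were not log-probabilities
```

If the training code already applies a log-softmax before the CTC criterion, this explanation does not apply and the negative value probably comes from another loss term.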

yijingshihenxiule · 2024-05-07T04:33:31Z

yijingshihenxiule
May 7, 2024

@liuhuang31 Thank you for your reply. I will give it a try.
