Megatts2_HierSpeechpp

This project requires downloading some model resources and preparing the training datasets.
You need to link the training pipeline together yourself, and the code is not clean.
To avoid some risks, some code implementations were changed and have not been carefully checked, so they may have problems.

The acoustic model is MegaTTS2.
The vocoder is HierSpeechpp: https://github.com/sh-lee-prml/HierSpeechpp
The wav2vec features are extracted with Facebook's wav2vec2 model.
Training process: (text, f0, spk_mel, wav2vec) -> megatts2 ->> (wav2vec, f0) -> HierSpeechpp ->> wav
Inference process: (text, spk_mel) -> megatts2 ->> (wav2vec, f0) -> HierSpeechpp ->> wav
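The inference data flow above is plain function composition: the acoustic model turns text and a speaker reference into (wav2vec, f0), and the vocoder turns those into a waveform. The sketch below only illustrates that composition and is not the repository's API: `acoustic_model`, `vocoder` and `synthesize` are hypothetical stand-ins that return placeholder values. It assumes 16 kHz audio, wav2vec2's 320-sample frame hop (50 frames per second) and, for simplicity, an f0 contour sampled at the same frame rate; the real models may use different rates.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # assumed output sample rate
W2V_HOP = 320        # samples per wav2vec2 frame (20 ms at 16 kHz)
W2V_DIM = 1024       # hidden size of wav2vec2-xls-r-300m

Frames = List[List[float]]  # one feature vector per frame


def acoustic_model(text: List[str], spk_mel: Frames,
                   n_frames: int) -> Tuple[Frames, List[float]]:
    """Hypothetical MegaTTS2 stand-in: returns placeholder wav2vec frames and f0.

    n_frames is passed in explicitly here; the real model derives the output
    length from its predicted phoneme durations.
    """
    w2v = [[0.0] * W2V_DIM for _ in range(n_frames)]
    f0 = [0.0] * n_frames
    return w2v, f0


def vocoder(w2v: Frames, f0: List[float]) -> List[float]:
    """Hypothetical HierSpeechpp stand-in: returns a placeholder waveform."""
    if len(w2v) != len(f0):
        raise ValueError("wav2vec and f0 must be frame-aligned")
    return [0.0] * (len(w2v) * W2V_HOP)


def synthesize(text: List[str], spk_mel: Frames, n_frames: int) -> List[float]:
    # Inference path: (text, spk_mel) -> megatts2 -> (wav2vec, f0) -> HierSpeechpp -> wav
    w2v, f0 = acoustic_model(text, spk_mel, n_frames)
    return vocoder(w2v, f0)


if __name__ == "__main__":
    wav = synthesize(["n", "i3", "h", "ao3"], [[0.0] * 80] * 100, n_frames=50)
    print(len(wav) / SAMPLE_RATE, "seconds")  # 50 frames * 20 ms = 1.0 s
```

During training, the same (wav2vec, f0) pair is extracted from real audio and serves as the acoustic model's target and the vocoder's input.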

Model resource

Download HierSpeechpp's hierspeechpp_eng_kor, hierspeechpp_libritts960 and ttv_libritts_v1 checkpoints to this directory.
Download facebook/wav2vec2-xls-r-300m.
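A quick check that the downloaded resources are in place can save a failed run. The sketch below assumes each resource sits in a directory named after it under the project root, including a local copy of wav2vec2-xls-r-300m; that layout is an assumption, so adjust the hypothetical `REQUIRED` tuple to the paths your config and scripts actually load.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed directory names; check them against the paths your scripts load.
REQUIRED = (
    "hierspeechpp_eng_kor",
    "hierspeechpp_libritts960",
    "ttv_libritts_v1",
    "wav2vec2-xls-r-300m",  # only if you keep a local copy of the model
)


def missing_resources(root: str, required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names of required resource directories not found under root."""
    base = Path(root)
    return [name for name in required if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_resources(".")
    print("Missing resources:", ", ".join(missing) if missing else "none")
```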

Dataset preparation

Features
- text: pinyin phonemes with tones (English uses CMU phonemes).
- f0: extracted with extract_f0.py
- mel: extracted with extract_mel.py
- wav2vec: extracted with extract_w2v.py
- duration: phoneme alignment durations (extracted with MFA).
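MFA writes phone boundaries in seconds, while a duration feature is a per-phoneme frame count. A minimal sketch of that conversion, assuming a 16 kHz sample rate and a 320-sample hop (the wav2vec2 frame rate); the hop the repository actually uses is set in its own scripts and config, so treat these defaults as placeholders. `intervals_to_frames` is a hypothetical helper, not a function from this project. Rounding the cumulative boundaries instead of each duration keeps rounding errors from accumulating.

```python
from typing import List, Tuple


def intervals_to_frames(intervals: List[Tuple[float, float]],
                        sample_rate: int = 16000, hop: int = 320) -> List[int]:
    """Convert contiguous (start_sec, end_sec) phone intervals to frame counts.

    Assumes the intervals are contiguous and start at 0, as in a TextGrid
    interval tier. Boundaries are rounded on the cumulative timeline, so
    sum(result) == round(last_end * sample_rate / hop).
    """
    frames_per_sec = sample_rate / hop
    durations = []
    prev = 0
    for _, end in intervals:
        boundary = int(round(end * frames_per_sec))
        durations.append(boundary - prev)
        prev = boundary
    return durations
```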
Feature filelists
- text feature files
  - configs/config.json needs train_list.txt; train_list.txt may need to list files such as zhvoice/zhmagicdata/5_2431/trans/transcription.txt.styletts.train.
- audio feature files
  - each wav in the audio directory also needs its mel, wav2vec, f0 and duration files, for example:
  - demo.wav
  - demo.hw2v.pt
  - demo.hf0.npy
  - demo.hmel.npy
  - demo.dur.npy
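Each wav's feature files share its stem and differ only in suffix. The sketch below, with hypothetical helper names `feature_paths` and `missing_features`, derives the expected paths from the suffixes listed above and reports which features are still missing for an utterance.

```python
from pathlib import Path
from typing import Dict, List

# Suffixes taken from the demo.* example above.
FEATURE_SUFFIXES = {
    "wav2vec": ".hw2v.pt",
    "f0": ".hf0.npy",
    "mel": ".hmel.npy",
    "dur": ".dur.npy",
}


def feature_paths(wav_path: str) -> Dict[str, Path]:
    """Map each feature name to the file expected next to wav_path."""
    stem = Path(wav_path).with_suffix("")  # audio/demo.wav -> audio/demo
    return {name: stem.with_name(stem.name + suffix)
            for name, suffix in FEATURE_SUFFIXES.items()}


def missing_features(wav_path: str) -> List[str]:
    """Return the feature names whose files do not exist yet."""
    return [name for name, path in feature_paths(wav_path).items()
            if not path.exists()]
```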

Train

1. s2_stage: train MegaTTS (RVQ) with train_ms.py.
2. s1_stage: train the PLM with train_ms_s1.py, with the train_stage parameter in config.json set to "s1_1". Use the s2_stage's exp_dir as the s1_stage's exp_dir so that the RVQ-related model checkpoints can be loaded.
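Switching from the s2 stage to the s1 stage means changing `train_stage` in configs/config.json. A minimal sketch with the standard `json` module; since where `train_stage` sits inside config.json is not stated here, the hypothetical `set_train_stage` helper searches the whole config for the key. Note that rewriting the file with `json.dump` also normalizes its formatting.

```python
import json
from typing import Any


def set_train_stage(cfg: Any, stage: str) -> bool:
    """Set every 'train_stage' key in a nested config; return True if any was set."""
    found = False
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            if key == "train_stage":
                cfg[key] = stage  # replacing a value does not change the dict size
                found = True
            else:
                found = set_train_stage(value, stage) or found
    elif isinstance(cfg, list):
        for item in cfg:
            found = set_train_stage(item, stage) or found
    return found


def update_config(path: str, stage: str = "s1_1") -> None:
    """Load config.json, set train_stage, and write it back."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    if not set_train_stage(cfg, stage):
        raise KeyError(f"'train_stage' not found in {path}")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2, ensure_ascii=False)
```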

# train s2_stage
# for conv stride 8: in data_utils, dur, mel and w2v use a factor of 8
CUDA_VISIBLE_DEVICES="0" python train_ms.py -c configs/config.json -m exp

# train s1_stage: set the train_stage param in config.json to "s1_1".
# this trains the PLM (GPT); GPT-SoVITS's AR module model is not used.
# to avoid some risks, github_megatts2's GPT model is used; its code implementation has not been carefully checked...
CUDA_VISIBLE_DEVICES="0" python train_ms_s1.py -c configs/config.json -m exp
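The stride-8 comment implies that the frame-level features must line up with an 8x downsampling. One plausible reading, sketched here as an assumption rather than what data_utils.py actually does, is to trim mel and wav2vec to a common length that is a multiple of 8 and shorten the trailing phoneme durations so they still sum to that length. `align_to_stride` is a hypothetical helper, and it assumes mel and wav2vec share one frame rate.

```python
import numpy as np


def align_to_stride(mel: np.ndarray, w2v: np.ndarray, dur: np.ndarray,
                    stride: int = 8):
    """Trim mel (T, n_mels), w2v (T, dim) and durations (N,) to a multiple of stride.

    Frames cut from the end are also removed from the trailing phoneme
    durations, so the returned durations sum to the returned length.
    """
    length = min(len(mel), len(w2v), int(dur.sum()))
    length -= length % stride
    dur = dur.astype(np.int64)  # astype returns a copy; the input is untouched
    excess = int(dur.sum()) - length
    for i in range(len(dur) - 1, -1, -1):
        if excess == 0:
            break
        cut = min(int(dur[i]), excess)
        dur[i] -= cut
        excess -= cut
    return mel[:length], w2v[:length], dur
```

Trimming from the end can leave a final phoneme with zero frames; whether that is acceptable depends on how the duration predictor handles it.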

Inference

The provided model checkpoints (Models) were trained on zhvoice, LibriTTS (100, 360, 500), VCTK, aishell3 and 200h_chinese (generated from the TTS interface...).

Download the provided checkpoints to the 'models' directory, or change the checkpoint paths to your own.

python inference_plm.py

More

HierSpeechpp's vocoder is heavy, and its training code is not open source.
You can use HiFTNet as the vocoder instead; it also needs f0 to train the model. See https://github.com/yl4579/HiFTNet
Also, if you want audio super-resolution from 16/24 kHz to 48 kHz, go to https://github.com/liuhuang31/HiFTNet-sr

Important: to retrain HiFTNet or HiFTNet-sr, its input features need to be changed to wav2vec.
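Feeding wav2vec features to a retrained HiFTNet or HiFTNet-sr may also require matching their frame rate to the one the vocoder's upsampling stack expects. A minimal sketch that resamples a feature sequence along time with linear interpolation (`np.interp`, one channel at a time), assuming wav2vec2's 50 frames per second as the source rate; the target rate is a placeholder, and whether your retrained model needs this at all depends on its configuration. `resample_frames` is a hypothetical helper.

```python
import numpy as np


def resample_frames(feats: np.ndarray, src_rate: float, dst_rate: float) -> np.ndarray:
    """Linearly resample a (T, D) feature sequence from src_rate to dst_rate frames/s."""
    t_src = np.arange(feats.shape[0]) / src_rate
    n_dst = int(round(feats.shape[0] * dst_rate / src_rate))
    t_dst = np.arange(n_dst) / dst_rate
    # Interpolate each channel independently; times past the last source
    # frame take the last value (np.interp clamps at the edges).
    return np.stack([np.interp(t_dst, t_src, feats[:, d])
                     for d in range(feats.shape[1])], axis=1)
```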

liuhuang31/Megatts2_HierSpeechpp

Folders and files

Latest commit

History

Repository files navigation

Megatts2_HierSpeechpp

Model resource

Dataset prepare

Train

Inference

More

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages