Commit

init multiverse
SpeechSynthesis committed Oct 4, 2024
1 parent 265e7df commit 079c9be
Showing 658 changed files with 5,074 additions and 0 deletions.
12 changes: 12 additions & 0 deletions index.html
@@ -23,6 +23,18 @@ <h1>
</h1>
<h2>Publications</h2>

<article>
<header>
<span class="paper_date"><b>(2024)</b></span><span class="paper_title">&nbsp;MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech, Accepted by EMNLP 2024</span>
<ul>
<!-- <li><a href="paper_link">paper</a></li> -->
<li><a href="publications/Multiverse/index.html">Demo page</a></li>
</ul>
</header>
</article>

<hr class='solid'>

<article>
<header>
<span class="paper_date"><b>(2023)</b></span><span class="paper_title">&nbsp;Synthe-Sees: Face based Text-to-Speech for Virtual Speaker, Accepted by ICASSP 2024</span>
Binary file added publications/multiverse/33_plot_v5.pdf
Binary file not shown.
Binary file added publications/multiverse/33_plot_v6.pdf
Binary file not shown.
1,541 changes: 1,541 additions & 0 deletions publications/multiverse/index.html

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions publications/multiverse/index_backup.html
@@ -0,0 +1,21 @@
<!-- Abstract -->
<body>
<div class="text" style="font-size:50px">
MultiVerse: Disentangled Modeling for Multi-Task Text-to-Speech
</div>
<div class="text" style="font-size:20px">
Anonymous Authors
</div>
<div class="text" style="font-size:10px">
&nbsp;
</div>
<div class="text" style="font-size:40px">
Abstract
</div>
<div class="text" style="font-size:15px">
This paper introduces "MultiVerse," a novel deep-learning model for multi-task speech synthesis. Addressing the challenges of generalization, we employ a disentangled modeling for speech that separates content, style, and prosody. To this end, our multi-task TTS model utilizes a source-filter model for feature disentanglement and prompt-based auto-regressive prosody modeling. The proposed model excels in zero-shot synthesis, cross-lingual synthesis, and style transfer. Additionally, it handles these tasks with a unified approach, consolidating them within a single, versatile framework. Leveraging disentangled modeling, MultiVerse achieves robust generalization, requiring relatively less training data compared to data-driven approaches. Experimental results demonstrate its remarkable zero-shot synthesis, even in cross-lingual scenarios, producing enhanced speech intelligibility, speaker similarity, and prosody similarity.
</div>
<p style="text-align: center;">
<img src="./ditto.png" alt="Overview" width="800">
</p>
</body>
Binary file added publications/multiverse/static/.DS_Store
Binary file not shown.
1 change: 1 addition & 0 deletions publications/multiverse/static/css/bulma-carousel.min.css

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions publications/multiverse/static/css/bulma-slider.min.css

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions publications/multiverse/static/css/bulma.css.map.txt

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions publications/multiverse/static/css/bulma.min.css

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions publications/multiverse/static/css/fontawesome.all.min.css

Large diffs are not rendered by default.
