Comments (6)
> I changed hidden_channels, hidden_channels_enc and hidden_channels_dec to 512, but I still encountered the following problem: at inference time, some words are missing. For example, if I input:
> I cannot hear r e4
> in the synthesized voice. (The number here stands for the tone in Mandarin.) Could you please give me some advice? Thanks!

Hi, you can try the trick of adding a blank token between every two input tokens. My experiments in Chinese show that this trick can improve pronunciation significantly.
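For readers who want to try the blank-token trick, here is a minimal sketch in plain Python. The helper name `add_blanks` and the `"<blank>"` symbol are hypothetical, chosen only for illustration; the sketch also assumes that a blank is placed at both ends of the sequence as well as between neighbouring tokens, which may differ from what was actually used in the experiments mentioned above.

```python
def add_blanks(tokens, blank="<blank>"):
    """Return a new list with `blank` between every two tokens.

    Assumption: a blank is also added before the first and after the
    last token, so the output has 2 * len(tokens) + 1 items.
    """
    result = [blank] * (len(tokens) * 2 + 1)
    # Odd positions hold the original tokens; even positions stay blank.
    result[1::2] = tokens
    return result


phonemes = "l i2 b u4".split()
print(add_blanks(phonemes))
# ['<blank>', 'l', '<blank>', 'i2', '<blank>', 'b', '<blank>', 'u4', '<blank>']
```

If you work with integer token ids instead of strings, the blank would need its own id that does not collide with any real symbol, and the same interspersing has to be applied both at training time and at inference time.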
from glow-tts.
Hi @shahuzi, I'm very happy that you are also interested in training glow-tts on Mandarin datasets, and thank you very much for your suggestion. I will try it!
By the way, I ran into some alignment problems before (e.g. some words are always missing at inference time), and I'm not sure whether I gave the model the right input sequences. Could you kindly tell me what your input sequences look like? Are you also using phonemes as I mentioned above? Do you use prosodic labels (e.g. "#1 #3 #4 #5", which stand for the pauses in a sentence)?
For example:
“把知识运用于实践,离不开思考” ⬇️
"REC-001.wav|start0 b a3 zh ix1 sh ix5 sp2 vn4 iong4 v2 sh ix2 j ian4 sp3 l i2 b u4 k ai1 s iy1 k ao3 end0"
in which "sp3" means a long pause (e.g. at a comma) and "sp2" means a short pause (e.g. a short pause for breath).
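@Charlottecuc Sorry for not replying sooner.
My English is not very good, so I will reply in Chinese.
My input consists of a phoneme sequence, a tone sequence and some prosodic labels (#1, #3, etc.), much like yours.
Unlike yours, however, the phoneme sequence and the prosodic features are not combined into a single sequence. They are treated as parallel features: the three inputs are each embedded, and the embeddings are then concatenated along the channel dimension.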
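To make the parallel-feature idea concrete, here is a rough sketch in plain Python. It is not the setup described above, only an illustration under several assumptions: the helper names `split_parallel` and `concat_channels` are hypothetical, the token format follows the "REC-001.wav" example earlier in the thread, a pause label such as "sp3" is attached to the phoneme before it, tokens without a digit get tone 0, and the toy lookup tables stand in for learned embedding layers (in PyTorch these would typically be `nn.Embedding` modules joined with `torch.cat` on the last dimension).

```python
import re

PAUSE = re.compile(r"sp\d")          # pause labels such as sp2, sp3
TOKEN = re.compile(r"([a-z]+)(\d?)")  # phoneme with an optional tone digit


def split_parallel(tokens):
    """Split combined tokens into aligned phoneme, tone and prosody streams."""
    phonemes, tones, prosody = [], [], []
    for tok in tokens:
        if PAUSE.fullmatch(tok):
            # Assumption: a pause is stored as a feature of the previous phoneme.
            if prosody:
                prosody[-1] = tok
            continue
        match = TOKEN.fullmatch(tok)
        if match is None:
            raise ValueError(f"unexpected token: {tok!r}")
        phonemes.append(match.group(1))
        tones.append(int(match.group(2) or 0))  # 0 means "no tone"
        prosody.append("none")
    return phonemes, tones, prosody


def concat_channels(streams, tables):
    """Look up each stream in its own table and concatenate per time step."""
    return [
        [value for sym, table in zip(step, tables) for value in table[sym]]
        for step in zip(*streams)
    ]


phonemes, tones, prosody = split_parallel("j ian4 sp3 l i2".split())
print(phonemes)  # ['j', 'ian', 'l', 'i']
print(tones)     # [0, 4, 0, 2]
print(prosody)   # ['none', 'sp3', 'none', 'none']

# Toy "embedding" tables: 2 channels for phonemes, 1 each for tone and prosody.
phoneme_table = {p: [float(i)] * 2 for i, p in enumerate(sorted(set(phonemes)))}
tone_table = {t: [float(t)] for t in range(6)}
prosody_table = {"none": [0.0], "sp2": [1.0], "sp3": [2.0]}

frames = concat_channels(
    (phonemes, tones, prosody), (phoneme_table, tone_table, prosody_table)
)
print(len(frames), len(frames[0]))  # 4 4
```

The point of the sketch is that all three streams have the same length, so each time step ends up with one vector whose channels are the phoneme, tone and prosody embeddings side by side.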
@shahuzi Hello, could you share a few audio samples that you synthesized with glow-tts? Thank you very much.
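Because of data security concerns, I cannot provide a demo, sorry. My conclusion so far is that synthesis works normally for a broadcast-style (news-reading) voice corpus, but runs into problems for a highly expressive corpus.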