nano Training has been able to achieve good results about vall-e HOT 12 CLOSED

lifeiteng commented on June 1, 2024

nano Training has been able to achieve good results

Comments (12)

rishikksh20 commented on June 1, 2024

Can you upload some samples generated from a 3-second prompt?

lifeiteng commented on June 1, 2024

> What's the expected output? I had some issues with some tensors being on CUDA and others on the CPU, and getting the dependencies (especially k2) installed was not straightforward. Anyway, I forced inference on the CPU, and the result of this command is:
> 2023-02-07 19:18:18,936 INFO [infer.py:141] synthesize text: To get up and running quickly just follow the steps below.
> EOS [61 -> 69]
>
> The resulting wav file is about 5-10 KB, so not even a second long.

The nano config is too small, so the AR decoder may not work well.
Re-run inference to get a new (different) result.
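
One way to see why re-running can help: assuming inference samples each next token from the decoder's predicted distribution instead of always taking the most likely one (this thread does not confirm that), the step at which EOS is drawn changes from run to run. A toy sketch with a made-up, fixed distribution, not the real model; `EOS`, `eos_prob`, and the token range are all hypothetical:

```python
import random

EOS = 0  # hypothetical end-of-sequence token id


def sample_length(rng, eos_prob=0.05, max_steps=500):
    """Draw tokens until EOS is sampled; return how many tokens came before it."""
    for step in range(max_steps):
        # Toy stand-in for the decoder: EOS with probability eos_prob,
        # otherwise an arbitrary non-EOS "audio" token.
        token = EOS if rng.random() < eos_prob else rng.randint(1, 1024)
        if token == EOS:
            return step
    return max_steps


# Each seed plays the role of one inference run.
lengths = [sample_length(random.Random(seed)) for seed in range(5)]
print(lengths)  # lengths generally differ between seeds
```

An early EOS gives a very short clip, so a second or third run may simply stop later.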

do-web commented on June 1, 2024

How do I store the training data, and how is it loaded? Is there an example?

lifeiteng commented on June 1, 2024

> How do I store the training data, and how is it loaded? Is there an example?

https://github.com/lifeiteng/valle/blob/main/README.md#training

lifeiteng commented on June 1, 2024

A model trained with the nano config (about 100x smaller than the paper config) has been able to synthesize human-like speech.

lifeiteng commented on June 1, 2024

lifeiteng commented on June 1, 2024

> Can you upload some samples generated from a 3-second prompt?

cd egs/libritts

python3 bin/infer.py \
    --decoder-dim 128 --nhead 4 --num-decoder-layers 4 --model-name valle \
    --text-prompts "Go to her." \
    --audio-prompts ./prompts/61_70970_000007_000001.wav \
    --output-dir infer/demo_valle_epoch20 \
    --checkpoint exp/valle_nano_v2/epoch-20.pt

tfriedel commented on June 1, 2024

What's the expected output? I had some issues with some tensors being on CUDA and others on the CPU, and getting the dependencies (especially k2) installed was not straightforward. Anyway, I forced inference on the CPU, and the result of this command is:
2023-02-07 19:18:18,936 INFO [infer.py:141] synthesize text: To get up and running quickly just follow the steps below.
EOS [61 -> 69]

The resulting wav file is about 5-10 KB, so not even a second long.
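
A back-of-the-envelope check on that file size. Everything numeric here is an assumption: 24 kHz, 16-bit, mono PCM output, 75 codec frames per second (the EnCodec 24 kHz setting described in the VALL-E paper; the repo's actual settings are not stated in this thread), and `[61 -> 69]` read as 8 newly generated frames, which is only a guess at what the log means:

```python
import wave

SAMPLE_RATE = 24_000     # assumed output sample rate in Hz
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAMES_PER_SECOND = 75   # assumed codec frame rate


def seconds_from_frames(num_frames):
    """Convert a number of codec frames to seconds of audio."""
    return num_frames / FRAMES_PER_SECOND


def pcm_bytes(seconds):
    """Approximate size of the raw PCM payload (WAV header not included)."""
    return round(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)


def wav_duration_seconds(path_or_file):
    """Read the real duration of a WAV file with the standard library."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


generated = 69 - 61  # guessed number of generated frames from the log line
duration = seconds_from_frames(generated)
print(f"{generated} frames ~ {duration:.3f} s ~ {pcm_bytes(duration)} bytes of PCM")
```

Under those assumptions the estimate comes out at roughly a tenth of a second and about 5 KB, which is in line with the reported file size. `wav_duration_seconds` can be pointed at the actual output file to measure it directly.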

entn-at commented on June 1, 2024

This was the output when I ran the command (converted to opus/webm, since GitHub doesn't accept wav):
valle_nano_0.webm

entn-at commented on June 1, 2024

FYI, there are prebuilt k2 CPU-only wheels available: https://k2-fsa.org/nightly/index.html
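
After installing a wheel, a quick way to confirm that the package is visible to the current Python environment without actually importing it. This is a small sketch using only the standard library; `is_installed` is a hypothetical helper name, and `torch` is listed only as an assumed companion dependency:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


for name in ("torch", "k2"):
    print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```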

tfriedel commented on June 1, 2024

@lifeiteng that was it! Had to try a couple of times and sometimes the results were longer and resembled human speech, even if they didn't resemble the input prompt.
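
The try-a-few-times workflow can be scripted. A minimal sketch, where `generate` is a hypothetical callable that runs one inference and returns the path of the WAV file it wrote (for example, a wrapper around the `bin/infer.py` command earlier in this thread), and the 1-second threshold is arbitrary:

```python
import wave


def wav_duration_seconds(path):
    """Read the duration of a WAV file with the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def generate_until_long_enough(generate, min_seconds=1.0, max_tries=5):
    """Call generate() up to max_tries times.

    Returns (path, duration) of the first output that is at least
    min_seconds long, or of the longest output seen if none qualifies.
    Returns None when max_tries is 0.
    """
    best = None
    for _ in range(max_tries):
        path = generate()
        duration = wav_duration_seconds(path)
        if best is None or duration > best[1]:
            best = (path, duration)
        if duration >= min_seconds:
            break
    return best
```

Duration only filters out clips that stopped too early; whether the speech resembles the prompt still has to be judged by listening.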

lifeiteng commented on June 1, 2024

#22
