
Comments (4)

demonauthor commented on September 25, 2024

1 - See issue 237 in the original tortoise repo, there's params you can try (I haven't had time to experiment yet)
2 - I've also noticed that, seems to be a failing of tortoise generally, not aware of any possible fixes

I'd be interested to hear what you've done to get awesome results -- what dataset size did you have, how many epochs, other hyperparameters. I have not yet managed to get awesome results.

I have been looking at the AI-Voice-Cloning setup. There is a setting in there called "pause time" which gets rid of the clipped last word. It seems to be at a much earlier stage of development and is very slow, but it's coming along nicely. I'm still getting much better results with DLAS and Ozen.

The test I did was trying to clone Vincent Price's voice. I used three different audiobook readings he did. They are fairly clean and his speech is consistent. That yielded about 500 clips using Ozen to create the dataset. Then I did 200 steps in DLAS and clicked the Auto Settings button. I have a separate set of clips I made for the voices folder that I can interchange to get different types of readings (specific emotions, rasp, voice pitch). It doesn't always work, but most of the time I get great results. Going to try 300 steps and see if the quality is any cleaner.

I've done 5 other tests with similarly recognizable voices (Walken, Jeff Goldblum, Louise from Bob's Burgers...) with equivalent results. The cadence isn't always right, but the tone, pronunciation etc. are great. Instantly recognizable. Now I'm trying to combine voices to create specific sounds. I'm using this to do some preproduction for a film proof of concept, and it's working nicely. Sort of like a digital table read.

from dl-art-school.

demonauthor commented on September 25, 2024

The best way I have found to fix the clipping is just to add a space and then a single character to the end of the phrase. Then edit that final character out if it is pronounced. As for the doubling of the final line...breaking the text into shorter phrases fixes this. Shorter phrases also yield better "performances" overall.
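The workaround above can be sketched as a small text-preparation step run before synthesis. This is only an illustration, not part of DLAS or tortoise: the function names, the 200-character limit, and the choice of "a" as the throwaway character are all assumptions, and the padded character still has to be trimmed from the audio by hand if it gets pronounced.

```python
import re


def split_into_phrases(text, max_chars=200):
    """Split text at sentence boundaries and pack sentences into short phrases.

    max_chars is a hypothetical limit; tune it to whatever length gives
    clean output for your voice.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    phrases = []
    current = ""
    for sentence in sentences:
        # Start a new phrase when adding this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            phrases.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        phrases.append(current)
    return phrases


def pad_phrase(phrase, pad_char="a"):
    """Append a space and one throwaway character so the real last word is not clipped."""
    return f"{phrase} {pad_char}"


def prepare_phrases(text, max_chars=200, pad_char="a"):
    """Return short, padded phrases to feed to the synthesizer one at a time."""
    return [pad_phrase(p, pad_char) for p in split_into_phrases(text, max_chars)]
```

Each returned phrase would then be generated as its own clip and the clips joined afterwards.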


xenotropic commented on September 25, 2024

1 - See issue 237 in the original tortoise repo, there's params you can try (I haven't had time to experiment yet)
2 - I've also noticed that, seems to be a failing of tortoise generally, not aware of any possible fixes

I'd be interested to hear what you've done to get awesome results -- what dataset size did you have, how many epochs, other hyperparameters. I have not yet managed to get awesome results.


xenotropic commented on September 25, 2024

I found hyperparameters that worked better for me, see #1. Mostly reducing lr for smaller datasets / single speaker.

For repeats, I experimented running each of length_penalty and repetition_penalty up to 1024, zero difference (super-helpful to have those exposed as script parameters in this repo).

It is oddly regular in that it always seems to affect the last element in a list, text of the form "blah blah, X, Y, and Z" being rendered as "blah blah, X, Y, and Z, and Z". If anyone has thoughts on what to experiment with to try to eliminate that, open to ideas.
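For anyone experimenting with this, the pattern is regular enough to check for automatically. A rough sketch, assuming you already have a text transcript of the generated clip from some speech recognizer of your choice (that step is not shown, and `trailing_repeat` is a made-up helper name, not something in this repo):

```python
import re


def trailing_repeat(text, max_words=5):
    """Return the word sequence that is doubled at the end of text, or None.

    Compares the last n words with the n words just before them,
    trying longer sequences first. Punctuation and case are ignored.
    """
    words = re.findall(r"[\w']+", text.lower())
    for n in range(min(max_words, len(words) // 2), 0, -1):
        if words[-n:] == words[-2 * n:-n]:
            return " ".join(words[-n:])
    return None
```

Running it over transcripts from a batch of generations would make it easier to tell whether a given parameter change reduces the repeats at all.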

