Comments (6)

willstott101 commented on August 12, 2024

I don't know what your use-case is @artrayd but we've had success combining the current audio volume with allosaurus output for generating animated lipsync. It patches over any sounds allosaurus does not recognise, and helps us respond to silence correctly.
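One way the volume-plus-phones idea could be sketched is below. This is only an illustration and not the code described in the comment: `frame_rms`, `mouth_state`, the state names, and the `0.02` threshold are all hypothetical, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS volume of each consecutive frame of `frame_len` samples."""
    volumes = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        volumes.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return volumes


def mouth_state(has_phone, rms, threshold=0.02):
    """Pick an animation state for one frame (hypothetical state names).

    Assumption: a quiet frame always closes the mouth; a loud frame with no
    recognised phone falls back to a generic open-mouth shape.
    """
    if rms < threshold:
        return "closed"
    return "phone" if has_phone else "generic_open"
```

Here a loud frame without a recognised phone still opens the mouth, while a quiet frame closes it even if a phone was reported nearby. The threshold would need tuning per recording setup.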

However, if your use case does not have clean enough audio for that, we have seen that allosaurus is remarkably good at simply not outputting anything during periods of non-speech sound. So finding gaps of a particular length in the allosaurus output may well be suitable.
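A minimal sketch of the gap-finding approach follows. It assumes the timestamped output is text with one `start duration phone` triple per line, which should be checked against the installed allosaurus version. The helper names `parse_phones` and `find_pauses`, and the `0.3` second minimum gap, are hypothetical.

```python
def parse_phones(timestamp_text):
    """Parse lines of 'start duration phone' into (start, duration, phone) tuples."""
    phones = []
    for line in timestamp_text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that does not look like a phone line
        start, duration, phone = parts
        phones.append((float(start), float(duration), phone))
    return phones


def find_pauses(phones, min_gap=0.3):
    """Return (gap_start, gap_end) pairs where no phone was emitted for at least min_gap seconds."""
    pauses = []
    for (start, duration, _), (next_start, _, _) in zip(phones, phones[1:]):
        gap_start = round(start + duration, 3)
        if next_start - gap_start >= min_gap:
            pauses.append((gap_start, next_start))
    return pauses
```

For example, `find_pauses(parse_phones(text))` on the timestamped output of one file gives candidate pauses. Silence before the first phone or after the last one is not covered and would need the audio length.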

from allosaurus.

xinjli commented on August 12, 2024

This issue is caused by the loss function used in the model (CTC). Unfortunately, CTC is known to have this peaky behaviour, and it cannot be fixed.
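For readers unfamiliar with CTC, a toy sketch of greedy CTC decoding may help show where the peakiness comes from. It is a generic illustration, not allosaurus's actual decoder: the `_` blank symbol and the frame labels are made up. A CTC-trained model tends to predict the blank for most frames and each phone on only a frame or two, so the decoded phone carries little duration information.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol for the CTC blank label


def ctc_greedy_decode(frame_labels):
    """Collapse consecutive repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != BLANK]


# One best label per audio frame: mostly blanks, with short spikes for phones.
frames = ["_", "_", "h", "_", "_", "_", "ə", "_", "l", "l", "_", "oʊ", "_"]
print(ctc_greedy_decode(frames))  # ['h', 'ə', 'l', 'oʊ']
```

Note that the two adjacent `l` frames collapse into a single phone; two identical phones in a row only survive decoding if a blank frame separates them.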

artrayd commented on August 12, 2024

Hi @xinjli, thank you for your answer. Another question: is it theoretically possible to mark pauses, that is, times when there is no voice at all?

artrayd commented on August 12, 2024

@willstott101 thank you! I was thinking in the same direction.

62mkv commented on August 12, 2024

I suspect that, due to this issue, allosaurus "swallows" multiple phones when the speech is rather quick (in Estonian, for example, native speakers tend to produce sounds quickly because the words are so long). Might this be the case? If so, what is a CTC model, and where can I learn more about it?

62mkv commented on August 12, 2024

(I suspect that the -e option might be intended to compensate for that "fixed duration" thingy. Unfortunately, it does not seem to improve the overall outcome.)
