
steveway / papagayo-ng


This project forked from morevnaproject-org/papagayo-ng


Papagayo is a lip-syncing program designed to help you line up phonemes (mouth shapes) with the actual recorded sound of actors speaking. Papagayo makes lip-syncing animated characters very simple: just type in the words being spoken (or copy/paste them from the animation's script), then drag the words on top of the sound's waveform until they line up with the proper sounds.

Home Page: http://steveway.github.io/papagayo-ng/

Languages: Python 96.86%, HTML 2.05%, Shell 0.68%, NSIS 0.34%, QMake 0.07%

papagayo-ng's People

Contributors

andeon, blackwarthog, dr-ecker, evgenijkatunov, jensdreske, luzpaz, mark-collins-voxsio, morevnaproject, ogallagher, pictureelements, steveway


papagayo-ng's Issues

Allosaurus requiring double download

This may be specific to my system configuration, so possibly disregard if no one else reports the same issue. When you first install Papagayo-NG and then press the install button for Allosaurus, it downloads and tells you to restart. When you restart, it flashes a message saying Allosaurus is not installed. So you download again, and on this second restart, it works.
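A plausible cause for behavior like this is an "installed" check that relies on a cached flag rather than the files actually on disk, so the first download is only recognized after a second pass. The sketch below is a hypothetical, minimal presence check (the function name and directory layout are assumptions, not Papagayo-NG's actual code) that inspects on-disk state directly:

```python
from pathlib import Path

def allosaurus_installed(model_dir: Path, model_name: str = "latest") -> bool:
    """Return True only if the model directory exists and is non-empty.

    Hypothetical sketch: checking on-disk state directly (rather than a
    cached flag) means a freshly finished download is recognized without
    requiring a second download and restart.
    """
    target = model_dir / model_name
    return target.is_dir() and any(target.iterdir())
```

Running such a check right after the download completes, and again at startup, would make the install idempotent.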

Thank you

p.s. I just wanted to say again, this is Super Cool!

Missing conversion error

Missing conversion for: ɻ

Traceback (most recent call last):
  File "O:\papagayo-ng-allosaurus\LipsyncFrameQT.py", line 432, in on_open
    self.open(file_path)
  File "O:\papagayo-ng-allosaurus\LipsyncFrameQT.py", line 468, in open
    self.doc.auto_recognize_phoneme()
  File "O:\papagayo-ng-allosaurus\LipsyncDoc.py", line 850, in auto_recognize_phoneme
    word.text = "".join(letter["phoneme"] for letter in word_chunk)
TypeError: sequence item 0: expected str instance, NoneType found
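The traceback suggests that when a recognized symbol (here ɻ) has no entry in the conversion table, its phoneme stays None and the join crashes. A defensive sketch (the data shape is inferred from the traceback, not taken from the actual source) would skip unmapped entries instead of raising:

```python
def join_phonemes(word_chunk):
    """Join recognized phonemes for a word, skipping entries whose
    'phoneme' is None (i.e. symbols missing from the conversion table)."""
    return "".join(
        letter["phoneme"] for letter in word_chunk
        if letter.get("phoneme") is not None
    )

# Example mirroring the crash: ɻ had no conversion, so its phoneme is None.
chunk = [{"phoneme": None}, {"phoneme": "R"}, {"phoneme": "AH"}]
# join_phonemes(chunk) → "RAH" instead of raising TypeError
```

Logging the skipped symbol (as the "Missing conversion for:" message already does) would preserve the information for extending the table later.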

I was finally able to give this a try. It looks very cool! I am not sure it is working as expected, though. I was only able to get one sound file to work with it, and the auto breakdown was nothing near what I expected; it did not remotely match the spoken words. The sound file that was recognized was of low quality, though. That may be the issue.
Thanks!

Installer error

Windows server 2016 64bit
[Papagayo-NG 1.6.6]

=== Verbose logging started: 29.03.2022 11:03:50 Build type: SHIP UNICODE 5.00.10011.00 Calling process: C:\Windows\system32\msiexec.exe ===
MSI (c) (E8:68) [11:03:50:365]: Font created. Charset: Req=204, Ret=204, Font: Req=MS Shell Dlg, Ret=MS Shell Dlg

MSI (c) (E8:68) [11:03:50:369]: Font created. Charset: Req=204, Ret=204, Font: Req=MS Shell Dlg, Ret=MS Shell Dlg

MSI (c) (E8:7C) [11:03:50:426]: MSI_DBG: Invalid descriptor format - unable to decode ProductCode in descriptor
MSI (c) (E8:70) [11:03:50:626]: Resetting cached policy values

MSI (c) (E8:70) [11:03:50:626]: Machine policy value 'Debug' is 0
MSI (c) (E8:70) [11:03:50:626]: ******* RunEngine:
******* Product: C:\Users*\Desktop\papa\papagayo-ng_installer.exe
******* Action:
******* CommandLine: **********
MSI (c) (E8:70) [11:03:50:630]: Note: 1: 2203 2: C:\Users*\Desktop\papa\papagayo-ng_installer.exe 3: -2147286960
MSI (c) (E8:70) [11:03:50:631]: MainEngineThread is returning 1620
=== Verbose logging stopped: 29.03.2022 11:03:50 ===

Unable to run in conda env

Hello again. I just tried to build this latest iteration and ran into some trouble; this may, of course, all be user error in the end.

I installed a conda env with python 3.9
I then ran pip install -r requirements.txt.
At the end of the install, I got this message:

Building wheels for collected packages: webrtcvad
Building wheel for webrtcvad (setup.py) ... error
error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [9 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-39
      copying webrtcvad.py -> build\lib.win-amd64-cpython-39
      running build_ext
      building '_webrtcvad' extension
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for webrtcvad
  Running setup.py clean for webrtcvad
Failed to build webrtcvad
ERROR: Could not build wheels for webrtcvad, which is required to install pyproject.toml-based projects

To solve this, I used conda install -c conda-forge webrtcvad. But then, when I run papagayo-ng.py, a message flashes about ffmpeg not being available, and the env terminal closes. So I installed ffmpeg into the env with conda install ffmpeg.
When I try to run python papagayo-ng.py again, the terminal just closes. I do not know what to do at this point, or where to start looking.
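When a console window closes before the error can be read, redirecting output to a log file preserves the traceback. The sketch below is an assumed debugging workflow, not part of Papagayo-NG itself; the function name is hypothetical:

```python
import subprocess
import sys

def capture_run(cmd, log_path="run.log"):
    """Run a command, mirroring stdout and stderr into a log file so the
    error message survives even if the console window closes immediately."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode

# e.g. capture_run([sys.executable, "papagayo-ng.py"]) then inspect run.log
```

Alternatively, launching `python papagayo-ng.py > run.log 2>&1` from an already-open shell achieves the same thing.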

Thank you for continuing to advance this project!

Crash on usage of Allosaurus model eng2102

Upon initiating the Allosaurus voice recognition model "eng2102", I get this dialog pop-up:

[Screenshot (750) attached]

Then the program crashes to the desktop.

*I fully redownloaded and reinstalled the newest version with no other settings changed; all download extensions were also reinstalled properly. This test was run after the usual FFmpeg restart.

Does not recognize previous installs location when installing updated version

Hello. Installation of a newer version does not seem to automatically detect the previous install location. For example, I have 1.6.4 installed in c:\software\papagayo-ng, but when installing the updated version 1.6.5, it automatically wants to put it in C:\Program Files (x86)\Papagayo-NG.

Thank you

Unable to download Rhubarb in-app

Using Papagayo-NG 1.6.6.0

When I click "Download Rhubarb" in the toolbar, I get the following error:

Traceback (most recent call last):
  File "utilities.py", line 196, in run
  File "LipsyncFrameQT.py", line 473, in download_rhubarb
  File "urllib\request.py", line 214, in urlopen
  File "urllib\request.py", line 501, in open
  File "urllib\request.py", line 320, in __init__
  File "urllib\request.py", line 346, in full_url
  File "urllib\request.py", line 375, in _parse
ValueError: unknown url type: ''
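The ValueError indicates that urlopen received an empty string, i.e. the Rhubarb download URL was never resolved before the request was made (perhaps a failed release lookup). A guard sketch (the function name and message are hypothetical, not Papagayo-NG's actual code) would surface a clearer error:

```python
from urllib.request import urlopen

def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download url, but fail with a clear message instead of the opaque
    "unknown url type: ''" ValueError when the URL was never resolved."""
    if not url:
        raise RuntimeError(
            "Rhubarb download URL is empty - the release lookup "
            "probably failed; check network access and try again."
        )
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

With a guard like this, the toolbar action could show the message in a dialog instead of crashing the worker thread.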

Add ability to download and switch between different Allosaurus pretrained models

I believe 1.6.5 uses the uni2005 (universal) model, right?

If that is the case, there should be a way to download and switch to language-specific models, which "should perform much better than the universal model" (Xinjian Li) for the language in question.

Model eng2102 apparently does much better for English dialogue than latest/uni2005, according to the Allosaurus GitHub description.

If there is a way or workaround to use eng2102 in the current build (or does it already use it?) that I am unaware of, please let me know.
Thanks for reading.
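For reference, the Allosaurus project itself supports selecting pretrained models by name: per its README, a model is fetched with `python -m allosaurus.bin.download_model -m eng2102` and loaded with `read_recognizer("eng2102")` from `allosaurus.app`. A feature like this in Papagayo-NG could be a small selection map; the map and function below are a hypothetical sketch, not existing code:

```python
# Hypothetical model-selection map. Downloading a model would be done with:
#   python -m allosaurus.bin.download_model -m eng2102
# and loading it with:
#   from allosaurus.app import read_recognizer
#   recognizer = read_recognizer("eng2102")

ALLOSAURUS_MODELS = {
    "universal": "uni2005",  # current default, covers all languages
    "english": "eng2102",    # reported to perform better on English
}

def pick_model(language: str) -> str:
    """Fall back to the universal model for languages that have no
    dedicated Allosaurus model."""
    return ALLOSAURUS_MODELS.get(language.lower(),
                                 ALLOSAURUS_MODELS["universal"])
```

A dropdown in the settings dialog populated from such a map would cover the switching part of this request.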

Request: Please correct a mistake I made. H should be HH

It seems I made a mistake in naming the mouth shapes: I named a file H.png when it should be HH.png. This causes a missing mouth shape warning in Papagayo-NG.

My apologies.
The fix should be as simple as renaming H.png to HH.png in your source code, in both the 66% and the 100% folders. I have also updated my Git to correct the error.
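The rename can also be scripted; here is a minimal sketch using pathlib, assuming the "66%" and "100%" folder layout described above (the function name is hypothetical):

```python
from pathlib import Path

def fix_mouth_shape(root: Path) -> int:
    """Rename H.png to HH.png in each size folder under root.
    Returns how many files were renamed. The folder names '66%' and
    '100%' follow the layout described above."""
    renamed = 0
    for folder in ("66%", "100%"):
        src = root / folder / "H.png"
        if src.exists():
            src.rename(src.with_name("HH.png"))
            renamed += 1
    return renamed
```

In a git checkout, `git mv H.png HH.png` in each folder accomplishes the same while preserving history.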

Thank you

Using Transformers and Wav2Vec for automatic generation

So, Machine Learning has gotten quite a boost over the past few months.
I did some testing, and as an alternative to Allosaurus we might be able to use wav2vec.
Here is an example which seems to get the CMU phonemes we use, with timestamps, using a Wav2Vec2 model from Huggingface Transformers:

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import torch
import soundfile as sf

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("vitouphy/wav2vec2-xls-r-300m-timit-phoneme")
model = Wav2Vec2ForCTC.from_pretrained("vitouphy/wav2vec2-xls-r-300m-timit-phoneme")

# Read and process the input
audio_input, sample_rate = sf.read("./Tutorial Files/lame.wav")
inputs = processor(audio_input, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# Decode id into string
predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentences = processor.batch_decode(predicted_ids, output_char_offsets=True)
time_offset = model.config.inputs_to_logits_ratio / 16000
print(predicted_sentences)
ipa_to_cmu = {
    "b": "B",
    "ʧ": "CH",
    "d": "D",
    "ð": "DH",
    "f": "F",
    "g": "G",
    "h": "H",
    "ʤ": "JH",
    "k": "K",
    "l": "L",
    "m": "M",
    "ŋ": "NG",
    "n": "NG",
    "p": "P",
    "r": "R",
    "s": "S",
    "ʃ": "SH",
    "t": "T",
    "θ": "TH",
    "v": "V",
    "w": "W",
    "j": "Y",
    "z": "Z",
    "ʒ": "ZH",
    "ɑ": "AA",
    "æ": "AE",
    "ə": "AH",
    "ʌ": "AH",
    "ɔ": "AO",
    "ɛ": "EH",
    "ɚ": "ER",
    "ɝ": "ER",
    "ɪ": "IH",
    "i": "IY",
    "ʊ": "UH",
    "u": "UW",
    "aʊ": "AW",
    "aɪ": "AY",
    "eɪ": "EY",
    "oʊ": "OW",
    "o": "OW",
    "ɔɪ": "OY",
    "e": "EH",
    "a": "AA",
    "ʔ": "rest",
    "ɒ": "AO",
    "ɯ": "UW",
    "ɹ": "R",
    "ɾ": "R",
    "ɹ̩": "ER",
    "ɻ": "R",
    "-": "rest",
    "ɡ": "G",
    "x": "N",
    "d͡ʒ": "JH",
    "t͡ʃ": "CH"
}
cmu_phones = ""
cmu_list = []
print(predicted_sentences.char_offsets)
for char in predicted_sentences.char_offsets[0]:
    if char["char"] in ipa_to_cmu:
        cmu_phones += ipa_to_cmu[char["char"]] + " "

        cmu_list.append({"char": ipa_to_cmu[char["char"]],
                         "start_time": char["start_offset"] * time_offset,
                         "end_time": char["end_offset"] * time_offset})
    else:
        print("missing")
        print(char["char"])
print(cmu_list)

This would be using this model here:
https://huggingface.co/vitouphy/wav2vec2-xls-r-300m-timit-phoneme
And if we are already using transformers and wav2vec, we could use them at the same time to get human-readable text.
For that we could even use OpenAI Whisper; the results are good, but it does not include phonemes or timestamps AFAIK.
This might also be good, if we can extract timestamps too: https://huggingface.co/bookbot/wav2vec2-ljspeech-gruut

Destructive Uninstall, incomplete Uninstall

It seems the uninstall procedure simply deletes the entire installed folder instead of selectively removing the files that were installed. For example, when I added ffmpeg manually during previous experiments, it added a couple of additional files that are not part of the normal install. When I then uninstalled using the uninstaller, it deleted even these files I had added manually.
This time, using the uninstaller, it deleted a file I manually added but left the Papagayo-NG.exe file.
It also does not seem to attempt to remove the added libraries, ffmpeg, and the AI model from the user directory.

Thank you

Use Callback System for sounddevice usage

It might make sense to use SoundPlayer.py on all systems, since it does not depend on Qt for playback.
The only real problem seems to be the volume: it can't be changed while audio is playing.
But using callbacks, it is apparently possible to change the volume during playback.
spatialaudio/python-sounddevice#159
If we get this to work nicely on all platforms, we can just use it for everything.
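With sounddevice's callback API, each audio block can be scaled by a shared volume value just before it is written to the output buffer, so slider changes are heard mid-playback. This is a minimal sketch of that idea, not Papagayo-NG's SoundPlayer.py; the sounddevice import is guarded so the scaling logic stands alone:

```python
import numpy as np

try:
    import sounddevice as sd  # optional; only needed for actual playback
except ImportError:
    sd = None

volume = 1.0  # updated from the UI slider at any time

def apply_volume(block: np.ndarray, vol: float) -> np.ndarray:
    """Scale one block of samples; called per callback, so changes to
    'volume' take effect on the very next block."""
    return block * vol

def make_callback(samples: np.ndarray):
    pos = 0
    def callback(outdata, frames, time, status):
        nonlocal pos
        chunk = samples[pos:pos + frames]
        pos += frames
        out = apply_volume(chunk, volume)
        outdata[:len(out)] = out.reshape(len(out), -1)
        if len(out) < frames:        # ran out of samples: pad and stop
            outdata[len(out):] = 0
            raise sd.CallbackStop()
    return callback

# Playback (requires sounddevice):
# stream = sd.OutputStream(samplerate=44100, channels=1,
#                          callback=make_callback(samples))
# stream.start()
```

Because the callback reads `volume` on every block, the UI only needs to assign the new value; no stream restart is required.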

Both Kaspersky and Emsisoft say the EXE is performing suspicious actions characteristic of Malware

From Kaspersky a bit after launch, and a bit after performing Allosaurus (eng2102) lip-sync:

Event: Malicious object detected
Application: Lip-Sync Software
User: *********
User type: Initiator
Component: System Watcher
Result description: Detected
Type: Trojan
Name: PDM:Trojan.Win32.Generic
Threat level: High
Object type: Process
Object path: C:\Program Files (x86)\Papagayo-NG
Object name: papagayo-ng.exe
MD5: 8D6BE0AB06EE7C48F56FCC058BD4F387
Reason: Behavior analysis
Databases release date: Today, 5/15/2023 12:03:00 PM

Second opinion, from Emsisoft, immediately after the console opens (no logo appears until I click "Wait, I think this is safe" and restart):

5/15/2023 5:21:04 PM
Behavior Blocker detected suspicious behavior "HiddenInstallation" of C:\Program Files (x86)\Papagayo-NG\papagayo-ng.exe (SHA1: 2ED12B349535EBDDD7F0AC4E2F0AFDFAA4554BCA)

5/15/2023 5:21:22 PM
A notification message "Suspicious behavior has been found in the following program: C:\Program Files (x86)\Papagayo-NG\papagayo-ng.exe" has been shown

5/15/2023 5:21:31 PM
User "*********" clicked "Wait, I think this is safe"

Unable to exit Breakdown without forced close

When you press Breakdown to alter a word, you are locked into the Breakdown window, even after you go through the whole phrase, hold Cancel, hold Esc, or keep pressing F4. The only way I have found to exit the Breakdown window is to open Task Manager on Windows 10 and force-close the whole application.

Thank you

Failed to load tcl/tk libraries

I cannot seem to get papagayo-ng_installer to run on a Windows 10 VM.
A message flashes by about "failed to load tcl/tk libraries".

Thank you

Volume slider not registering correctly

When first started, the volume slider position is at 50%, but the actual volume level is 100%.
Also, the volume slider has no effect on the volume level when moved while the audio file is playing. It does work if set before playback.

Thank you
