Hello, thanks for sharing this amazing repo! Could we have more information how to pro


Need more info for training and inference about mellotron HOT 6 CLOSED

nvidia commented on August 23, 2024

Need more info for training and inference

from mellotron.

Comments (6)

AndroYD84 commented on August 23, 2024

I pulled from master, and the same musicxml that wouldn't play before (I only replaced the word "hallelujah" with "systematic") is working now!
About the training, you're absolutely correct: there was a problem in my filelist. When I generated the text from audio using STT, I didn't consider the possibility of segments containing only silence for more than 4 seconds, so two lines had missing information. Now I can start the training without problems (so far).
I managed to generate audio from a custom musicxml too, but it took plenty of trial and error, because it wouldn't work even though it looked flawless. It turned out that a single note among the rest had no lyrics applied to it; after I removed that note, it finally worked. The converter doesn't explicitly point out which part of the data or which word is throwing an error, which makes it difficult to narrow down the possible causes (it could be anything: a typo, a note, a special character, a single lyricless note in a sea of notes, something that shouldn't be there, etc.). Some musicxml files still throw errors, and I can't figure out what is wrong with them even after checking them carefully; some problems are elusive.
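
Since the converter does not say which note is at fault, a small pre-check can narrow the search. The following is a minimal sketch, not part of the repo: it assumes an un-namespaced partwise MusicXML layout (`part` > `measure` > `note`, with `rest` and `lyric` children), and `find_lyricless_notes` is a hypothetical helper name. It only lists candidates; a pitched note without a lyric is not necessarily an error in MusicXML itself.

```python
import xml.etree.ElementTree as ET

def find_lyricless_notes(xml_text):
    """Return (part_id, measure_number, note_index) for pitched notes with no <lyric>."""
    root = ET.fromstring(xml_text)
    missing = []
    for part in root.iter("part"):
        for measure in part.iter("measure"):
            for idx, note in enumerate(measure.findall("note"), start=1):
                if note.find("rest") is not None:
                    continue  # rests are skipped; they are not expected to carry lyrics
                if note.find("lyric") is None:
                    missing.append((part.get("id"), measure.get("number"), idx))
    return missing

# Tiny illustrative document (made up for this sketch).
SAMPLE = """<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><lyric><text>Sys</text></lyric></note>
      <note><pitch><step>D</step><octave>4</octave></pitch></note>
      <note><rest/></note>
    </measure>
  </part>
</score-partwise>"""

print(find_lyricless_notes(SAMPLE))  # [('P1', '1', 2)]
```

For a real file, the text can be read from disk and passed in the same way; the reported part id, measure number, and note position should be enough to locate the note in a score editor.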

rafaelvalle commented on August 23, 2024

@AndroYD84 I'll address the first issue here.
Please create another issue for the training problem so that we can address it there.

The musicxml parser we provided is a basic starting point for parsing musicxml files. The requirements are:

1. The characters must be in [a-zA-Z].
2. Each word must start with an upper-case letter.
3. Every word must exist in the arpabet dictionary.

You're likely violating [3.] by changing letters of the word Hallelujah.
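
The three requirements above can be checked before running the converter. This is a rough sketch under stated assumptions: `check_word` is a hypothetical helper, and `known_words` is assumed to be a set of upper-case dictionary entries that you build yourself from the dictionary file; it is not the repo's own dictionary object.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+")

def check_word(word, known_words):
    """Return a list of the requirements that `word` violates (empty if none)."""
    problems = []
    if not WORD_RE.fullmatch(word):
        problems.append("characters outside [a-zA-Z]")
    if not word[:1].isupper():
        problems.append("does not start with an upper-case letter")
    if word.upper() not in known_words:
        problems.append("not in the dictionary")
    return problems

# Stand-in for the real dictionary entries (assumption for this sketch).
known = {"HALLELUJAH", "SYSTEMATIC"}
for w in ["Hallelujah", "systematic", "Hal-le", "Zzyzx"]:
    print(w, check_word(w, known))
```

Running every lyric word through a check like this reports the offending word by name, which the converter currently does not do.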

AndroYD84 commented on August 23, 2024

Thanks for the quick reply! I made a copy of "haendel_hallelujah.musicxml" and replaced only the word "hal·le·lu·jah" with "sys·tem·at·ic" (this word appears in the arpabet dictionary and in the "cmu_dictionary" file from this repo), and I'm getting this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-ca64c0de9770> in <module>
----> 1 data = get_data_from_musicxml('data/haendel_systematic.musicxml', 149, convert_stress=True)
      2 panning = {'Soprano': [-60, -30], 'Alto': [-40, -10], 'Tenor': [30, 60], 'Bass': [10, 40]}

C:\mellotron\mellotron_utils.py in get_data_from_musicxml(filepath, bpm, phoneme_durations, convert_stress)
    475         f0s = event2f0(events_arpabet)
    476         alignment, f0s = remove_excess_frames(alignment, f0s)
--> 477         text_encoded, text_clean = event2text(events_arpabet, convert_stress)
    478 
    479         # convert data to torch

C:\mellotron\mellotron_utils.py in event2text(events, convert_stress, cmudict)
    438         text_clean = re.sub('[0-9]', '1', text_clean)
    439 
--> 440     text_encoded = text_to_sequence(text_clean, [], cmudict)
    441     return text_encoded, text_clean
    442 

C:\mellotron\text\__init__.py in text_to_sequence(text, cleaner_names, dictionary)
     44       clean_text = _clean_text(text, cleaner_names)
     45       if cmudict is not None:
---> 46         clean_text = [get_arpabet(w, dictionary) for w in clean_text.split(" ")]
     47         for i in range(len(clean_text)):
     48             t = clean_text[i]

C:\mellotron\text\__init__.py in <listcomp>(.0)
     44       clean_text = _clean_text(text, cleaner_names)
     45       if cmudict is not None:
---> 46         clean_text = [get_arpabet(w, dictionary) for w in clean_text.split(" ")]
     47         for i in range(len(clean_text)):
     48             t = clean_text[i]

C:\mellotron\text\__init__.py in get_arpabet(word, dictionary)
     14 
     15 def get_arpabet(word, dictionary):
---> 16   word_arpabet = dictionary.lookup(word)
     17   if word_arpabet is not None:
     18     return "{" + word_arpabet[0] + "}"

AttributeError: 'NoneType' object has no attribute 'lookup'

Then I repeated the same procedure, changing "sys·tem·at·ic" back to "hal·le·lu·jah", and it works again. I'm really confused now.
For reference, here are the files I used:
haendel_hallelujah.musicxml
haendel_systematic.musicxml
I compared them side by side and don't see anything out of place.

rafaelvalle commented on August 23, 2024

Pull from master and try again.
The musicxml converter is a simple prototype to get people started, and we hope our community will improve it.
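
In the traceback above, `text_to_sequence` tests `cmudict is not None` but then passes `dictionary` to `get_arpabet`, so a `None` dictionary can reach `dictionary.lookup`. A guard along the following lines would at least turn that into a clearer error. This is only a sketch and not necessarily what was changed on master: `ToyDict` and `get_arpabet_safe` are hypothetical names, the `lookup` behaviour (a list of pronunciations, or `None`) is inferred from the traceback, and returning the word unchanged when it is unknown is an assumption.

```python
class ToyDict:
    """Hypothetical stand-in for the repo's dictionary object."""

    def __init__(self, entries):
        self._entries = entries

    def lookup(self, word):
        # Returns a list of pronunciations, or None if the word is unknown.
        return self._entries.get(word.upper())


def get_arpabet_safe(word, dictionary):
    """Like the get_arpabet shown in the traceback, but fails loudly on a missing dictionary."""
    if dictionary is None:
        raise ValueError("no pronunciation dictionary was supplied")
    word_arpabet = dictionary.lookup(word)
    if word_arpabet is not None:
        return "{" + word_arpabet[0] + "}"
    return word  # assumption: unknown words are passed through unchanged


# Illustrative entry only; check the real dictionary file for the actual pronunciation.
toy = ToyDict({"SYSTEMATIC": ["S IH2 S T AH0 M AE1 T IH0 K"]})
print(get_arpabet_safe("Systematic", toy))
```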

For your AttributeError while training, it is likely that you have a line without a speaker id.
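
A quick way to find such lines is to scan the filelist before training. The sketch below assumes a pipe-separated layout of `audio_path|text|speaker_id`; adjust `n_fields` and `sep` if your filelist differs. `find_bad_filelist_lines` is a hypothetical helper, not part of the repo.

```python
def find_bad_filelist_lines(lines, n_fields=3, sep="|"):
    """Return (line_number, line) for lines with a missing or empty field."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        fields = stripped.split(sep)
        if len(fields) != n_fields or any(not f.strip() for f in fields):
            bad.append((lineno, stripped))
    return bad

# Made-up sample lines for illustration.
sample = [
    "wavs/a.wav|Hello there.|0\n",
    "wavs/b.wav||0\n",                   # empty transcript, e.g. from a silent clip
    "wavs/c.wav|Missing speaker id\n",   # only two fields
]
print(find_bad_filelist_lines(sample))
# [(2, 'wavs/b.wav||0'), (3, 'wavs/c.wav|Missing speaker id')]
```

For a real filelist, pass the open file object as `lines`; an empty result means every line has all of its fields filled in.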

rafaelvalle commented on August 23, 2024

Great that you were able to get it working from a custom musicxml too.
Please add a pull request if you make improvements to the musicxml parser.

camjac251 commented on August 23, 2024

Are you still able to run this on Windows with Anaconda? I've been trying to get it to work on Windows 10 and have been facing many issues. I finally got it to work, but each iteration takes about 8 seconds. Have you tried using tensorflow-gpu instead of the CPU build of tensorflow? Does that help with speed?
