
Comments (1)

devlux76 commented on May 18, 2024

I believe I've narrowed down the problem.

In https://github.com/AIGC-Audio/AudioGPT/blob/main/audio-chatgpt.py you have the following code:

class T2S:
    def __init__(self, device= None):
        from inference.svs.ds_e2e import DiffSingerE2EInfer
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print("Initializing DiffSinger to %s" % device)
        self.device = device
        self.exp_name = 'checkpoints/0831_opencpop_ds1000'
        self.config= 'NeuralSeq/egs/egs_bases/svs/midi/e2e/opencpop/ds1000.yaml'
        self.set_model_hparams()
        self.pipe = DiffSingerE2EInfer(self.hp, device)
        self.default_inp = {
            'text': '你 说 你 不 SP 懂 为 何 在 这 时 牵 手 AP',
            'notes': 'D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | rest | D#4/Eb4 | D4 | D4 | D4 | D#4/Eb4 | F4 | D#4/Eb4 | D4 | rest',
            'notes_duration': '0.113740 | 0.329060 | 0.287950 | 0.133480 | 0.150900 | 0.484730 | 0.242010 | 0.180820 | 0.343570 | 0.152050 | 0.266720 | 0.280310 | 0.633300 | 0.444590'
        }

    def set_model_hparams(self):
        set_hparams(config=self.config, exp_name=self.exp_name, print_hparams=False)
        self.hp = hp

    def inference(self, inputs):
        self.set_model_hparams()
        val = inputs.split(",")
        key = ['text', 'notes', 'notes_duration']
        try:
            inp = {k: v for k, v in zip(key, val)}
            wav = self.pipe.infer_once(inp)
        except:
            print('Error occurs. Generate default audio sample.\n')
            inp = self.default_inp
            wav = self.pipe.infer_once(inp)
        #if inputs == '' or len(val) < len(key):
        #    inp = self.default_inp
        #else:
        #    inp = {k:v for k,v in zip(key,val)}
        #wav = self.pipe.infer_once(inp)
        wav *= 32767
        audio_filename = os.path.join('audio', str(uuid.uuid4())[0:8] + ".wav")
        wavfile.write(audio_filename, self.hp['audio_sample_rate'], wav.astype(np.int16))
        print(f"Processed T2S.run, audio_filename: {audio_filename}")
        return audio_filename

class t2s_VISinger:
    def __init__(self, device=None):
        from espnet2.bin.svs_inference import SingingGenerate
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print("Initializing VISingere to %s" % device)
        tag = 'AQuarterMile/opencpop_visinger1'
        self.model = SingingGenerate.from_pretrained(
            model_tag=str_or_none(tag),
            device=device,
        )
        phn_dur = [[0.        , 0.219     ],
            [0.219     , 0.50599998],
            [0.50599998, 0.71399999],
            [0.71399999, 1.097     ],
            [1.097     , 1.28799999],
            [1.28799999, 1.98300004],
            [1.98300004, 7.10500002],
            [7.10500002, 7.60400009]]
        phn = ['sh', 'i', 'q', 'v', 'n', 'i', 'SP', 'AP']
        score = [[0, 0.50625, 'sh_i', 58, 'sh_i'], [0.50625, 1.09728, 'q_v', 56, 'q_v'], [1.09728, 1.9832100000000001, 'n_i', 53, 'n_i'], [1.9832100000000001, 7.105360000000001, 'SP', 0, 'SP'], [7.105360000000001, 7.604390000000001, 'AP', 0, 'AP']]
        tempo = 70
        tmp = {}
        tmp["label"] = phn_dur, phn
        tmp["score"] = tempo, score
        self.default_inp = tmp

    def inference(self, inputs):
        val = inputs.split(",")
        key = ['text', 'notes', 'notes_duration']
        try: # TODO: input will be update
            inp = {k: v for k, v in zip(key, val)}
            wav = self.model(text=inp)["wav"]
        except:
            print('Error occurs. Generate default audio sample.\n')
            inp = self.default_inp
            wav = self.model(text=inp)["wav"]

        audio_filename = os.path.join('audio', str(uuid.uuid4())[0:8] + ".wav")
        soundfile.write(audio_filename, wav, samplerate=self.model.fs)
        return audio_filename

It looked a bit off to me, but I couldn't quite put my finger on it, so I asked ChatGPT 4 about it.

The code expects inputs in a specific format, and if they are not in that format, it falls back to generating a default audio sample. The key point is the try/except block in the inference method of both the T2S and t2s_VISinger classes: if any exception is thrown while executing the code in the try block, execution immediately jumps to the except block, which generates the default audio sample.
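A minimal illustration of that failure mode (not AudioGPT code, just a toy stand-in): a bare `except:` hides the real error, so every failure silently becomes the default output.

```python
def parse_note(value):
    # Stands in for the real inference pipeline: raises ValueError on bad input.
    return float(value)

def inference(inputs, default=440.0):
    try:
        freq = parse_note(inputs)
    except:  # bare except swallows *every* exception, so the cause is never shown
        print('Error occurs. Generate default audio sample.')
        freq = default
    return freq

print(inference("523.25"))  # valid input works: 523.25
print(inference("C5"))      # invalid input silently becomes the default: 440.0
```

This is exactly why every run can sound identical: the caller never learns that an exception fired.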

Here is what's happening in more detail:

  1. The input is expected to be a string with comma-separated values. This string is split by commas to create a list, val.

  2. It then tries to create a dictionary inp by zipping key (which is a list of keys ['text', 'notes', 'notes_duration']) and val (the list of values obtained from the input string).

  3. It passes this inp dictionary to the infer_once function of the pipe object in the T2S class or to the model object in the t2s_VISinger class.

  4. If anything goes wrong during this process (for example, if the input string does not contain enough comma-separated values to match up with the keys, or if the infer_once function does not work with the provided inputs), an exception is raised.

  5. As soon as an exception is raised, it jumps to the except block and generates a default audio sample using self.default_inp.
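Steps 1–2 above have a subtle wrinkle worth noting (sketch with hypothetical values): `zip()` silently truncates to the shorter sequence, so a short input still produces a dictionary without raising anything; the missing key only surfaces later, inside infer_once, as an exception.

```python
key = ['text', 'notes', 'notes_duration']
val = "some lyrics,C4 | D4".split(",")       # only two fields supplied
inp = {k: v for k, v in zip(key, val)}       # zip() drops the unmatched key silently
# inp == {'text': 'some lyrics', 'notes': 'C4 | D4'} — 'notes_duration' is absent
assert 'notes_duration' not in inp
```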

So, if you're always hearing the same output regardless of the input parameters, it's likely that an exception is being raised every time you try to pass in different parameters. The exception could be due to invalid input format or values, or due to some issue with the infer_once method.

To fix this, you need to ensure that you're passing in inputs in the correct format and with valid values. If you're not sure what the correct format or values are, you'll need to look at the documentation or source code for the infer_once method.
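For T2S, one input that should work can be reconstructed from its own default_inp: join the three fields with commas (none of the field values themselves contain commas, so the split in inference recovers them cleanly).

```python
# Fields copied from self.default_inp in the T2S class above.
default_inp = {
    'text': '你 说 你 不 SP 懂 为 何 在 这 时 牵 手 AP',
    'notes': 'D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | rest | D#4/Eb4 | D4 | D4 | D4 | D#4/Eb4 | F4 | D#4/Eb4 | D4 | rest',
    'notes_duration': '0.113740 | 0.329060 | 0.287950 | 0.133480 | 0.150900 | 0.484730 | 0.242010 | 0.180820 | 0.343570 | 0.152050 | 0.266720 | 0.280310 | 0.633300 | 0.444590',
}

# The comma-separated string inference() expects: "text,notes,notes_duration".
inputs = ','.join([default_inp['text'], default_inp['notes'], default_inp['notes_duration']])
val = inputs.split(",")
assert len(val) == 3  # splits back into exactly the three expected fields
```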

Also, you might want to modify the except block to print out the exception message, like this:

except Exception as e:
    print('Error occurs:', str(e))
    print('Generate default audio sample.\n')
    inp = self.default_inp
    wav = self.pipe.infer_once(inp)

This will give you more information about what's going wrong when you pass in different parameters.
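Going one step further, the commented-out block in T2S.inference already hints at the better fix: validate the input up front and fail loudly, keeping the default only as a last resort. A sketch of that refactor, written as a standalone function with a hypothetical `run_inference` name and a `pipe` parameter standing in for self.pipe:

```python
import traceback

def run_inference(pipe, inputs, default_inp):
    """Hypothetical refactor of T2S.inference's error handling."""
    key = ['text', 'notes', 'notes_duration']
    val = inputs.split(",")
    # Reject malformed input explicitly instead of letting infer_once fail later.
    if len(val) != len(key):
        raise ValueError(
            f"expected {len(key)} comma-separated fields "
            f"({', '.join(key)}), got {len(val)}: {inputs!r}")
    inp = dict(zip(key, val))
    try:
        return pipe.infer_once(inp)
    except Exception:
        traceback.print_exc()  # surface the real error before falling back
        return pipe.infer_once(default_inp)
```

With this shape, a caller who passes the wrong number of fields gets an immediate ValueError naming the expected format, rather than the same default sample every time.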

