I believe I've narrowed down the problem.
In https://github.com/AIGC-Audio/AudioGPT/blob/main/audio-chatgpt.py you have the following code:
```python
class T2S:
    def __init__(self, device=None):
        from inference.svs.ds_e2e import DiffSingerE2EInfer
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print("Initializing DiffSinger to %s" % device)
        self.device = device
        self.exp_name = 'checkpoints/0831_opencpop_ds1000'
        self.config = 'NeuralSeq/egs/egs_bases/svs/midi/e2e/opencpop/ds1000.yaml'
        self.set_model_hparams()
        self.pipe = DiffSingerE2EInfer(self.hp, device)
        self.default_inp = {
            'text': '你 说 你 不 SP 懂 为 何 在 这 时 牵 手 AP',
            'notes': 'D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | D#4/Eb4 | rest | D#4/Eb4 | D4 | D4 | D4 | D#4/Eb4 | F4 | D#4/Eb4 | D4 | rest',
            'notes_duration': '0.113740 | 0.329060 | 0.287950 | 0.133480 | 0.150900 | 0.484730 | 0.242010 | 0.180820 | 0.343570 | 0.152050 | 0.266720 | 0.280310 | 0.633300 | 0.444590'
        }

    def set_model_hparams(self):
        set_hparams(config=self.config, exp_name=self.exp_name, print_hparams=False)
        self.hp = hp

    def inference(self, inputs):
        self.set_model_hparams()
        val = inputs.split(",")
        key = ['text', 'notes', 'notes_duration']
        try:
            inp = {k: v for k, v in zip(key, val)}
            wav = self.pipe.infer_once(inp)
        except:
            print('Error occurs. Generate default audio sample.\n')
            inp = self.default_inp
            wav = self.pipe.infer_once(inp)
        #if inputs == '' or len(val) < len(key):
        #    inp = self.default_inp
        #else:
        #    inp = {k:v for k,v in zip(key,val)}
        #wav = self.pipe.infer_once(inp)
        wav *= 32767
        audio_filename = os.path.join('audio', str(uuid.uuid4())[0:8] + ".wav")
        wavfile.write(audio_filename, self.hp['audio_sample_rate'], wav.astype(np.int16))
        print(f"Processed T2S.run, audio_filename: {audio_filename}")
        return audio_filename


class t2s_VISinger:
    def __init__(self, device=None):
        from espnet2.bin.svs_inference import SingingGenerate
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        print("Initializing VISingere to %s" % device)
        tag = 'AQuarterMile/opencpop_visinger1'
        self.model = SingingGenerate.from_pretrained(
            model_tag=str_or_none(tag),
            device=device,
        )
        phn_dur = [[0.        , 0.219     ],
                   [0.219     , 0.50599998],
                   [0.50599998, 0.71399999],
                   [0.71399999, 1.097     ],
                   [1.097     , 1.28799999],
                   [1.28799999, 1.98300004],
                   [1.98300004, 7.10500002],
                   [7.10500002, 7.60400009]]
        phn = ['sh', 'i', 'q', 'v', 'n', 'i', 'SP', 'AP']
        score = [[0, 0.50625, 'sh_i', 58, 'sh_i'], [0.50625, 1.09728, 'q_v', 56, 'q_v'], [1.09728, 1.9832100000000001, 'n_i', 53, 'n_i'], [1.9832100000000001, 7.105360000000001, 'SP', 0, 'SP'], [7.105360000000001, 7.604390000000001, 'AP', 0, 'AP']]
        tempo = 70
        tmp = {}
        tmp["label"] = phn_dur, phn
        tmp["score"] = tempo, score
        self.default_inp = tmp

    def inference(self, inputs):
        val = inputs.split(",")
        key = ['text', 'notes', 'notes_duration']
        try:  # TODO: input will be update
            inp = {k: v for k, v in zip(key, val)}
            wav = self.model(text=inp)["wav"]
        except:
            print('Error occurs. Generate default audio sample.\n')
            inp = self.default_inp
            wav = self.model(text=inp)["wav"]
        audio_filename = os.path.join('audio', str(uuid.uuid4())[0:8] + ".wav")
        soundfile.write(audio_filename, wav, samplerate=self.model.fs)
        return audio_filename
```
It looked a bit off to me, but I couldn't quite put my finger on it, so I asked ChatGPT 4 about it.
It looks like the code expects inputs in a specific format, and if the inputs are not in that format, it defaults to generating a default audio sample. The key point is the `try`/`except` blocks in the `inference` methods of both the `T2S` and `t2s_VISinger` classes. If any exception is thrown during the execution of the code within the `try` block, it immediately jumps to the `except` block, which generates a default audio sample.
Here is what's happening in more detail:

1. The input is expected to be a string of comma-separated values. This string is split on commas to create a list, `val`.
2. It then tries to create a dictionary `inp` by zipping `key` (the list of keys `['text', 'notes', 'notes_duration']`) with `val` (the list of values obtained from the input string).
3. It passes this `inp` dictionary to the `infer_once` function of the `pipe` object in the `T2S` class, or to the `model` object in the `t2s_VISinger` class.
4. If anything goes wrong during this process (for example, if the input string does not contain enough comma-separated values to match up with the keys, or if the `infer_once` function does not work with the provided inputs), an exception is raised.
5. As soon as an exception is raised, execution jumps to the `except` block and generates a default audio sample using `self.default_inp`.
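Note that the truncation in step 2 is silent: `zip` simply stops at the shorter sequence, so an input with too few commas produces a dictionary that is missing keys rather than raising an error at that point. A quick illustration:

```python
key = ['text', 'notes', 'notes_duration']

# An input with no commas yields a single-element list...
val = "some lyrics".split(",")

# ...so zip pairs only the first key and stops, with no error raised here.
inp = {k: v for k, v in zip(key, val)}
print(inp)  # {'text': 'some lyrics'} -- 'notes' and 'notes_duration' are missing
```

The failure then surfaces later, inside `infer_once`, where the bare `except` swallows it.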
So, if you're always hearing the same output regardless of the input parameters, it's likely that an exception is being raised every time you pass in different parameters. The exception could be due to an invalid input format or values, or to some issue with the `infer_once` method.
To fix this, you need to ensure that you're passing in inputs in the correct format and with valid values. If you're not sure what the correct format or values are, you'll need to look at the documentation or source code for the `infer_once` method.
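Another option is to validate the input up front rather than relying on the `except` fallback, which is essentially what the commented-out block in the original code was doing. A minimal sketch of that idea (`parse_inputs` and the default dict here are hypothetical stand-ins, not part of the actual codebase):

```python
def parse_inputs(inputs, key, default_inp):
    # Hypothetical helper: fall back to the default only when the input
    # clearly doesn't match the expected comma-separated format, instead
    # of catching every downstream exception with a bare `except`.
    val = inputs.split(",")
    if inputs == '' or len(val) < len(key):
        return default_inp
    return {k: v for k, v in zip(key, val)}

key = ['text', 'notes', 'notes_duration']
default = {'text': 'AP', 'notes': 'rest', 'notes_duration': '0.5'}

print(parse_inputs("", key, default))       # falls back to the default
print(parse_inputs("a,b,c", key, default))  # {'text': 'a', 'notes': 'b', 'notes_duration': 'c'}
```

With this, a malformed input falls back deliberately and visibly, while a genuine inference failure would still raise and be easy to spot.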
Also, you might want to modify the `except` block to print out the exception message, like this:

```python
except Exception as e:
    print('Error occurs:', str(e))
    print('Generate default audio sample.\n')
    inp = self.default_inp
    wav = self.pipe.infer_once(inp)
```
This will give you more information about what's going wrong when you pass in different parameters.
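If the message alone isn't enough to locate the problem, the standard `traceback` module will print the full stack trace, pointing at the exact line inside `infer_once` that failed. A small sketch (the `risky` function here is just a stand-in for the real inference call):

```python
import traceback

def risky():
    # Stand-in for self.pipe.infer_once(inp)
    raise ValueError("bad notes format")

try:
    risky()
except Exception:
    # Prints the full stack trace to stderr before falling back,
    # so the failing frame inside the inference call is visible.
    traceback.print_exc()
    print('Generate default audio sample.')
```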