facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.


audiocraft's Introduction

AudioCraft


AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.

Installation

AudioCraft requires Python 3.9 and PyTorch 2.1.0. To install AudioCraft, run the following:

# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
python -m pip install 'torch==2.1.0'
# You might need the following before trying to install the packages
python -m pip install setuptools wheel
# Then proceed to one of the following
python -m pip install -U audiocraft  # stable release
python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
python -m pip install -e .  # or if you cloned the repo locally (mandatory if you want to train).

We also recommend having ffmpeg installed, either through your system or Anaconda:

sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
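
As a quick sanity check of the install, you can import the package and load the smallest MusicGen checkpoint; a minimal sketch (the first call downloads weights, and the 'small' model name follows the API examples later on this page):

import torch
import audiocraft
from audiocraft.models import MusicGen

print(audiocraft.__version__)     # confirms the package imports cleanly
print(torch.cuda.is_available())  # True if a CUDA-enabled torch build is active
model = MusicGen.get_pretrained('small')  # downloads checkpoint weights on first use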

Models

At the moment, AudioCraft contains training and inference code for the following models (a minimal usage sketch follows the list):

  • MusicGen: A state-of-the-art controllable text-to-music model.
  • AudioGen: A state-of-the-art text-to-sound model.
  • EnCodec: A state-of-the-art high fidelity neural audio codec.
  • Multi Band Diffusion: An EnCodec-compatible decoder using diffusion.
  • MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.
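
A minimal text-to-music sketch with MusicGen, mirroring the API examples that appear later on this page (the 'small' checkpoint and the prompt are illustrative):

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('small')
model.set_generation_params(duration=8)      # generate 8 seconds of audio
wav = model.generate(['lofi hip hop beat'])  # descriptions must be a list of strings
audio_write('sample_0', wav[0].cpu(), model.sample_rate, strategy="loudness")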

Training code

AudioCraft contains PyTorch components for deep learning research in audio, along with training pipelines for the developed models. For a general introduction to AudioCraft's design principles and instructions for developing your own training pipeline, refer to the AudioCraft training documentation.

To reproduce existing work and use the developed training pipelines, refer to the instructions for each specific model, which provide pointers to configuration, example grids, and model/task-specific information and FAQs.

API documentation

We provide some API documentation for AudioCraft.

FAQ

Is the training code available?

Yes! We provide the training code for EnCodec, MusicGen and Multi Band Diffusion.

Where are the models stored?

Hugging Face stores the models in a specific location, which can be overridden by setting the AUDIOCRAFT_CACHE_DIR environment variable for the AudioCraft models. To change the cache location of the other Hugging Face models, please check out the Hugging Face Transformers documentation for the cache setup. Finally, if you use a model that relies on Demucs (e.g., musicgen-melody) and want to change the download location for Demucs, refer to the Torch Hub documentation.
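
For example, to redirect the AudioCraft cache, set the variable before any model is loaded; a minimal sketch (the cache path is illustrative):

import os

# Must be set before any AudioCraft model is loaded or downloaded.
os.environ["AUDIOCRAFT_CACHE_DIR"] = "/data/audiocraft_cache"  # illustrative path

from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('small')  # weights now land under the cache dir above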

License

  • The code in this repository is released under the MIT license as found in the LICENSE file.
  • The model weights in this repository are released under the CC-BY-NC 4.0 license, as found in the LICENSE_weights file.

Citation

For the general framework of AudioCraft, please cite the following.

@inproceedings{copet2023simple,
    title={Simple and Controllable Music Generation},
    author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
}

When referring to a specific model, please cite as mentioned in the model-specific README, e.g., ./docs/MUSICGEN.md, ./docs/AUDIOGEN.md, etc.

audiocraft's People

Contributors

0xlws, adefossez, adiyoss, ashleykleynhans, bocytko, carankt, carlthome, eltociear, escfrya, felixkreuk, frinkleko, grandaddyshmax, harushii18, j4bez, jadecopet, jamesonwilliams, jamierpond, jonathanfly, kmosnu, kushaldas, lonzi, mimbres, patrickvonplaten, radames, sanchit-gandhi, srezasm, starburst997, sungeuns, syhw, ylacombe


audiocraft's Issues

Model corrupts over 30 seconds

Why does this model lose its effectiveness with samples that exceed 30 seconds? For reference, here is the relevant signature:

def set_generation_params(self, use_sampling: bool = True, top_k: int = 250,
                          top_p: float = 0.0, temperature: float = 1.0,
                          duration: float = 60.0, cfg_coef: float = 3.0,
                          two_step_cfg: bool = False):

example:

0.mp4

Voting

Nice work! Please support a way to let us vote for the best banging tracks on the samples page!

[Question] Best configs?

Currently I'm getting okay results with the settings below, but with a lot of reverb and other audio artifacts. I'm just looking for suggestions on what others have found works well.

result = client.predict(
    "large",   # str  in 'Model' Radio component
    "Jingle",  # str  in 'Input Text' Textbox component
    "",        # str (filepath or URL to file) in 'Melody Condition (optional)' Audio component
    10,        # int | float (numeric value between 1 and 30) in 'Duration' Slider component
    250,       # int | float  in 'Top-k' Number component
    0,         # int | float  in 'Top-p' Number component
    1,         # int | float  in 'Temperature' Number component
    1,         # int | float  in 'Classifier Free Guidance' Number component
    fn_index=0
)

Ran git pull on https://github.com/facebookresearch/audiocraft.git main -- now it won't generate

I went into Anaconda and activated the venv environment from a BAT file I created. I still get the Triton error (no biggie). I launch Gradio, and now when I tell it to run the large model for 30 seconds I get a media player window with a 0-second length and errors. This was working fine out of the box before I ran the git pull. Any thoughts?

Chrome, Windows, Anaconda, NVIDIA 3060

Loading model large
CLIPPING C:\Users\dueme\AppData\Local\Temp\tmp79ua18y_.wav happening with proba (a bit of clipping is okay): 0.00013124999532010406 maximum scale: 1.116523027420044
E:\audiocraft\venv\lib\site-packages\matplotlib\axes\_axes.py:2229: RuntimeWarning: overflow encountered in scalar add
  dx = [convert(x0 + ddx) - x for ddx in dx]
E:\audiocraft\venv\lib\site-packages\matplotlib\axes\_axes.py:2229: RuntimeWarning: overflow encountered in scalar subtract
  dx = [convert(x0 + ddx) - x for ddx in dx]
E:\audiocraft\venv\lib\site-packages\matplotlib\patches.py:739: RuntimeWarning: overflow encountered in scalar add
  y1 = self.convert_yunits(self._y0 + self._height)
'ffmpeg' is not recognized as an internal or external command,
operable program or batch file.

Google Colab output files location

In Google Colab, where are the output files saved? I've looked through /content, /usr/local/lib/python3.10/dist-packages/audiocraft and /tmp with no results.

SUGGESTIONS: Neg prompt, segment append, wave visualizer

THIS IS BY FAR the best one I've come across for local generation yet! Awesome job. Thanks!

I'm not a programmer, so I really don't know how hard these are to implement; forgive me for asking for the moon and the stars if that's what these would require, but I'll put them out there. Hopefully you'll attract a team that will be able to contribute.

NEGATIVE PROMPT
As everyone is familiar with; this would be very helpful.

WEIGHTED TOKEN PROMPT
Another one people are familiar with.

SEGMENT APPENDING/INSERTION
Allowing us to extend a song by generating it twice and having the tool automatically append one generation after the other, with a little crossfade to reduce volume issues.

And/or being able to give a sequential orchestration of the song to be created, maybe delimited by [x], where the prompt would be [intro description] [next segment] [next segment] [ending], with items outside of the brackets applying to all. It'd be like "harpsichord music [baroque] [70s psychedelic] [musak] fade out".

WAVE VISUALIZER
If someone really wanted to get fancy they could add handles to select sections for a type of "inpainting" and/or other effect controls like normalization, volume, EQ, etc.

Getting error "Found no NVIDIA driver on your system."

I tried to run the music generation script given on Hugging Face and got this error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and have installed a driver from http://www.nvidia.com/Download/index.aspx

Is it necessary to have an Nvidia GPU to run the pre-trained models?

weights?

Hi, I'm new to text-to-music. Is the generated music free to use commercially (if there are no "weights")? I couldn't find any info on the Hugging Face demo of this. Thanks.

Continuation is very choppy unlike text to music

Hello!

It's fuzzy.

I tried continuing a 12-second track to get 30 seconds total, using the demo.ipynb notebook run in Colab with the large model. Before the cell that plays the result, I ran the two cells that I'm not sure why are there; they seem to insert a beep/π-math signal into the center of the song. I also got it to run without those cells, keeping the imports.

At first it didn't want to run, so I had to add '!pip install audiocraft' at the top of the very first cell.

The whole 30-second output, including the input song itself, is now clearly choppy, like low resolution.

Text-to-music works fine; those outputs are mostly clear. Any idea why?

Audio file conditioning to continue (sliding window)

Consider implementing or providing an option to condition a model on a specific audio file, enabling the generation of audio that continues the input audio.

I read it's possible using a sliding window, but I would like to see example usage code for this in the Jupyter notebook.

Thank you guys so much!
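
In the meantime, a minimal sketch of what such usage might look like, assuming the generate_continuation method (mentioned in another issue below) accepts a prompt waveform and its sample rate; the signature and the duration semantics here are unverified assumptions:

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=30)   # assumed: total length, prompt plus continuation

prompt, sr = torchaudio.load('./assets/bach.mp3')
prompt = prompt[..., : int(12 * sr)]       # condition on the first 12 seconds

# Assumed signature: generate_continuation(prompt, prompt_sample_rate, descriptions=None)
wav = model.generate_continuation(prompt[None], sr)  # [None] adds a batch dimension
audio_write('continued', wav[0].cpu(), model.sample_rate, strategy="loudness")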

LoRAs possible for fast and lightweight fine-tuning?

Just wondering if this will be feasible? The base model probably wasn't trained on any copyrighted music and probably won't recognize any band name I throw at it, so this would be very valuable. I can already foresee a huge library of LoRAs, much like with Stable Diffusion, for every band imaginable - Pink Floyd crossed with Aphex Twin, anyone?

USAGE SUGGESTIONS: Set Save File Location; Metadata; Autogen File Name; Save last Gradio Setting

Thanks! It's usable the way it is. These are just user interface improvement suggestions. Can these be added?

SET SAVE FILE LOCATION
Even in the same session in Chrome, when I tell it to download the file, it saves to my default Windows location rather than the last place I saved the previous file in that session. Can there be a designated output folder?

METADATA
Is there some place in the WAV file format to save the metadata of the prompt & other settings?

AUTOGEN FILE NAME
The default filename is just "audio", which will be a pain once the save location is set. Using the prompt as the filename can be problematic because the prompt allows characters that can't be used in a file name. If some unique filename based on date & time could be used, that'd be great; the metadata can hold the prompt info.

SAVE LAST GRADIO SETTING
Which model is being used, the file length, etc. That'd be helpful.

FR: Sequencing sections Text Prompt

Can we introduce a prompt mechanic that will allow us to feed a "time sequential" theme/feel that would look/act like this:

general prompt text [section 1 prompt {x}(a)] [section 2 prompt {y}(b)] [section 3 prompt {z}(c)] ... return to general prompt

Where the different sections are sequential in the time of the composition, something like Bohemian Rhapsody would be:

Queen rock song 85 bpm [a cappella, harmonies {30}] [piano with vocals {120}] [piano with vocal {120}] [guitar solo {30}] [rock opera with vocals {45}] [105 bpm guitar with vocals {45}]

  • where the values inside the {} are seconds or some specific measure of time

  • but if they are in parentheses instead it is a percentage of time

  • and if no values are specified for a particular bracket, the unspecified sections are evenly divided across whatever time remains

I suspect continuity could be maintained by looping the originally generated audio back in as a melody to base the remainder on, with some audio overlap built in to better weave the sounds together.

Melody LLVM ERROR

LLVM ERROR: Symbol not found: __svml_cosf8_ha

I get this error when trying to use the pre-made melody ideas. Does anyone know how to fix it? Everything I tried failed because of some OpenSSL problem...

Longer prompt results in CUDA out of memory

When using the provided example code, any prompt longer than 2-5 words results in an out-of-memory error. This must be something in how this method differs from the Gradio UI generation, since there I can generate with the medium model and much longer prompts without running out of memory on the same machine. The error happens specifically on the line model.generate(prompt).

model = MusicGen.get_pretrained('small')
model.set_generation_params(duration=8)
wav = model.generate(prompt)
for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
    file_path = audio_write(f'music/{prompt}', one_wav.cpu(), model.sample_rate, strategy="loudness")

After some debugging, I assumed the issue might be due to package version conflicts, with my project trying to use a different version of torch or transformers than the audiocraft ones.

Solution: I didn't notice that model.generate doesn't take a string but a list. The correct way to call it:
model.generate(descriptions=[prompt])
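
Putting the fix together, a corrected version of the snippet above (the prompt is illustrative, and the music/ directory is assumed to exist):

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

prompt = 'ambient piano'                     # illustrative prompt
model = MusicGen.get_pretrained('small')
model.set_generation_params(duration=8)
wav = model.generate(descriptions=[prompt])  # a list of strings, not a bare string
for idx, one_wav in enumerate(wav):
    # Saves under music/{prompt}_{idx}.wav with loudness normalization at -14 dB LUFS.
    audio_write(f'music/{prompt}_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")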

Train to be a perfect loop

Train a separate model to perfect the 30-second loop, ensuring it seamlessly folds/loops onto itself. This will make the output easier to use for musicians/producers, even in its current state with all of its limitations.

Parameters?

Hi! Are there any parameters for specific settings, like length etc.?
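
For reference, set_generation_params exposes the main knobs, including clip length; a minimal sketch based on the signature quoted in an earlier issue on this page:

from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('small')
model.set_generation_params(
    duration=15,        # length of the generated clip, in seconds
    use_sampling=True,  # sample instead of taking the most likely token
    top_k=250,          # keep the 250 most likely tokens at each step
    top_p=0.0,          # nucleus sampling threshold (0 disables it)
    temperature=1.0,    # softmax temperature
    cfg_coef=3.0,       # classifier-free guidance strength
)
wav = model.generate(['upbeat synthwave'])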

Music related settings addition

Please add the following options as settings in the future, if possible:

  • Setting a target BPM (e.g., 120)
  • Setting a target time signature (e.g., 3/4)
  • Setting a target key (e.g., C major)

Thanks

Suggested Addition - Interpolation

Sure would be nice to be able to audibly tween between the 30-second generations. I know others have suggested continuation already, but there are probably some DJ algorithms to tween, shift BPM, and match key to auto mix the various clips, then build loops with melody.

ModuleNotFoundError: No module named 'soundfile'

The Application will not start. I received this output:

C:\Users{ }\MusicGen\audiocraft>python app.py
Traceback (most recent call last):
  File "C:\Users{ }\MusicGen\audiocraft\app.py", line 12, in <module>
    from audiocraft.models import MusicGen
  File "C:\Users{ }\MusicGen\audiocraft\audiocraft\__init__.py", line 8, in <module>
    from . import data, modules, models
  File "C:\Users{ }\MusicGen\audiocraft\audiocraft\data\__init__.py", line 8, in <module>
    from . import audio, audio_dataset
  File "C:\Users{ }\MusicGen\audiocraft\audiocraft\data\audio.py", line 18, in <module>
    import soundfile
ModuleNotFoundError: No module named 'soundfile'

I get the same error running the application in a virtual environment and in the host environment.

NC is non-free

The model weights are listed as CC-BY-SA-NC. Non-commercial (NC) clauses are non-free: such licenses are not free software according to the FSF, not open source according to the OSI, and not free culture according to Freedom Defined. I would recommend using CC-BY-SA-4.0 instead, which is a free-culture Creative Commons license.

Help running on MacOS M1?

Update: for most people, this should work: #13 (comment)

Any chance of getting help and/or updated instructions for running audiocraft on macOS with an M1? At the very least, I think I need to know where to put the models I downloaded from Hugging Face. But based on the errors, I likely have some other issues too. My steps and errors follow. Thanks for any tips!

I adapted the instructions here for macOS: https://github.com/facebookresearch/audiocraft#installation

First, I ran each line in my terminal...

conda create -n audiocraft
conda activate audiocraft
pip install 'torch>=2.0'
pip install -U audiocraft 
pip install ffmpeg
jupyter notebook

Second, I downloaded these two items from Hugging Face but wasn't sure where to put them: https://huggingface.co/facebook/musicgen-melody

  1. melody: 1.5B model, text to music and text+melody to music - 🤗 Hub
  2. large: 3.3B model, text to music only - 🤗 Hub

Third, when Jupyter opened in Safari I created a new notebook and ran this from here: https://github.com/facebookresearch/audiocraft#api

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8)  # generate 8 seconds.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav = model.generate(descriptions)  # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")

Fourth, I got these errors in Jupyter:


AssertionError Traceback (most recent call last)
Cell In [2], line 5
2 from audiocraft.models import MusicGen
3 from audiocraft.data.audio import audio_write
----> 5 model = MusicGen.get_pretrained('melody')
6 model.set_generation_params(duration=8) # generate 8 seconds.
7 wav = model.generate_unconditional(4) # generates 4 unconditional audio samples

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/audiocraft/models/musicgen.py:88, in MusicGen.get_pretrained(name, device)
86 else:
87 ROOT = 'https://dl.fbaipublicfiles.com/audiocraft/musicgen/v0/'
---> 88 compression_model = load_compression_model(ROOT + 'b0dbef54-37d256b525.th', device=device)
89 names = {
90 'small': 'ba7a97ba-830fe5771e',
91 'medium': 'aa73ae27-fbc9f401db',
92 'large': '9b6e835c-1f0cf17b5e',
93 'melody': 'f79af192-61305ffc49',
94 }
95 sig = names[name]

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/audiocraft/models/loaders.py:45, in load_compression_model(file_or_url, device)
43 cfg = OmegaConf.create(pkg['xp.cfg'])
44 cfg.device = str(device)
---> 45 model = builders.get_compression_model(cfg)
46 model.load_state_dict(pkg['best_state'])
47 model.eval()

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/audiocraft/models/builders.py:82, in get_compression_model(cfg)
79 renormalize = renorm is not None
80 warnings.warn("You are using a deprecated EnCodec model. Please migrate to new renormalization.")
81 return EncodecModel(encoder, decoder, quantizer,
---> 82 frame_rate=frame_rate, renormalize=renormalize, **kwargs).to(cfg.device)
83 else:
84 raise KeyError(f'Unexpected compression model {cfg.compression_model}')

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:1145, in Module.to(self, *args, **kwargs)
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
-> 1145 return self._apply(convert)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

[... skipping similar frames: Module._apply at line 797 (2 times)]

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:797, in Module._apply(self, fn)
795 def _apply(self, fn):
796 for module in self.children():
--> 797 module._apply(fn)
799 def compute_should_use_set_data(tensor, tensor_applied):
800 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
801 # If the new tensor has compatible tensor type as the existing tensor,
802 # the current behavior is to change the tensor in-place using .data =,
(...)
807 # global flag to let the user control whether they want the future
808 # behavior of overwriting the existing tensor or not.

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:820, in Module._apply(self, fn)
816 # Tensors stored in modules are graph leaves, and we don't want to
817 # track autograd history of param_applied, so we have to use
818 # with torch.no_grad():
819 with torch.no_grad():
--> 820 param_applied = fn(param)
821 should_use_set_data = compute_should_use_set_data(param, param_applied)
822 if should_use_set_data:

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py:1143, in Module.to.<locals>.convert(t)
1140 if convert_to_format is not None and t.dim() in (4, 5):
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/cuda/__init__.py:239, in _lazy_init()
235 raise RuntimeError(
236 "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
237 "multiprocessing, you must use the 'spawn' start method")
238 if not hasattr(torch._C, '_cuda_getDeviceCount'):
--> 239 raise AssertionError("Torch not compiled with CUDA enabled")
240 if _cudart is None:
241 raise AssertionError(
242 "libcudart functions unavailable. It looks like you have a broken build?")

AssertionError: Torch not compiled with CUDA enabled
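
Since the traceback shows get_pretrained(name, device), one workaround worth trying is forcing the CPU device so torch never initializes CUDA; a hedged sketch (CPU generation will be much slower, and this does not use the M1 GPU):

from audiocraft.models import MusicGen

# Passing device='cpu' should skip the .to(cuda_device) call that raises
# "Torch not compiled with CUDA enabled" in the traceback above.
model = MusicGen.get_pretrained('melody', device='cpu')
model.set_generation_params(duration=8)
wav = model.generate(['happy rock'])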

melody model doesn't work.

The other models work, but when I use melody I get an error:

from scipy.linalg import _fblas
ImportError: DLL load failed while importing _fblas: The specified module could not be found.

I am running this locally on a 4080 with 16 GB VRAM.

error with torchaudio.pyd

I have everything installed without errors, but I still can't run it. This is the error I get when I run app.py:

E:\audiocraft>python app.py
Traceback (most recent call last):
  File "E:\audiocraft\app.py", line 11, in <module>
    from audiocraft.models import MusicGen
  File "E:\audiocraft\audiocraft\__init__.py", line 8, in <module>
    from . import data, modules, models
  File "E:\audiocraft\audiocraft\data\__init__.py", line 8, in <module>
    from . import audio, audio_dataset
  File "E:\audiocraft\audiocraft\data\audio.py", line 21, in <module>
    import torchaudio as ta
  File "E:\Python\Python310\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "E:\Python\Python310\lib\site-packages\torchaudio\_extension\__init__.py", line 43, in <module>
    _load_lib("libtorchaudio")
  File "E:\Python\Python310\lib\site-packages\torchaudio\_extension\utils.py", line 61, in _load_lib
    torch.ops.load_library(path)
  File "E:\Python\Python310\lib\site-packages\torch\_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "E:\Python\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'E:\Python\Python310\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

Python script using the large model doesn't work

I am trying to run the following Python script, which uses the large model:

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('large')
model.set_generation_params(duration=8)  # generate 8 seconds.

descriptions = ['happy rock', 'energetic EDM', 'sad jazz']

wav = model.generate(descriptions)  # generates 3 samples.

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")

I got the following logs:

Downloading: "https://dl.fbaipublicfiles.com/audiocraft/musicgen/v0/9b6e835c-1f0cf17b5e.th" to /home/worker/.cache/torch/hub/checkpoints/9b6e835c-1f0cf17b5e.th
100%|████████████████████████████████████| 6.07G/6.07G [03:57<00:00, 27.4MB/s]
Killed

It's worth mentioning that the computer freezes for a while before giving the Killed output.

Regarding my hardware, I have an AMD Ryzen 5 2500U processor with Radeon Vega Mobile Gfx and an Nvidia GeForce GTX GPU (the appropriate drivers are already installed).

I'm on Ubuntu 20.04 OS.

Here is my output for the nvidia-smi command:

Mon Jun 12 00:45:05 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   38C    P8    N/A /  N/A |      9MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1056      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      2033      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Here is my output for lscpu:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           17
Model name:                      AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1600.000
CPU max MHz:                     2000,0000
CPU min MHz:                     1600,0000
BogoMIPS:                        3992.30
Virtualization:                  AMD-V
L1d cache:                       128 KiB
L1i cache:                       256 KiB
L2 cache:                        2 MiB
L3 cache:                        4 MiB
NUMA node0 CPU(s):               0-7

Thanks in advance for any help

Is it normal that it saves as mp4?

I don't know if it was something in my installation or if it's the program's default, but when I press download, the music file comes as an mp4 video. Is this normal, and if not, is there a workaround to get mp3, for instance?

Add generate_continuation to app.py

It would be cool if you could add generate_continuation for the Gradio app.

I was able to hack this together myself by adding a gr.Slider and modifying the predict function. I had it so you could choose where in the song to generate from (with the continuation gr.Slider) and for how long (with the duration gr.Slider). But if you could either add this feature yourself or guide me in the right direction for a PR, I would be grateful.

Microsoft Defender flags source code as Trojan:Script/Sabsik.FL.B!ml

Date and time of detection: 11 June 2023, 01:25 PM, Bangkok timezone

Steps to reproduce:
  1. Open the repo.
  2. Head to download the source code.
  3. An external download manager (IDM) downloads the file.
  4. Microsoft Defender intercepts and cancels the download, giving a virus-detection warning.

Spec details:
OS: Windows 10
Browser: Firefox

Won't start

This is the error I get after a fresh install:

C:\MusicGen\audiocraft>python app.py
Traceback (most recent call last):
  File "C:\MusicGen\audiocraft\app.py", line 11, in <module>
    from audiocraft.models import MusicGen
  File "C:\MusicGen\audiocraft\audiocraft\__init__.py", line 8, in <module>
    from . import data, modules, models
  File "C:\MusicGen\audiocraft\audiocraft\models\__init__.py", line 8, in <module>
    from .musicgen import MusicGen
  File "C:\MusicGen\audiocraft\audiocraft\models\musicgen.py", line 17, in <module>
    from .encodec import CompressionModel
  File "C:\MusicGen\audiocraft\audiocraft\models\encodec.py", line 14, in <module>
    from .. import quantization as qt
  File "C:\MusicGen\audiocraft\audiocraft\quantization\__init__.py", line 8, in <module>
    from .vq import ResidualVectorQuantizer
  File "C:\MusicGen\audiocraft\audiocraft\quantization\vq.py", line 13, in <module>
    from .core_vq import ResidualVectorQuantization
  File "C:\MusicGen\audiocraft\audiocraft\quantization\core_vq.py", line 10, in <module>
    import flashy
  File "C:\Users\Jaulustus-Desktop\AppData\Local\Programs\Python\Python310\lib\site-packages\flashy\__init__.py", line 13, in <module>
    from .logging import ResultLogger, LogProgressBar, bold, setup_logging
  File "C:\Users\Jaulustus-Desktop\AppData\Local\Programs\Python\Python310\lib\site-packages\flashy\logging.py", line 19, in <module>
    from flashy.loggers.base import ExperimentLogger
  File "C:\Users\Jaulustus-Desktop\AppData\Local\Programs\Python\Python310\lib\site-packages\flashy\loggers\__init__.py", line 8, in <module>
    from .tensorboard import TensorboardLogger
  File "C:\Users\Jaulustus-Desktop\AppData\Local\Programs\Python\Python310\lib\site-packages\flashy\loggers\tensorboard.py", line 16, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "C:\Users\Jaulustus-Desktop\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\tensorboard\__init__.py", line 7, in <module>
    raise ImportError("TensorBoard logging requires TensorBoard version 1.15 or above")
ImportError: TensorBoard logging requires TensorBoard version 1.15 or above

I got this error: 'TypeError: issubclass() arg 1 must be a class'

Traceback (most recent call last):
  File "C:\musicgen\audiocraft-main\app.py", line 11, in <module>
    from audiocraft.models import MusicGen
  File "C:\musicgen\audiocraft-main\audiocraft\__init__.py", line 8, in <module>
    from . import data, modules, models
  File "C:\musicgen\audiocraft-main\audiocraft\models\__init__.py", line 8, in <module>
    from .musicgen import MusicGen
  File "C:\musicgen\audiocraft-main\audiocraft\models\musicgen.py", line 18, in <module>
    from .lm import LMModel
  File "C:\musicgen\audiocraft-main\audiocraft\models\lm.py", line 18, in <module>
    from ..modules.conditioners import (
  File "C:\musicgen\audiocraft-main\audiocraft\modules\conditioners.py", line 19, in <module>
    import spacy
  File "C:\Users\PC\.conda\envs\musicgen\lib\site-packages\spacy\__init__.py", line 14, in <module>
    from . import pipeline  # noqa: F401
  File "C:\Users\PC\.conda\envs\musicgen\lib\site-packages\spacy\pipeline\__init__.py", line 1, in <module>
    from .attributeruler import AttributeRuler
  File "C:\Users\PC\.conda\envs\musicgen\lib\site-packages\spacy\pipeline\attributeruler.py", line 6, in <module>
    from .pipe import Pipe
  File "spacy\pipeline\pipe.pyx", line 1, in init spacy.pipeline.pipe
  File "spacy\vocab.pyx", line 1, in init spacy.vocab
  File "C:\Users\PC\.conda\envs\musicgen\lib\site-packages\spacy\tokens\__init__.py", line 1, in <module>
    from .doc import Doc
  File "spacy\tokens\doc.pyx", line 36, in init spacy.tokens.doc
  File "C:\Users\PC\.conda\envs\musicgen\lib\site-packages\spacy\schemas.py", line 250, in <module>
    class TokenPattern(BaseModel):
  File "pydantic\main.py", line 197, in pydantic.main.ModelMetaclass.__new__
  File "pydantic\fields.py", line 506, in pydantic.fields.ModelField.infer
  File "pydantic\fields.py", line 436, in pydantic.fields.ModelField.__init__
  File "pydantic\fields.py", line 552, in pydantic.fields.ModelField.prepare
  File "pydantic\fields.py", line 661, in pydantic.fields.ModelField._type_analysis
  File "pydantic\fields.py", line 668, in pydantic.fields.ModelField._type_analysis
  File "C:\Users\PC\.conda\envs\musicgen\lib\typing.py", line 852, in __subclasscheck__
    return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class

I'm trying to run it in a conda environment, but it fails.

waiting for audiocraft 0.0.2 - top_p fix

Hi, since the package is still in a pre-1.0.0 state, maybe it would be worthwhile to push out 0.0.2 including the already-fixed top_p .float bug.

Saved files?

Awesome work, ladies/gents! Where does the Gradio interface save generated files?

Download the Tensor object at the end of generation

When the program runs on Colab, is there an easy way to download the wav file?

The usual:

from google.colab import files
files.download(res)

doesn't work on Tensor objects. I tried converting it to wav or mp3 to no avail, and I tried modifying the display_audio function in utils/notebook.py, but I couldn't figure it out.

Does anyone know how to take the samples and sample rate and create the wav (or mp3) file? Is torchaudio the right library for that purpose?
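
One approach that should work is to reuse the audio_write helper from the README examples on this page: write each sample to a wav file first, then download that file (wav below is the tensor returned by model.generate):

from audiocraft.data.audio import audio_write
from google.colab import files

for idx, one_wav in enumerate(wav):
    # audio_write converts the tensor and saves it as {idx}.wav,
    # with loudness normalization as in the README example.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
    files.download(f'{idx}.wav')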

A matching Triton is not available, some optimizations will not be enabled.

I am preparing a tutorial for Windows that I hope to publish on my channel today:
https://www.youtube.com/secourses

I got the first music sample generated, but on startup I got this dreaded message:

A matching Triton is not available, some optimizations will not be enabled.

Is there any way to install this very annoying Triton on Windows?

Here is my pip freeze from a fresh venv for this project.

I am also using the Gradio interface, and I got this warning as well:

F:\audiocraft\venv\lib\site-packages\gradio\processing_utils.py:171: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
  warnings.warn(warning.format(data.dtype))

My first music sample, generated with the small model, is attached:

test1.mp4

You've compared the AI to MusicLM, but when? MusicLM has degraded in quality due to the trophies users give it. Also, here is my comparison between the two AIs.

Hello! Sorry about the now-edited-out word in the title; it sounded rude, oops. From my day-1 testing and later testing, it seems the quality and prompt adherence of MusicLM (the system I compare against yours) have degraded. That's what the trophies do, they change the model, right? It is a nightmare for my AI documentation. :) :(

I also tested MusicGen against MusicLM on hard, advanced techno tests; yours seems to win about 30% of them, though it fails the hardest ones, even though I ran them on MusicLM today, June 11, about a month after its release.

MusicGen tests:
https://soundcloud.com/immortal-discoveries/sets/musicgen-ai-tests-now-seems-worse-than-musiclm

MusicLM tests:
https://soundcloud.com/immortal-discoveries/sets/adding-to-musiclm-playlist-the-one-with-200-if-no-prompt-go-to-link

Also a must-see for you guys (plus 200 more on SoundCloud):
https://www.reddit.com/r/singularity/comments/13h0zyy/i_really_crank_out_music_tracks_with_musiclm_this/
