retrieval-based-voice-conversion's Introduction

Retrieval-based-Voice-Conversion

An easy-to-use Voice Conversion framework based on VITS.


Note

This project is currently under development. It is provided as a library and an API in the rvc package.

Installation and usage

Standard Setup

First, create a directory in your project. The assets folder will contain the models needed for inference and training, and the result folder will contain the results of the training.

rvc init

This will create an assets folder and .env in your working directory.

Warning

The directory should be empty or without an assets folder.
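The warning above can be checked before running rvc init. This is a minimal sketch using only the standard library; the function name is_safe_for_rvc_init is hypothetical and not part of rvc.

```python
from pathlib import Path


def is_safe_for_rvc_init(workdir: Path) -> bool:
    """Return True when workdir has no assets entry yet.

    An empty directory trivially satisfies this, which matches the
    "empty or without an assets folder" condition in the warning.
    """
    return not (workdir / "assets").exists()


if __name__ == "__main__":
    # Check the current working directory before calling `rvc init`.
    print(is_safe_for_rvc_init(Path.cwd()))
```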

Custom Setup

If you have already downloaded models or want to change these configurations, edit the .env file. If you do not have a .env file yet, create one with:

rvc env create

To download the models, run

rvc dlmodel

or

rvc dlmodel {download_dir}

Finally, specify the location of the model in the .env file, and you are done!

Library Usage

Inference Audio

from pathlib import Path

from dotenv import load_dotenv
from scipy.io import wavfile

from rvc.modules.vc.modules import VC


def main():
    vc = VC()
    vc.get_vc("{model.pth}")
    tgt_sr, audio_opt, times, _ = vc.vc_inference(
        1, Path("{InputAudio}")
    )
    wavfile.write("{OutputAudio}", tgt_sr, audio_opt)


if __name__ == "__main__":
    load_dotenv("{envPath}")
    main()

CLI Usage

Inference Audio

rvc infer -m {model.pth} -i {input.wav} -o {output.wav}
option         flag  type         default value  description
modelPath      -m    Path         *required      Model path or filename (read from the directory set in env)
inputPath      -i    Path         *required      Input audio path or folder
outputPath     -o    Path         *required      Output audio path or folder
sid            -s    int          0              Speaker/Singer ID
f0_up_key      -fu   int          0              Transpose (integer number of semitones; 12 raises by an octave, -12 lowers by an octave)
f0_method      -fm   str          rmvpe          Pitch extraction algorithm (pm, harvest, crepe, rmvpe)
f0_file        -ff   Path | None  None           F0 curve file (optional). One pitch per line. Replaces the default F0 and pitch modulation
index_file     -if   Path | None  None           Path to the feature index file
index_rate     -if   float        0.75           Search feature ratio (controls accent strength; values that are too high cause artifacts)
filter_radius  -fr   int          3              If >=3: apply median filtering to the harvested pitch results. The value represents the filter radius and can reduce breathiness
resample_sr    -rsr  int          0              Resample the output audio in post-processing to the final sample rate. Set to 0 for no resampling
rms_mix_rate   -rmr  float        0.25           Adjust the volume envelope scaling. The closer to 0, the more it mimics the volume of the original vocals; relatively low values can help mask noise and make the volume sound more natural. The closer to 1, the more consistently loud the output
protect        -p    float        0.33           Protect voiceless consonants and breath sounds to prevent artifacts such as tearing in electronic music. Set to 0.5 to disable. Decrease the value to increase protection, but it may reduce indexing accuracy
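The f0_up_key option is given in semitones. As a rough illustration of what a value means for pitch, the sketch below converts semitones to a frequency ratio, assuming standard twelve-tone equal temperament. The function name semitones_to_ratio is hypothetical, and the code only illustrates the arithmetic; it does not show how rvc applies the option internally.

```python
def semitones_to_ratio(semitones: int) -> float:
    """Frequency ratio for a transposition of `semitones` in equal temperament.

    Each semitone multiplies the frequency by 2 ** (1 / 12), so twelve
    semitones double it (one octave up).
    """
    return 2.0 ** (semitones / 12)


if __name__ == "__main__":
    for key in (-12, 0, 7, 12):
        print(key, semitones_to_ratio(key))
```

For example, a value of 12 corresponds to a ratio of 2 (an octave up) and -12 to a ratio of 0.5 (an octave down), which matches the description in the table.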

API Usage

First, start up the server.

rvc-api

or

poetry run poe rvc-api

Inference Audio

Get as blob
curl -X 'POST' \
      'http://127.0.0.1:8000/inference?res_type=blob' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'modelpath={model.pth}' \
      -F 'input={input audio path}'
Get as JSON (includes time)
curl -X 'POST' \
      'http://127.0.0.1:8000/inference?res_type=json' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'modelpath={model.pth}' \
      -F 'input={input audio path}'
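The same requests can be prepared from Python. The sketch below builds an equivalent multipart/form-data POST with only the standard library; the endpoint, query parameter, and form field names are copied from the curl commands above, while the function name build_inference_request and the default base_url argument are assumptions made for illustration. The response format is not described here, so the sketch stops at building the request.

```python
import urllib.request
import uuid


def build_inference_request(
    model_path: str,
    input_path: str,
    res_type: str = "blob",
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build a POST request mirroring the curl examples (plain form fields)."""
    boundary = uuid.uuid4().hex
    fields = {"modelpath": model_path, "input": input_path}

    # Encode each field as one multipart section separated by the boundary.
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")

    return urllib.request.Request(
        f"{base_url}/inference?res_type={res_type}",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once the server is running, the request can be sent with urllib.request.urlopen(build_inference_request("{model.pth}", "{input audio path}")).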

Docker Usage

Build and run via script:

./docker-run.sh

Or build and run manually:

  1. Build:

    docker build -t "rvc" .
  2. Run:

    docker run -it \
      -p 8000:8000 \
      -v "${PWD}/assets/weights:/weights:ro" \
      -v "${PWD}/assets/indices:/indices:ro" \
      -v "${PWD}/assets/audios:/audios:ro" \
      "rvc"

Note: this assumes that weights, indices, and input audio files are stored in the assets folder of the current directory.
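Before starting the container, it can help to confirm that the mounted folders exist. The sketch below checks for the three host directories used in the docker run command above (assets/weights, assets/indices, assets/audios); the function name missing_asset_dirs is hypothetical.

```python
from pathlib import Path

# Sub-folders of assets that the docker run command mounts into the container.
EXPECTED_SUBDIRS = ("weights", "indices", "audios")


def missing_asset_dirs(base: Path) -> list[str]:
    """Return the names of expected assets sub-folders that do not exist under base."""
    assets = base / "assets"
    return [name for name in EXPECTED_SUBDIRS if not (assets / name).is_dir()]


if __name__ == "__main__":
    missing = missing_asset_dirs(Path.cwd())
    if missing:
        print("Missing folders under assets:", ", ".join(missing))
```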


retrieval-based-voice-conversion's Issues

TypeError: expected str, bytes or os.PathLike object, not NoneType

I have this error when using vc.get_vc("/home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth").

This is the traceback:

Traceback (most recent call last):
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server_all.py", line 1, in <module>
    import ai_server
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server.py", line 721, in <module>
    start_server()
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server.py", line 701, in start_server
    cb.LoadAllModels()
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/chatbot_all.py", line 130, in LoadAllModels
    rvc.LoadModel()
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/Inference/RVC_inference.py", line 27, in LoadModel
    vc.get_vc("/home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth")
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/I4.0_ENV/lib/python3.11/site-packages/rvc/modules/vc/modules.py", line 84, in get_vc
    index = get_index_path_from_model(sid)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/I4.0_ENV/lib/python3.11/site-packages/rvc/modules/vc/utils.py", line 12, in get_index_path_from_model
    for root, _, files in os.walk(os.getenv("index_root"), topdown=False)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen os>", line 343, in walk
TypeError: expected str, bytes or os.PathLike object, not NoneType

My code is:

from pathlib import Path
from scipy.io import wavfile
from rvc.modules.vc.modules import VC
import os
import torch
import json
import ai_config as cfg

device: str = "cpu"
vc: VC = VC()

def LoadModel() -> None:
    global device, vc

    if (not cfg.current_data.prompt_order.__contains__("rvc")):
        raise Exception("Model is not in 'prompt_order'.")
    
    if (cfg.current_data.rvc_method != "rmvpe" and cfg.current_data.rvc_method != "pm" and cfg.current_data.rvc_method != "harvest" and cfg.current_data.rvc_method != "crepe"):
        raise Exception("RVC method must be 'rmvpe', 'pm', 'harvest' or 'crepe'.")

    if (len(cfg.current_data.rvc_model_path) == 0):
        return

    device = "cuda" if (torch.cuda.is_available() and cfg.current_data.use_gpu_if_available and cfg.current_data.move_to_gpu.count("rvc") > 0) else "cpu"
    vc.config.device = device

    vc.get_vc("/home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth")

def __make_rvc__(audio_name: str, protect: float = 0.33, filter_radius: int = 3) -> bytes:
    LoadModel()

    tgt_sr, audio_opt, times, _ = vc.vc_single(1, Path(audio_name), f0_method = cfg.current_data.rvc_method, index_file = Path(cfg.current_data.rvc_index_path), filter_radius = filter_radius, protect = protect)
    
    output_file = "tmp_rvc_output_"
    output_file_id = 0
    output_file_path = output_file + str(output_file_id) + ".wav"

    while (os.path.exists(output_file_path)):
        output_file_id += 1
        output_file_path = output_file + str(output_file_id) + ".wav"

    wavfile.write(output_file_path, tgt_sr, audio_opt)

    # Read the file back in binary mode ("wb" would truncate it before reading).
    with open(output_file_path, "rb") as f:
        audio_bytes = f.read()
    
    os.remove(output_file_path)
    return audio_bytes

def MakeRVC(data: str | dict[str]) -> bytes:
    if (type(data) == str):
        try:
            data = json.loads(data)
        except Exception as ex:
            raise Exception("[RVC] Data must be a dictionary or a JSON code. ERROR: " + str(ex))
    
    ddata = {
        "input": "",
        "protect": 0.33,
        "filter_radius": 3
    }

    try:
        ddata["input"] = data["input"]
    except:
        raise Exception("Unable to get audio path.")
    
    try:
        ddata["protect"] = float(data["protect"])
    except:
        pass

    try:
        ddata["filter_radius"] = int(data["filter_radius"])
    except:
        pass

    return __make_rvc__(ddata["input"], ddata["protect"], ddata["filter_radius"])

My Python version is Python 3.11.6 and the model path is /home/alcoft/Descargas/Modelos RVC/I4.0 V4/NEKOTSUBA_BI_VOICEVOX.pth.

Can someone help me fix this?
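A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing frame is os.walk(os.getenv("index_root")), and os.getenv returns None when the variable is not set. The script above never loads a .env file, whereas the library example in the README calls load_dotenv before using VC. The helper below (resolve_index_root is a hypothetical name, not part of rvc) fails early with a clearer message when the variable is missing.

```python
import os


def resolve_index_root(var_name: str = "index_root") -> str:
    """Return the value of the environment variable, or raise a descriptive error.

    os.getenv returns None for an unset variable, and passing None to
    os.walk is what produces the TypeError shown in the traceback.
    """
    root = os.getenv(var_name)
    if root is None:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; "
            "load the .env file or export the variable before calling get_vc."
        )
    return root
```

Calling load_dotenv("{envPath}") before vc.get_vc(...), or setting index_root in the environment some other way, should let this check pass.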

Error when using UVR DeEcho models with UVR.uvr_wrapper()

Getting this error message when using UVR-De-Echo-Aggressive.pth or UVR-De-Echo-Normal.pth with UVR.uvr_wrapper()

Traceback (most recent call last):
  File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 62, in <module>
    for item in generator:
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 49, in uvr_wrapper
    pre_fun = func(
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\vr.py", line 34, in __init__
    model.load_state_dict(cpk)
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CascadedASPPNet:
        Missing key(s) in state_dict: "stg1_low_band_net.enc1.conv1.conv.0.weight", "stg1_low_band_net.enc1.conv1.conv.1.weight", "stg1_low_band_net.enc1.conv1.conv.1.bias", "stg1_low_band_net.enc1.conv1.conv.1.running_mean", "stg1_low_band_net.enc1.conv1.conv.1.running_var", "stg1_low_band_net.enc1.conv2.conv.0.weight", "stg1_low_band_net.enc1.conv2.conv.1.weight", "stg1_low_band_net.enc1.conv2.conv.1.bias", "stg1_low_band_net.enc1.conv2.conv.1.running_mean", "stg1_low_band_net.enc1.conv2.conv.1.running_var", "stg1_low_band_net.enc2.conv1.conv.0.weight", "stg1_low_band_net.enc2.conv1.conv.1.weight", "stg1_low_band_net.enc2.conv1.conv.1.bias", "stg1_low_band_net.enc2.conv1.conv.1.running_mean", "stg1_low_band_net.enc2.conv1.conv.1.running_var", "stg1_low_band_net.enc2.conv2.conv.0.weight", "stg1_low_band_net.enc2.conv2.conv.1.weight", "stg1_low_band_net.enc2.conv2.conv.1.bias", "stg1_low_band_net.enc2.conv2.conv.1.running_mean", "stg1_low_band_net.enc2.conv2.conv.1.running_var", "stg1_low_band_net.enc3.conv1.conv.0.weight", "stg1_low_band_net.enc3.conv1.conv.1.weight", "stg1_low_band_net.enc3.conv1.conv.1.bias", "stg1_low_band_net.enc3.conv1.conv.1.running_mean", "stg1_low_band_net.enc3.conv1.conv.1.running_var", "stg1_low_band_net.enc3.conv2.conv.0.weight", "stg1_low_band_net.enc3.conv2.conv.1.weight", "stg1_low_band_net.enc3.conv2.conv.1.bias", "stg1_low_band_net.enc3.conv2.conv.1.running_mean", "stg1_low_band_net.enc3.conv2.conv.1.running_var", "stg1_low_band_net.enc4.conv1.conv.0.weight", "stg1_low_band_net.enc4.conv1.conv.1.weight", "stg1_low_band_net.enc4.conv1.conv.1.bias", "stg1_low_band_net.enc4.conv1.conv.1.running_mean", "stg1_low_band_net.enc4.conv1.conv.1.running_var", "stg1_low_band_net.enc4.conv2.conv.0.weight", "stg1_low_band_net.enc4.conv2.conv.1.weight", "stg1_low_band_net.enc4.conv2.conv.1.bias", "stg1_low_band_net.enc4.conv2.conv.1.running_mean", "stg1_low_band_net.enc4.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv1.1.conv.0.weight", 
"stg1_low_band_net.aspp.conv1.1.conv.1.weight", "stg1_low_band_net.aspp.conv1.1.conv.1.bias", "stg1_low_band_net.aspp.conv1.1.conv.1.running_mean", "stg1_low_band_net.aspp.conv1.1.conv.1.running_var", "stg1_low_band_net.aspp.conv2.conv.0.weight", "stg1_low_band_net.aspp.conv2.conv.1.weight", "stg1_low_band_net.aspp.conv2.conv.1.bias", "stg1_low_band_net.aspp.conv2.conv.1.running_mean", "stg1_low_band_net.aspp.conv2.conv.1.running_var", "stg1_low_band_net.aspp.conv3.conv.0.weight", "stg1_low_band_net.aspp.conv3.conv.1.weight", "stg1_low_band_net.aspp.conv3.conv.2.weight", "stg1_low_band_net.aspp.conv3.conv.2.bias", "stg1_low_band_net.aspp.conv3.conv.2.running_mean", "stg1_low_band_net.aspp.conv3.conv.2.running_var", "stg1_low_band_net.aspp.conv4.conv.0.weight", "stg1_low_band_net.aspp.conv4.conv.1.weight", "stg1_low_band_net.aspp.conv4.conv.2.weight", "stg1_low_band_net.aspp.conv4.conv.2.bias", "stg1_low_band_net.aspp.conv4.conv.2.running_mean", "stg1_low_band_net.aspp.conv4.conv.2.running_var", "stg1_low_band_net.aspp.conv5.conv.0.weight", "stg1_low_band_net.aspp.conv5.conv.1.weight", "stg1_low_band_net.aspp.conv5.conv.2.weight", "stg1_low_band_net.aspp.conv5.conv.2.bias", "stg1_low_band_net.aspp.conv5.conv.2.running_mean", "stg1_low_band_net.aspp.conv5.conv.2.running_var", "stg1_low_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_low_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_low_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_low_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_low_band_net.dec4.conv.conv.0.weight", "stg1_low_band_net.dec4.conv.conv.1.weight", "stg1_low_band_net.dec4.conv.conv.1.bias", "stg1_low_band_net.dec4.conv.conv.1.running_mean", "stg1_low_band_net.dec4.conv.conv.1.running_var", "stg1_low_band_net.dec3.conv.conv.0.weight", "stg1_low_band_net.dec3.conv.conv.1.weight", "stg1_low_band_net.dec3.conv.conv.1.bias", "stg1_low_band_net.dec3.conv.conv.1.running_mean", 
"stg1_low_band_net.dec3.conv.conv.1.running_var", "stg1_low_band_net.dec2.conv.conv.0.weight", "stg1_low_band_net.dec2.conv.conv.1.weight", "stg1_low_band_net.dec2.conv.conv.1.bias", "stg1_low_band_net.dec2.conv.conv.1.running_mean", "stg1_low_band_net.dec2.conv.conv.1.running_var", "stg1_low_band_net.dec1.conv.conv.0.weight", "stg1_low_band_net.dec1.conv.conv.1.weight", "stg1_low_band_net.dec1.conv.conv.1.bias", "stg1_low_band_net.dec1.conv.conv.1.running_mean", "stg1_low_band_net.dec1.conv.conv.1.running_var", "stg1_high_band_net.enc1.conv1.conv.0.weight", "stg1_high_band_net.enc1.conv1.conv.1.weight", "stg1_high_band_net.enc1.conv1.conv.1.bias", "stg1_high_band_net.enc1.conv1.conv.1.running_mean", "stg1_high_band_net.enc1.conv1.conv.1.running_var", "stg1_high_band_net.enc1.conv2.conv.0.weight", "stg1_high_band_net.enc1.conv2.conv.1.weight", "stg1_high_band_net.enc1.conv2.conv.1.bias", "stg1_high_band_net.enc1.conv2.conv.1.running_mean", "stg1_high_band_net.enc1.conv2.conv.1.running_var", "stg1_high_band_net.aspp.conv3.conv.2.weight", "stg1_high_band_net.aspp.conv3.conv.2.bias", "stg1_high_band_net.aspp.conv3.conv.2.running_mean", "stg1_high_band_net.aspp.conv3.conv.2.running_var", "stg1_high_band_net.aspp.conv4.conv.2.weight", "stg1_high_band_net.aspp.conv4.conv.2.bias", "stg1_high_band_net.aspp.conv4.conv.2.running_mean", "stg1_high_band_net.aspp.conv4.conv.2.running_var", "stg1_high_band_net.aspp.conv5.conv.2.weight", "stg1_high_band_net.aspp.conv5.conv.2.bias", "stg1_high_band_net.aspp.conv5.conv.2.running_mean", "stg1_high_band_net.aspp.conv5.conv.2.running_var", "stg1_high_band_net.aspp.bottleneck.0.conv.0.weight", "stg1_high_band_net.aspp.bottleneck.0.conv.1.weight", "stg1_high_band_net.aspp.bottleneck.0.conv.1.bias", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg1_high_band_net.aspp.bottleneck.0.conv.1.running_var", "stg1_high_band_net.dec4.conv.conv.0.weight", "stg1_high_band_net.dec4.conv.conv.1.weight", 
"stg1_high_band_net.dec4.conv.conv.1.bias", "stg1_high_band_net.dec4.conv.conv.1.running_mean", "stg1_high_band_net.dec4.conv.conv.1.running_var", "stg1_high_band_net.dec3.conv.conv.0.weight", "stg1_high_band_net.dec3.conv.conv.1.weight", "stg1_high_band_net.dec3.conv.conv.1.bias", "stg1_high_band_net.dec3.conv.conv.1.running_mean", "stg1_high_band_net.dec3.conv.conv.1.running_var", "stg1_high_band_net.dec2.conv.conv.0.weight", "stg1_high_band_net.dec2.conv.conv.1.weight", "stg1_high_band_net.dec2.conv.conv.1.bias", "stg1_high_band_net.dec2.conv.conv.1.running_mean", "stg1_high_band_net.dec2.conv.conv.1.running_var", "stg1_high_band_net.dec1.conv.conv.0.weight", "stg1_high_band_net.dec1.conv.conv.1.weight", "stg1_high_band_net.dec1.conv.conv.1.bias", "stg1_high_band_net.dec1.conv.conv.1.running_mean", "stg1_high_band_net.dec1.conv.conv.1.running_var", "stg2_bridge.conv.0.weight", "stg2_bridge.conv.1.weight", "stg2_bridge.conv.1.bias", "stg2_bridge.conv.1.running_mean", "stg2_bridge.conv.1.running_var", "stg2_full_band_net.enc1.conv1.conv.0.weight", "stg2_full_band_net.enc1.conv1.conv.1.weight", "stg2_full_band_net.enc1.conv1.conv.1.bias", "stg2_full_band_net.enc1.conv1.conv.1.running_mean", "stg2_full_band_net.enc1.conv1.conv.1.running_var", "stg2_full_band_net.enc1.conv2.conv.0.weight", "stg2_full_band_net.enc1.conv2.conv.1.weight", "stg2_full_band_net.enc1.conv2.conv.1.bias", "stg2_full_band_net.enc1.conv2.conv.1.running_mean", "stg2_full_band_net.enc1.conv2.conv.1.running_var", "stg2_full_band_net.enc2.conv1.conv.0.weight", "stg2_full_band_net.enc2.conv1.conv.1.weight", "stg2_full_band_net.enc2.conv1.conv.1.bias", "stg2_full_band_net.enc2.conv1.conv.1.running_mean", "stg2_full_band_net.enc2.conv1.conv.1.running_var", "stg2_full_band_net.enc2.conv2.conv.0.weight", "stg2_full_band_net.enc2.conv2.conv.1.weight", "stg2_full_band_net.enc2.conv2.conv.1.bias", "stg2_full_band_net.enc2.conv2.conv.1.running_mean", "stg2_full_band_net.enc2.conv2.conv.1.running_var", 
"stg2_full_band_net.enc3.conv1.conv.0.weight", "stg2_full_band_net.enc3.conv1.conv.1.weight", "stg2_full_band_net.enc3.conv1.conv.1.bias", "stg2_full_band_net.enc3.conv1.conv.1.running_mean", "stg2_full_band_net.enc3.conv1.conv.1.running_var", "stg2_full_band_net.enc3.conv2.conv.0.weight", "stg2_full_band_net.enc3.conv2.conv.1.weight", "stg2_full_band_net.enc3.conv2.conv.1.bias", "stg2_full_band_net.enc3.conv2.conv.1.running_mean", "stg2_full_band_net.enc3.conv2.conv.1.running_var", "stg2_full_band_net.enc4.conv1.conv.0.weight", "stg2_full_band_net.enc4.conv1.conv.1.weight", "stg2_full_band_net.enc4.conv1.conv.1.bias", "stg2_full_band_net.enc4.conv1.conv.1.running_mean", "stg2_full_band_net.enc4.conv1.conv.1.running_var", "stg2_full_band_net.enc4.conv2.conv.0.weight", "stg2_full_band_net.enc4.conv2.conv.1.weight", "stg2_full_band_net.enc4.conv2.conv.1.bias", "stg2_full_band_net.enc4.conv2.conv.1.running_mean", "stg2_full_band_net.enc4.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv1.1.conv.0.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.weight", "stg2_full_band_net.aspp.conv1.1.conv.1.bias", "stg2_full_band_net.aspp.conv1.1.conv.1.running_mean", "stg2_full_band_net.aspp.conv1.1.conv.1.running_var", "stg2_full_band_net.aspp.conv2.conv.0.weight", "stg2_full_band_net.aspp.conv2.conv.1.weight", "stg2_full_band_net.aspp.conv2.conv.1.bias", "stg2_full_band_net.aspp.conv2.conv.1.running_mean", "stg2_full_band_net.aspp.conv2.conv.1.running_var", "stg2_full_band_net.aspp.conv3.conv.0.weight", "stg2_full_band_net.aspp.conv3.conv.1.weight", "stg2_full_band_net.aspp.conv3.conv.2.weight", "stg2_full_band_net.aspp.conv3.conv.2.bias", "stg2_full_band_net.aspp.conv3.conv.2.running_mean", "stg2_full_band_net.aspp.conv3.conv.2.running_var", "stg2_full_band_net.aspp.conv4.conv.0.weight", "stg2_full_band_net.aspp.conv4.conv.1.weight", "stg2_full_band_net.aspp.conv4.conv.2.weight", "stg2_full_band_net.aspp.conv4.conv.2.bias", 
"stg2_full_band_net.aspp.conv4.conv.2.running_mean", "stg2_full_band_net.aspp.conv4.conv.2.running_var", "stg2_full_band_net.aspp.conv5.conv.0.weight", "stg2_full_band_net.aspp.conv5.conv.1.weight", "stg2_full_band_net.aspp.conv5.conv.2.weight", "stg2_full_band_net.aspp.conv5.conv.2.bias", "stg2_full_band_net.aspp.conv5.conv.2.running_mean", "stg2_full_band_net.aspp.conv5.conv.2.running_var", "stg2_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg2_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg2_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg2_full_band_net.dec4.conv.conv.0.weight", "stg2_full_band_net.dec4.conv.conv.1.weight", "stg2_full_band_net.dec4.conv.conv.1.bias", "stg2_full_band_net.dec4.conv.conv.1.running_mean", "stg2_full_band_net.dec4.conv.conv.1.running_var", "stg2_full_band_net.dec3.conv.conv.0.weight", "stg2_full_band_net.dec3.conv.conv.1.weight", "stg2_full_band_net.dec3.conv.conv.1.bias", "stg2_full_band_net.dec3.conv.conv.1.running_mean", "stg2_full_band_net.dec3.conv.conv.1.running_var", "stg2_full_band_net.dec2.conv.conv.0.weight", "stg2_full_band_net.dec2.conv.conv.1.weight", "stg2_full_band_net.dec2.conv.conv.1.bias", "stg2_full_band_net.dec2.conv.conv.1.running_mean", "stg2_full_band_net.dec2.conv.conv.1.running_var", "stg2_full_band_net.dec1.conv.conv.0.weight", "stg2_full_band_net.dec1.conv.conv.1.weight", "stg2_full_band_net.dec1.conv.conv.1.bias", "stg2_full_band_net.dec1.conv.conv.1.running_mean", "stg2_full_band_net.dec1.conv.conv.1.running_var", "stg3_bridge.conv.0.weight", "stg3_bridge.conv.1.weight", "stg3_bridge.conv.1.bias", "stg3_bridge.conv.1.running_mean", "stg3_bridge.conv.1.running_var", "stg3_full_band_net.enc1.conv1.conv.0.weight", "stg3_full_band_net.enc1.conv1.conv.1.weight", "stg3_full_band_net.enc1.conv1.conv.1.bias", "stg3_full_band_net.enc1.conv1.conv.1.running_mean", 
"stg3_full_band_net.enc1.conv1.conv.1.running_var", "stg3_full_band_net.enc1.conv2.conv.0.weight", "stg3_full_band_net.enc1.conv2.conv.1.weight", "stg3_full_band_net.enc1.conv2.conv.1.bias", "stg3_full_band_net.enc1.conv2.conv.1.running_mean", "stg3_full_band_net.enc1.conv2.conv.1.running_var", "stg3_full_band_net.aspp.conv3.conv.2.weight", "stg3_full_band_net.aspp.conv3.conv.2.bias", "stg3_full_band_net.aspp.conv3.conv.2.running_mean", "stg3_full_band_net.aspp.conv3.conv.2.running_var", "stg3_full_band_net.aspp.conv4.conv.2.weight", "stg3_full_band_net.aspp.conv4.conv.2.bias", "stg3_full_band_net.aspp.conv4.conv.2.running_mean", "stg3_full_band_net.aspp.conv4.conv.2.running_var", "stg3_full_band_net.aspp.conv5.conv.2.weight", "stg3_full_band_net.aspp.conv5.conv.2.bias", "stg3_full_band_net.aspp.conv5.conv.2.running_mean", "stg3_full_band_net.aspp.conv5.conv.2.running_var", "stg3_full_band_net.aspp.bottleneck.0.conv.0.weight", "stg3_full_band_net.aspp.bottleneck.0.conv.1.weight", "stg3_full_band_net.aspp.bottleneck.0.conv.1.bias", "stg3_full_band_net.aspp.bottleneck.0.conv.1.running_mean", "stg3_full_band_net.aspp.bottleneck.0.conv.1.running_var", "stg3_full_band_net.dec4.conv.conv.0.weight", "stg3_full_band_net.dec4.conv.conv.1.weight", "stg3_full_band_net.dec4.conv.conv.1.bias", "stg3_full_band_net.dec4.conv.conv.1.running_mean", "stg3_full_band_net.dec4.conv.conv.1.running_var", "stg3_full_band_net.dec3.conv.conv.0.weight", "stg3_full_band_net.dec3.conv.conv.1.weight", "stg3_full_band_net.dec3.conv.conv.1.bias", "stg3_full_band_net.dec3.conv.conv.1.running_mean", "stg3_full_band_net.dec3.conv.conv.1.running_var", "stg3_full_band_net.dec2.conv.conv.0.weight", "stg3_full_band_net.dec2.conv.conv.1.weight", "stg3_full_band_net.dec2.conv.conv.1.bias", "stg3_full_band_net.dec2.conv.conv.1.running_mean", "stg3_full_band_net.dec2.conv.conv.1.running_var", "stg3_full_band_net.dec1.conv.conv.0.weight", "stg3_full_band_net.dec1.conv.conv.1.weight", 
"stg3_full_band_net.dec1.conv.conv.1.bias", "stg3_full_band_net.dec1.conv.conv.1.running_mean", "stg3_full_band_net.dec1.conv.conv.1.running_var", "aux1_out.weight", "aux2_out.weight".
        Unexpected key(s) in state_dict: "stg2_low_band_net.0.enc1.conv.0.weight", "stg2_low_band_net.0.enc1.conv.1.weight", "stg2_low_band_net.0.enc1.conv.1.bias", "stg2_low_band_net.0.enc1.conv.1.running_mean", "stg2_low_band_net.0.enc1.conv.1.running_var", "stg2_low_band_net.0.enc1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc2.conv1.conv.0.weight", "stg2_low_band_net.0.enc2.conv1.conv.1.weight", "stg2_low_band_net.0.enc2.conv1.conv.1.bias", "stg2_low_band_net.0.enc2.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc2.conv1.conv.1.running_var", "stg2_low_band_net.0.enc2.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc2.conv2.conv.0.weight", "stg2_low_band_net.0.enc2.conv2.conv.1.weight", "stg2_low_band_net.0.enc2.conv2.conv.1.bias", "stg2_low_band_net.0.enc2.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc2.conv2.conv.1.running_var", "stg2_low_band_net.0.enc2.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc3.conv1.conv.0.weight", "stg2_low_band_net.0.enc3.conv1.conv.1.weight", "stg2_low_band_net.0.enc3.conv1.conv.1.bias", "stg2_low_band_net.0.enc3.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc3.conv1.conv.1.running_var", "stg2_low_band_net.0.enc3.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc3.conv2.conv.0.weight", "stg2_low_band_net.0.enc3.conv2.conv.1.weight", "stg2_low_band_net.0.enc3.conv2.conv.1.bias", "stg2_low_band_net.0.enc3.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc3.conv2.conv.1.running_var", "stg2_low_band_net.0.enc3.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc4.conv1.conv.0.weight", "stg2_low_band_net.0.enc4.conv1.conv.1.weight", "stg2_low_band_net.0.enc4.conv1.conv.1.bias", "stg2_low_band_net.0.enc4.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc4.conv1.conv.1.running_var", "stg2_low_band_net.0.enc4.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc4.conv2.conv.0.weight", "stg2_low_band_net.0.enc4.conv2.conv.1.weight", 
"stg2_low_band_net.0.enc4.conv2.conv.1.bias", "stg2_low_band_net.0.enc4.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc4.conv2.conv.1.running_var", "stg2_low_band_net.0.enc4.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc5.conv1.conv.0.weight", "stg2_low_band_net.0.enc5.conv1.conv.1.weight", "stg2_low_band_net.0.enc5.conv1.conv.1.bias", "stg2_low_band_net.0.enc5.conv1.conv.1.running_mean", "stg2_low_band_net.0.enc5.conv1.conv.1.running_var", "stg2_low_band_net.0.enc5.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.enc5.conv2.conv.0.weight", "stg2_low_band_net.0.enc5.conv2.conv.1.weight", "stg2_low_band_net.0.enc5.conv2.conv.1.bias", "stg2_low_band_net.0.enc5.conv2.conv.1.running_mean", "stg2_low_band_net.0.enc5.conv2.conv.1.running_var", "stg2_low_band_net.0.enc5.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv1.1.conv.0.weight", "stg2_low_band_net.0.aspp.conv1.1.conv.1.weight", "stg2_low_band_net.0.aspp.conv1.1.conv.1.bias", "stg2_low_band_net.0.aspp.conv1.1.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv1.1.conv.1.running_var", "stg2_low_band_net.0.aspp.conv1.1.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv2.conv.0.weight", "stg2_low_band_net.0.aspp.conv2.conv.1.weight", "stg2_low_band_net.0.aspp.conv2.conv.1.bias", "stg2_low_band_net.0.aspp.conv2.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv2.conv.1.running_var", "stg2_low_band_net.0.aspp.conv2.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv3.conv.0.weight", "stg2_low_band_net.0.aspp.conv3.conv.1.weight", "stg2_low_band_net.0.aspp.conv3.conv.1.bias", "stg2_low_band_net.0.aspp.conv3.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv3.conv.1.running_var", "stg2_low_band_net.0.aspp.conv3.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv4.conv.0.weight", "stg2_low_band_net.0.aspp.conv4.conv.1.weight", "stg2_low_band_net.0.aspp.conv4.conv.1.bias", "stg2_low_band_net.0.aspp.conv4.conv.1.running_mean", 
"stg2_low_band_net.0.aspp.conv4.conv.1.running_var", "stg2_low_band_net.0.aspp.conv4.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.conv5.conv.0.weight", "stg2_low_band_net.0.aspp.conv5.conv.1.weight", "stg2_low_band_net.0.aspp.conv5.conv.1.bias", "stg2_low_band_net.0.aspp.conv5.conv.1.running_mean", "stg2_low_band_net.0.aspp.conv5.conv.1.running_var", "stg2_low_band_net.0.aspp.conv5.conv.1.num_batches_tracked", "stg2_low_band_net.0.aspp.bottleneck.conv.0.weight", "stg2_low_band_net.0.aspp.bottleneck.conv.1.weight", "stg2_low_band_net.0.aspp.bottleneck.conv.1.bias", "stg2_low_band_net.0.aspp.bottleneck.conv.1.running_mean", "stg2_low_band_net.0.aspp.bottleneck.conv.1.running_var", "stg2_low_band_net.0.aspp.bottleneck.conv.1.num_batches_tracked", "stg2_low_band_net.0.dec4.conv1.conv.0.weight", "stg2_low_band_net.0.dec4.conv1.conv.1.weight", "stg2_low_band_net.0.dec4.conv1.conv.1.bias", "stg2_low_band_net.0.dec4.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec4.conv1.conv.1.running_var", "stg2_low_band_net.0.dec4.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.dec3.conv1.conv.0.weight", "stg2_low_band_net.0.dec3.conv1.conv.1.weight", "stg2_low_band_net.0.dec3.conv1.conv.1.bias", "stg2_low_band_net.0.dec3.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec3.conv1.conv.1.running_var", "stg2_low_band_net.0.dec3.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.dec2.conv1.conv.0.weight", "stg2_low_band_net.0.dec2.conv1.conv.1.weight", "stg2_low_band_net.0.dec2.conv1.conv.1.bias", "stg2_low_band_net.0.dec2.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec2.conv1.conv.1.running_var", "stg2_low_band_net.0.dec2.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.0.lstm_dec2.conv.conv.0.weight", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.weight", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.bias", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.running_mean", "stg2_low_band_net.0.lstm_dec2.conv.conv.1.running_var", 
"stg2_low_band_net.0.lstm_dec2.conv.conv.1.num_batches_tracked", "stg2_low_band_net.0.lstm_dec2.lstm.weight_ih_l0", "stg2_low_band_net.0.lstm_dec2.lstm.weight_hh_l0", "stg2_low_band_net.0.lstm_dec2.lstm.bias_ih_l0", "stg2_low_band_net.0.lstm_dec2.lstm.bias_hh_l0", "stg2_low_band_net.0.lstm_dec2.lstm.weight_ih_l0_reverse", "stg2_low_band_net.0.lstm_dec2.lstm.weight_hh_l0_reverse", "stg2_low_band_net.0.lstm_dec2.lstm.bias_ih_l0_reverse", "stg2_low_band_net.0.lstm_dec2.lstm.bias_hh_l0_reverse", "stg2_low_band_net.0.lstm_dec2.dense.0.weight", "stg2_low_band_net.0.lstm_dec2.dense.0.bias", "stg2_low_band_net.0.lstm_dec2.dense.1.weight", "stg2_low_band_net.0.lstm_dec2.dense.1.bias", "stg2_low_band_net.0.lstm_dec2.dense.1.running_mean", "stg2_low_band_net.0.lstm_dec2.dense.1.running_var", "stg2_low_band_net.0.lstm_dec2.dense.1.num_batches_tracked", "stg2_low_band_net.0.dec1.conv1.conv.0.weight", "stg2_low_band_net.0.dec1.conv1.conv.1.weight", "stg2_low_band_net.0.dec1.conv1.conv.1.bias", "stg2_low_band_net.0.dec1.conv1.conv.1.running_mean", "stg2_low_band_net.0.dec1.conv1.conv.1.running_var", "stg2_low_band_net.0.dec1.conv1.conv.1.num_batches_tracked", "stg2_low_band_net.1.conv.0.weight", "stg2_low_band_net.1.conv.1.weight", "stg2_low_band_net.1.conv.1.bias", "stg2_low_band_net.1.conv.1.running_mean", "stg2_low_band_net.1.conv.1.running_var", "stg2_low_band_net.1.conv.1.num_batches_tracked", "stg2_high_band_net.enc1.conv.0.weight", "stg2_high_band_net.enc1.conv.1.weight", "stg2_high_band_net.enc1.conv.1.bias", "stg2_high_band_net.enc1.conv.1.running_mean", "stg2_high_band_net.enc1.conv.1.running_var", "stg2_high_band_net.enc1.conv.1.num_batches_tracked", "stg2_high_band_net.enc2.conv1.conv.0.weight", "stg2_high_band_net.enc2.conv1.conv.1.weight", "stg2_high_band_net.enc2.conv1.conv.1.bias", "stg2_high_band_net.enc2.conv1.conv.1.running_mean", "stg2_high_band_net.enc2.conv1.conv.1.running_var", "stg2_high_band_net.enc2.conv1.conv.1.num_batches_tracked", 
"stg2_high_band_net.enc2.conv2.conv.0.weight", "stg2_high_band_net.enc2.conv2.conv.1.weight", "stg2_high_band_net.enc2.conv2.conv.1.bias", "stg2_high_band_net.enc2.conv2.conv.1.running_mean", "stg2_high_band_net.enc2.conv2.conv.1.running_var", "stg2_high_band_net.enc2.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.enc3.conv1.conv.0.weight", "stg2_high_band_net.enc3.conv1.conv.1.weight", "stg2_high_band_net.enc3.conv1.conv.1.bias", "stg2_high_band_net.enc3.conv1.conv.1.running_mean", "stg2_high_band_net.enc3.conv1.conv.1.running_var", "stg2_high_band_net.enc3.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.enc3.conv2.conv.0.weight", "stg2_high_band_net.enc3.conv2.conv.1.weight", "stg2_high_band_net.enc3.conv2.conv.1.bias", "stg2_high_band_net.enc3.conv2.conv.1.running_mean", "stg2_high_band_net.enc3.conv2.conv.1.running_var", "stg2_high_band_net.enc3.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.enc4.conv1.conv.0.weight", "stg2_high_band_net.enc4.conv1.conv.1.weight", "stg2_high_band_net.enc4.conv1.conv.1.bias", "stg2_high_band_net.enc4.conv1.conv.1.running_mean", "stg2_high_band_net.enc4.conv1.conv.1.running_var", "stg2_high_band_net.enc4.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.enc4.conv2.conv.0.weight", "stg2_high_band_net.enc4.conv2.conv.1.weight", "stg2_high_band_net.enc4.conv2.conv.1.bias", "stg2_high_band_net.enc4.conv2.conv.1.running_mean", "stg2_high_band_net.enc4.conv2.conv.1.running_var", "stg2_high_band_net.enc4.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.enc5.conv1.conv.0.weight", "stg2_high_band_net.enc5.conv1.conv.1.weight", "stg2_high_band_net.enc5.conv1.conv.1.bias", "stg2_high_band_net.enc5.conv1.conv.1.running_mean", "stg2_high_band_net.enc5.conv1.conv.1.running_var", "stg2_high_band_net.enc5.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.enc5.conv2.conv.0.weight", "stg2_high_band_net.enc5.conv2.conv.1.weight", "stg2_high_band_net.enc5.conv2.conv.1.bias", 
"stg2_high_band_net.enc5.conv2.conv.1.running_mean", "stg2_high_band_net.enc5.conv2.conv.1.running_var", "stg2_high_band_net.enc5.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv1.1.conv.0.weight", "stg2_high_band_net.aspp.conv1.1.conv.1.weight", "stg2_high_band_net.aspp.conv1.1.conv.1.bias", "stg2_high_band_net.aspp.conv1.1.conv.1.running_mean", "stg2_high_band_net.aspp.conv1.1.conv.1.running_var", "stg2_high_band_net.aspp.conv1.1.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv2.conv.0.weight", "stg2_high_band_net.aspp.conv2.conv.1.weight", "stg2_high_band_net.aspp.conv2.conv.1.bias", "stg2_high_band_net.aspp.conv2.conv.1.running_mean", "stg2_high_band_net.aspp.conv2.conv.1.running_var", "stg2_high_band_net.aspp.conv2.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv3.conv.0.weight", "stg2_high_band_net.aspp.conv3.conv.1.weight", "stg2_high_band_net.aspp.conv3.conv.1.bias", "stg2_high_band_net.aspp.conv3.conv.1.running_mean", "stg2_high_band_net.aspp.conv3.conv.1.running_var", "stg2_high_band_net.aspp.conv3.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv4.conv.0.weight", "stg2_high_band_net.aspp.conv4.conv.1.weight", "stg2_high_band_net.aspp.conv4.conv.1.bias", "stg2_high_band_net.aspp.conv4.conv.1.running_mean", "stg2_high_band_net.aspp.conv4.conv.1.running_var", "stg2_high_band_net.aspp.conv4.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.conv5.conv.0.weight", "stg2_high_band_net.aspp.conv5.conv.1.weight", "stg2_high_band_net.aspp.conv5.conv.1.bias", "stg2_high_band_net.aspp.conv5.conv.1.running_mean", "stg2_high_band_net.aspp.conv5.conv.1.running_var", "stg2_high_band_net.aspp.conv5.conv.1.num_batches_tracked", "stg2_high_band_net.aspp.bottleneck.conv.0.weight", "stg2_high_band_net.aspp.bottleneck.conv.1.weight", "stg2_high_band_net.aspp.bottleneck.conv.1.bias", "stg2_high_band_net.aspp.bottleneck.conv.1.running_mean", "stg2_high_band_net.aspp.bottleneck.conv.1.running_var", 
"stg2_high_band_net.aspp.bottleneck.conv.1.num_batches_tracked", "stg2_high_band_net.dec4.conv1.conv.0.weight", "stg2_high_band_net.dec4.conv1.conv.1.weight", "stg2_high_band_net.dec4.conv1.conv.1.bias", "stg2_high_band_net.dec4.conv1.conv.1.running_mean", "stg2_high_band_net.dec4.conv1.conv.1.running_var", "stg2_high_band_net.dec4.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.dec3.conv1.conv.0.weight", "stg2_high_band_net.dec3.conv1.conv.1.weight", "stg2_high_band_net.dec3.conv1.conv.1.bias", "stg2_high_band_net.dec3.conv1.conv.1.running_mean", "stg2_high_band_net.dec3.conv1.conv.1.running_var", "stg2_high_band_net.dec3.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.dec2.conv1.conv.0.weight", "stg2_high_band_net.dec2.conv1.conv.1.weight", "stg2_high_band_net.dec2.conv1.conv.1.bias", "stg2_high_band_net.dec2.conv1.conv.1.running_mean", "stg2_high_band_net.dec2.conv1.conv.1.running_var", "stg2_high_band_net.dec2.conv1.conv.1.num_batches_tracked", "stg2_high_band_net.lstm_dec2.conv.conv.0.weight", "stg2_high_band_net.lstm_dec2.conv.conv.1.weight", "stg2_high_band_net.lstm_dec2.conv.conv.1.bias", "stg2_high_band_net.lstm_dec2.conv.conv.1.running_mean", "stg2_high_band_net.lstm_dec2.conv.conv.1.running_var", "stg2_high_band_net.lstm_dec2.conv.conv.1.num_batches_tracked", "stg2_high_band_net.lstm_dec2.lstm.weight_ih_l0", "stg2_high_band_net.lstm_dec2.lstm.weight_hh_l0", "stg2_high_band_net.lstm_dec2.lstm.bias_ih_l0", "stg2_high_band_net.lstm_dec2.lstm.bias_hh_l0", "stg2_high_band_net.lstm_dec2.lstm.weight_ih_l0_reverse", "stg2_high_band_net.lstm_dec2.lstm.weight_hh_l0_reverse", "stg2_high_band_net.lstm_dec2.lstm.bias_ih_l0_reverse", "stg2_high_band_net.lstm_dec2.lstm.bias_hh_l0_reverse", "stg2_high_band_net.lstm_dec2.dense.0.weight", "stg2_high_band_net.lstm_dec2.dense.0.bias", "stg2_high_band_net.lstm_dec2.dense.1.weight", "stg2_high_band_net.lstm_dec2.dense.1.bias", "stg2_high_band_net.lstm_dec2.dense.1.running_mean", 
"stg2_high_band_net.lstm_dec2.dense.1.running_var", "stg2_high_band_net.lstm_dec2.dense.1.num_batches_tracked", "stg2_high_band_net.dec1.conv1.conv.0.weight", "stg2_high_band_net.dec1.conv1.conv.1.weight", "stg2_high_band_net.dec1.conv1.conv.1.bias", "stg2_high_band_net.dec1.conv1.conv.1.running_mean", "stg2_high_band_net.dec1.conv1.conv.1.running_var", "stg2_high_band_net.dec1.conv1.conv.1.num_batches_tracked", "aux_out.weight", "stg1_low_band_net.0.enc1.conv.0.weight", "stg1_low_band_net.0.enc1.conv.1.weight", "stg1_low_band_net.0.enc1.conv.1.bias", "stg1_low_band_net.0.enc1.conv.1.running_mean", "stg1_low_band_net.0.enc1.conv.1.running_var", "stg1_low_band_net.0.enc1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc2.conv1.conv.0.weight", "stg1_low_band_net.0.enc2.conv1.conv.1.weight", "stg1_low_band_net.0.enc2.conv1.conv.1.bias", "stg1_low_band_net.0.enc2.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc2.conv1.conv.1.running_var", "stg1_low_band_net.0.enc2.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc2.conv2.conv.0.weight", "stg1_low_band_net.0.enc2.conv2.conv.1.weight", "stg1_low_band_net.0.enc2.conv2.conv.1.bias", "stg1_low_band_net.0.enc2.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc2.conv2.conv.1.running_var", "stg1_low_band_net.0.enc2.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc3.conv1.conv.0.weight", "stg1_low_band_net.0.enc3.conv1.conv.1.weight", "stg1_low_band_net.0.enc3.conv1.conv.1.bias", "stg1_low_band_net.0.enc3.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc3.conv1.conv.1.running_var", "stg1_low_band_net.0.enc3.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc3.conv2.conv.0.weight", "stg1_low_band_net.0.enc3.conv2.conv.1.weight", "stg1_low_band_net.0.enc3.conv2.conv.1.bias", "stg1_low_band_net.0.enc3.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc3.conv2.conv.1.running_var", "stg1_low_band_net.0.enc3.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc4.conv1.conv.0.weight", 
"stg1_low_band_net.0.enc4.conv1.conv.1.weight", "stg1_low_band_net.0.enc4.conv1.conv.1.bias", "stg1_low_band_net.0.enc4.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc4.conv1.conv.1.running_var", "stg1_low_band_net.0.enc4.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc4.conv2.conv.0.weight", "stg1_low_band_net.0.enc4.conv2.conv.1.weight", "stg1_low_band_net.0.enc4.conv2.conv.1.bias", "stg1_low_band_net.0.enc4.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc4.conv2.conv.1.running_var", "stg1_low_band_net.0.enc4.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc5.conv1.conv.0.weight", "stg1_low_band_net.0.enc5.conv1.conv.1.weight", "stg1_low_band_net.0.enc5.conv1.conv.1.bias", "stg1_low_band_net.0.enc5.conv1.conv.1.running_mean", "stg1_low_band_net.0.enc5.conv1.conv.1.running_var", "stg1_low_band_net.0.enc5.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.enc5.conv2.conv.0.weight", "stg1_low_band_net.0.enc5.conv2.conv.1.weight", "stg1_low_band_net.0.enc5.conv2.conv.1.bias", "stg1_low_band_net.0.enc5.conv2.conv.1.running_mean", "stg1_low_band_net.0.enc5.conv2.conv.1.running_var", "stg1_low_band_net.0.enc5.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv1.1.conv.0.weight", "stg1_low_band_net.0.aspp.conv1.1.conv.1.weight", "stg1_low_band_net.0.aspp.conv1.1.conv.1.bias", "stg1_low_band_net.0.aspp.conv1.1.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv1.1.conv.1.running_var", "stg1_low_band_net.0.aspp.conv1.1.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv2.conv.0.weight", "stg1_low_band_net.0.aspp.conv2.conv.1.weight", "stg1_low_band_net.0.aspp.conv2.conv.1.bias", "stg1_low_band_net.0.aspp.conv2.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv2.conv.1.running_var", "stg1_low_band_net.0.aspp.conv2.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv3.conv.0.weight", "stg1_low_band_net.0.aspp.conv3.conv.1.weight", "stg1_low_band_net.0.aspp.conv3.conv.1.bias", 
"stg1_low_band_net.0.aspp.conv3.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv3.conv.1.running_var", "stg1_low_band_net.0.aspp.conv3.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv4.conv.0.weight", "stg1_low_band_net.0.aspp.conv4.conv.1.weight", "stg1_low_band_net.0.aspp.conv4.conv.1.bias", "stg1_low_band_net.0.aspp.conv4.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv4.conv.1.running_var", "stg1_low_band_net.0.aspp.conv4.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.conv5.conv.0.weight", "stg1_low_band_net.0.aspp.conv5.conv.1.weight", "stg1_low_band_net.0.aspp.conv5.conv.1.bias", "stg1_low_band_net.0.aspp.conv5.conv.1.running_mean", "stg1_low_band_net.0.aspp.conv5.conv.1.running_var", "stg1_low_band_net.0.aspp.conv5.conv.1.num_batches_tracked", "stg1_low_band_net.0.aspp.bottleneck.conv.0.weight", "stg1_low_band_net.0.aspp.bottleneck.conv.1.weight", "stg1_low_band_net.0.aspp.bottleneck.conv.1.bias", "stg1_low_band_net.0.aspp.bottleneck.conv.1.running_mean", "stg1_low_band_net.0.aspp.bottleneck.conv.1.running_var", "stg1_low_band_net.0.aspp.bottleneck.conv.1.num_batches_tracked", "stg1_low_band_net.0.dec4.conv1.conv.0.weight", "stg1_low_band_net.0.dec4.conv1.conv.1.weight", "stg1_low_band_net.0.dec4.conv1.conv.1.bias", "stg1_low_band_net.0.dec4.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec4.conv1.conv.1.running_var", "stg1_low_band_net.0.dec4.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.dec3.conv1.conv.0.weight", "stg1_low_band_net.0.dec3.conv1.conv.1.weight", "stg1_low_band_net.0.dec3.conv1.conv.1.bias", "stg1_low_band_net.0.dec3.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec3.conv1.conv.1.running_var", "stg1_low_band_net.0.dec3.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.dec2.conv1.conv.0.weight", "stg1_low_band_net.0.dec2.conv1.conv.1.weight", "stg1_low_band_net.0.dec2.conv1.conv.1.bias", "stg1_low_band_net.0.dec2.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec2.conv1.conv.1.running_var", 
"stg1_low_band_net.0.dec2.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.0.lstm_dec2.conv.conv.0.weight", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.weight", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.bias", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.running_mean", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.running_var", "stg1_low_band_net.0.lstm_dec2.conv.conv.1.num_batches_tracked", "stg1_low_band_net.0.lstm_dec2.lstm.weight_ih_l0", "stg1_low_band_net.0.lstm_dec2.lstm.weight_hh_l0", "stg1_low_band_net.0.lstm_dec2.lstm.bias_ih_l0", "stg1_low_band_net.0.lstm_dec2.lstm.bias_hh_l0", "stg1_low_band_net.0.lstm_dec2.lstm.weight_ih_l0_reverse", "stg1_low_band_net.0.lstm_dec2.lstm.weight_hh_l0_reverse", "stg1_low_band_net.0.lstm_dec2.lstm.bias_ih_l0_reverse", "stg1_low_band_net.0.lstm_dec2.lstm.bias_hh_l0_reverse", "stg1_low_band_net.0.lstm_dec2.dense.0.weight", "stg1_low_band_net.0.lstm_dec2.dense.0.bias", "stg1_low_band_net.0.lstm_dec2.dense.1.weight", "stg1_low_band_net.0.lstm_dec2.dense.1.bias", "stg1_low_band_net.0.lstm_dec2.dense.1.running_mean", "stg1_low_band_net.0.lstm_dec2.dense.1.running_var", "stg1_low_band_net.0.lstm_dec2.dense.1.num_batches_tracked", "stg1_low_band_net.0.dec1.conv1.conv.0.weight", "stg1_low_band_net.0.dec1.conv1.conv.1.weight", "stg1_low_band_net.0.dec1.conv1.conv.1.bias", "stg1_low_band_net.0.dec1.conv1.conv.1.running_mean", "stg1_low_band_net.0.dec1.conv1.conv.1.running_var", "stg1_low_band_net.0.dec1.conv1.conv.1.num_batches_tracked", "stg1_low_band_net.1.conv.0.weight", "stg1_low_band_net.1.conv.1.weight", "stg1_low_band_net.1.conv.1.bias", "stg1_low_band_net.1.conv.1.running_mean", "stg1_low_band_net.1.conv.1.running_var", "stg1_low_band_net.1.conv.1.num_batches_tracked", "stg1_high_band_net.enc5.conv1.conv.0.weight", "stg1_high_band_net.enc5.conv1.conv.1.weight", "stg1_high_band_net.enc5.conv1.conv.1.bias", "stg1_high_band_net.enc5.conv1.conv.1.running_mean", "stg1_high_band_net.enc5.conv1.conv.1.running_var", 
"stg1_high_band_net.enc5.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.enc5.conv2.conv.0.weight", "stg1_high_band_net.enc5.conv2.conv.1.weight", "stg1_high_band_net.enc5.conv2.conv.1.bias", "stg1_high_band_net.enc5.conv2.conv.1.running_mean", "stg1_high_band_net.enc5.conv2.conv.1.running_var", "stg1_high_band_net.enc5.conv2.conv.1.num_batches_tracked", "stg1_high_band_net.lstm_dec2.conv.conv.0.weight", "stg1_high_band_net.lstm_dec2.conv.conv.1.weight", "stg1_high_band_net.lstm_dec2.conv.conv.1.bias", "stg1_high_band_net.lstm_dec2.conv.conv.1.running_mean", "stg1_high_band_net.lstm_dec2.conv.conv.1.running_var", "stg1_high_band_net.lstm_dec2.conv.conv.1.num_batches_tracked", "stg1_high_band_net.lstm_dec2.lstm.weight_ih_l0", "stg1_high_band_net.lstm_dec2.lstm.weight_hh_l0", "stg1_high_band_net.lstm_dec2.lstm.bias_ih_l0", "stg1_high_band_net.lstm_dec2.lstm.bias_hh_l0", "stg1_high_band_net.lstm_dec2.lstm.weight_ih_l0_reverse", "stg1_high_band_net.lstm_dec2.lstm.weight_hh_l0_reverse", "stg1_high_band_net.lstm_dec2.lstm.bias_ih_l0_reverse", "stg1_high_band_net.lstm_dec2.lstm.bias_hh_l0_reverse", "stg1_high_band_net.lstm_dec2.dense.0.weight", "stg1_high_band_net.lstm_dec2.dense.0.bias", "stg1_high_band_net.lstm_dec2.dense.1.weight", "stg1_high_band_net.lstm_dec2.dense.1.bias", "stg1_high_band_net.lstm_dec2.dense.1.running_mean", "stg1_high_band_net.lstm_dec2.dense.1.running_var", "stg1_high_band_net.lstm_dec2.dense.1.num_batches_tracked", "stg1_high_band_net.enc1.conv.0.weight", "stg1_high_band_net.enc1.conv.1.weight", "stg1_high_band_net.enc1.conv.1.bias", "stg1_high_band_net.enc1.conv.1.running_mean", "stg1_high_band_net.enc1.conv.1.running_var", "stg1_high_band_net.enc1.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.conv3.conv.1.bias", "stg1_high_band_net.aspp.conv3.conv.1.running_mean", "stg1_high_band_net.aspp.conv3.conv.1.running_var", "stg1_high_band_net.aspp.conv3.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.conv4.conv.1.bias", 
"stg1_high_band_net.aspp.conv4.conv.1.running_mean", "stg1_high_band_net.aspp.conv4.conv.1.running_var", "stg1_high_band_net.aspp.conv4.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.conv5.conv.1.bias", "stg1_high_band_net.aspp.conv5.conv.1.running_mean", "stg1_high_band_net.aspp.conv5.conv.1.running_var", "stg1_high_band_net.aspp.conv5.conv.1.num_batches_tracked", "stg1_high_band_net.aspp.bottleneck.conv.0.weight", "stg1_high_band_net.aspp.bottleneck.conv.1.weight", "stg1_high_band_net.aspp.bottleneck.conv.1.bias", "stg1_high_band_net.aspp.bottleneck.conv.1.running_mean", "stg1_high_band_net.aspp.bottleneck.conv.1.running_var", "stg1_high_band_net.aspp.bottleneck.conv.1.num_batches_tracked", "stg1_high_band_net.dec4.conv1.conv.0.weight", "stg1_high_band_net.dec4.conv1.conv.1.weight", "stg1_high_band_net.dec4.conv1.conv.1.bias", "stg1_high_band_net.dec4.conv1.conv.1.running_mean", "stg1_high_band_net.dec4.conv1.conv.1.running_var", "stg1_high_band_net.dec4.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.dec3.conv1.conv.0.weight", "stg1_high_band_net.dec3.conv1.conv.1.weight", "stg1_high_band_net.dec3.conv1.conv.1.bias", "stg1_high_band_net.dec3.conv1.conv.1.running_mean", "stg1_high_band_net.dec3.conv1.conv.1.running_var", "stg1_high_band_net.dec3.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.dec2.conv1.conv.0.weight", "stg1_high_band_net.dec2.conv1.conv.1.weight", "stg1_high_band_net.dec2.conv1.conv.1.bias", "stg1_high_band_net.dec2.conv1.conv.1.running_mean", "stg1_high_band_net.dec2.conv1.conv.1.running_var", "stg1_high_band_net.dec2.conv1.conv.1.num_batches_tracked", "stg1_high_band_net.dec1.conv1.conv.0.weight", "stg1_high_band_net.dec1.conv1.conv.1.weight", "stg1_high_band_net.dec1.conv1.conv.1.bias", "stg1_high_band_net.dec1.conv1.conv.1.running_mean", "stg1_high_band_net.dec1.conv1.conv.1.running_var", "stg1_high_band_net.dec1.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.enc5.conv1.conv.0.weight", 
"stg3_full_band_net.enc5.conv1.conv.1.weight", "stg3_full_band_net.enc5.conv1.conv.1.bias", "stg3_full_band_net.enc5.conv1.conv.1.running_mean", "stg3_full_band_net.enc5.conv1.conv.1.running_var", "stg3_full_band_net.enc5.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.enc5.conv2.conv.0.weight", "stg3_full_band_net.enc5.conv2.conv.1.weight", "stg3_full_band_net.enc5.conv2.conv.1.bias", "stg3_full_band_net.enc5.conv2.conv.1.running_mean", "stg3_full_band_net.enc5.conv2.conv.1.running_var", "stg3_full_band_net.enc5.conv2.conv.1.num_batches_tracked", "stg3_full_band_net.lstm_dec2.conv.conv.0.weight", "stg3_full_band_net.lstm_dec2.conv.conv.1.weight", "stg3_full_band_net.lstm_dec2.conv.conv.1.bias", "stg3_full_band_net.lstm_dec2.conv.conv.1.running_mean", "stg3_full_band_net.lstm_dec2.conv.conv.1.running_var", "stg3_full_band_net.lstm_dec2.conv.conv.1.num_batches_tracked", "stg3_full_band_net.lstm_dec2.lstm.weight_ih_l0", "stg3_full_band_net.lstm_dec2.lstm.weight_hh_l0", "stg3_full_band_net.lstm_dec2.lstm.bias_ih_l0", "stg3_full_band_net.lstm_dec2.lstm.bias_hh_l0", "stg3_full_band_net.lstm_dec2.lstm.weight_ih_l0_reverse", "stg3_full_band_net.lstm_dec2.lstm.weight_hh_l0_reverse", "stg3_full_band_net.lstm_dec2.lstm.bias_ih_l0_reverse", "stg3_full_band_net.lstm_dec2.lstm.bias_hh_l0_reverse", "stg3_full_band_net.lstm_dec2.dense.0.weight", "stg3_full_band_net.lstm_dec2.dense.0.bias", "stg3_full_band_net.lstm_dec2.dense.1.weight", "stg3_full_band_net.lstm_dec2.dense.1.bias", "stg3_full_band_net.lstm_dec2.dense.1.running_mean", "stg3_full_band_net.lstm_dec2.dense.1.running_var", "stg3_full_band_net.lstm_dec2.dense.1.num_batches_tracked", "stg3_full_band_net.enc1.conv.0.weight", "stg3_full_band_net.enc1.conv.1.weight", "stg3_full_band_net.enc1.conv.1.bias", "stg3_full_band_net.enc1.conv.1.running_mean", "stg3_full_band_net.enc1.conv.1.running_var", "stg3_full_band_net.enc1.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.conv3.conv.1.bias", 
"stg3_full_band_net.aspp.conv3.conv.1.running_mean", "stg3_full_band_net.aspp.conv3.conv.1.running_var", "stg3_full_band_net.aspp.conv3.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.conv4.conv.1.bias", "stg3_full_band_net.aspp.conv4.conv.1.running_mean", "stg3_full_band_net.aspp.conv4.conv.1.running_var", "stg3_full_band_net.aspp.conv4.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.conv5.conv.1.bias", "stg3_full_band_net.aspp.conv5.conv.1.running_mean", "stg3_full_band_net.aspp.conv5.conv.1.running_var", "stg3_full_band_net.aspp.conv5.conv.1.num_batches_tracked", "stg3_full_band_net.aspp.bottleneck.conv.0.weight", "stg3_full_band_net.aspp.bottleneck.conv.1.weight", "stg3_full_band_net.aspp.bottleneck.conv.1.bias", "stg3_full_band_net.aspp.bottleneck.conv.1.running_mean", "stg3_full_band_net.aspp.bottleneck.conv.1.running_var", "stg3_full_band_net.aspp.bottleneck.conv.1.num_batches_tracked", "stg3_full_band_net.dec4.conv1.conv.0.weight", "stg3_full_band_net.dec4.conv1.conv.1.weight", "stg3_full_band_net.dec4.conv1.conv.1.bias", "stg3_full_band_net.dec4.conv1.conv.1.running_mean", "stg3_full_band_net.dec4.conv1.conv.1.running_var", "stg3_full_band_net.dec4.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.dec3.conv1.conv.0.weight", "stg3_full_band_net.dec3.conv1.conv.1.weight", "stg3_full_band_net.dec3.conv1.conv.1.bias", "stg3_full_band_net.dec3.conv1.conv.1.running_mean", "stg3_full_band_net.dec3.conv1.conv.1.running_var", "stg3_full_band_net.dec3.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.dec2.conv1.conv.0.weight", "stg3_full_band_net.dec2.conv1.conv.1.weight", "stg3_full_band_net.dec2.conv1.conv.1.bias", "stg3_full_band_net.dec2.conv1.conv.1.running_mean", "stg3_full_band_net.dec2.conv1.conv.1.running_var", "stg3_full_band_net.dec2.conv1.conv.1.num_batches_tracked", "stg3_full_band_net.dec1.conv1.conv.0.weight", "stg3_full_band_net.dec1.conv1.conv.1.weight", "stg3_full_band_net.dec1.conv1.conv.1.bias", 
"stg3_full_band_net.dec1.conv1.conv.1.running_mean", "stg3_full_band_net.dec1.conv1.conv.1.running_var", "stg3_full_band_net.dec1.conv1.conv.1.num_batches_tracked".
        size mismatch for stg1_high_band_net.enc2.conv1.conv.0.weight: copying a param with shape torch.Size([24, 12, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 32, 3, 3]).
        size mismatch for stg1_high_band_net.enc2.conv1.conv.1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc2.conv1.conv.1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc2.conv1.conv.1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc2.conv1.conv.1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc2.conv2.conv.0.weight: copying a param with shape torch.Size([24, 24, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
        size mismatch for stg1_high_band_net.enc2.conv2.conv.1.weight: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc2.conv2.conv.1.bias: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc2.conv2.conv.1.running_mean: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc2.conv2.conv.1.running_var: copying a param with shape torch.Size([24]) from checkpoint, the shape in current model is torch.Size([64]).
        size mismatch for stg1_high_band_net.enc3.conv1.conv.0.weight: copying a param with shape torch.Size([48, 24, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
        size mismatch for stg1_high_band_net.enc3.conv1.conv.1.weight: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc3.conv1.conv.1.bias: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc3.conv1.conv.1.running_mean: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc3.conv1.conv.1.running_var: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc3.conv2.conv.0.weight: copying a param with shape torch.Size([48, 48, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for stg1_high_band_net.enc3.conv2.conv.1.weight: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc3.conv2.conv.1.bias: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc3.conv2.conv.1.running_mean: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc3.conv2.conv.1.running_var: copying a param with shape torch.Size([48]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg1_high_band_net.enc4.conv1.conv.0.weight: copying a param with shape torch.Size([72, 48, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
        size mismatch for stg1_high_band_net.enc4.conv1.conv.1.weight: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.enc4.conv1.conv.1.bias: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.enc4.conv1.conv.1.running_mean: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.enc4.conv1.conv.1.running_var: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.enc4.conv2.conv.0.weight: copying a param with shape torch.Size([72, 72, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for stg1_high_band_net.enc4.conv2.conv.1.weight: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.enc4.conv2.conv.1.bias: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.enc4.conv2.conv.1.running_mean: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.enc4.conv2.conv.1.running_var: copying a param with shape torch.Size([72]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv1.1.conv.0.weight: copying a param with shape torch.Size([96, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
        size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv1.1.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv2.conv.0.weight: copying a param with shape torch.Size([96, 96, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
        size mismatch for stg1_high_band_net.aspp.conv2.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv2.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv2.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv2.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg1_high_band_net.aspp.conv3.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1, 3, 3]).
        size mismatch for stg1_high_band_net.aspp.conv3.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
        size mismatch for stg1_high_band_net.aspp.conv4.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1, 3, 3]).
        size mismatch for stg1_high_band_net.aspp.conv4.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
        size mismatch for stg1_high_band_net.aspp.conv5.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1, 3, 3]).
        size mismatch for stg1_high_band_net.aspp.conv5.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([256, 256, 1, 1]).
        size mismatch for stg3_full_band_net.enc2.conv1.conv.0.weight: copying a param with shape torch.Size([96, 48, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
        size mismatch for stg3_full_band_net.enc2.conv1.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc2.conv1.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc2.conv1.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc2.conv1.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc2.conv2.conv.0.weight: copying a param with shape torch.Size([96, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
        size mismatch for stg3_full_band_net.enc2.conv2.conv.1.weight: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc2.conv2.conv.1.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc2.conv2.conv.1.running_mean: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc2.conv2.conv.1.running_var: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([128]).
        size mismatch for stg3_full_band_net.enc3.conv1.conv.0.weight: copying a param with shape torch.Size([192, 96, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
        size mismatch for stg3_full_band_net.enc3.conv1.conv.1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc3.conv1.conv.1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc3.conv1.conv.1.running_mean: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc3.conv1.conv.1.running_var: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc3.conv2.conv.0.weight: copying a param with shape torch.Size([192, 192, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for stg3_full_band_net.enc3.conv2.conv.1.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc3.conv2.conv.1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc3.conv2.conv.1.running_mean: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc3.conv2.conv.1.running_var: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for stg3_full_band_net.enc4.conv1.conv.0.weight: copying a param with shape torch.Size([288, 192, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
        size mismatch for stg3_full_band_net.enc4.conv1.conv.1.weight: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.enc4.conv1.conv.1.bias: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.enc4.conv1.conv.1.running_mean: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.enc4.conv1.conv.1.running_var: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.enc4.conv2.conv.0.weight: copying a param with shape torch.Size([288, 288, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for stg3_full_band_net.enc4.conv2.conv.1.weight: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.enc4.conv2.conv.1.bias: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.enc4.conv2.conv.1.running_mean: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.enc4.conv2.conv.1.running_var: copying a param with shape torch.Size([288]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv1.1.conv.0.weight: copying a param with shape torch.Size([384, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
        size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.running_mean: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv1.1.conv.1.running_var: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv2.conv.0.weight: copying a param with shape torch.Size([384, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
        size mismatch for stg3_full_band_net.aspp.conv2.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv2.conv.1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv2.conv.1.running_mean: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv2.conv.1.running_var: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for stg3_full_band_net.aspp.conv3.conv.0.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
        size mismatch for stg3_full_band_net.aspp.conv3.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
        size mismatch for stg3_full_band_net.aspp.conv4.conv.0.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
        size mismatch for stg3_full_band_net.aspp.conv4.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
        size mismatch for stg3_full_band_net.aspp.conv5.conv.0.weight: copying a param with shape torch.Size([384, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
        size mismatch for stg3_full_band_net.aspp.conv5.conv.1.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
        size mismatch for out.weight: copying a param with shape torch.Size([2, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 64, 1, 1]).

Here is my code:

import os
import pydub

cwd = os.getcwd()
ffmpeg_exec = cwd + "\\ffmpeg.exe" # or any other path to ffmpeg, as long as it is absolute and not relative.

pydub.AudioSegment.converter = ffmpeg_exec

from dotenv import load_dotenv

from rvc.modules.uvr5.modules import UVR

load_dotenv(".env")

print("Loading UVR")
uvr = UVR()

print("Extracting vocals...")

os.chdir(cwd + "\\Lib\\site-packages")

# downloaded model from:
# https://github.com/TRvlvr/model_repo/releases/

generator = uvr.uvr_wrapper(
    model_name="2_HP-UVR.pth",
    audio_path=cwd + "\\audio.wav",
    save_vocal_path=cwd + "\\vocal",
    save_ins_path=cwd + "\\inst",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")

for item in generator:
    print(item)

voc_file = cwd + "\\inst\\vocal_audio.wav_5.wav"

generator = uvr.uvr_wrapper(
    model_name="5_HP-Karaoke-UVR.pth",
    audio_path=voc_file,
    save_vocal_path=cwd + "\\main",
    save_ins_path=cwd + "\\other",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")

for item in generator:
    print(item)

main_voc_file = cwd + "\\other\\vocal_vocal_audio.wav_5.wav_5.wav"

generator = uvr.uvr_wrapper(
    model_name="UVR-De-Echo-Aggressive.pth",
    audio_path=main_voc_file,
    save_vocal_path=cwd + "\\noecho",
    save_ins_path=cwd + "\\echo",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")

for item in generator:
    print(item)

Note that both 2_HP-UVR.pth and 5_HP-Karaoke-UVR.pth work just fine.

I also tried VR-DeEchoNormal.pth from my own RVC WebUI install and got a different error:

Traceback (most recent call last):
  File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 62, in <module>
    for item in generator:
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 85, in uvr_wrapper
    pre_fun._path_audio_(
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\vr.py", line 239, in _path_audio_
    ) = librosa.core.load(
TypeError: load() takes 1 positional argument but 3 positional arguments (and 2 keyword-only arguments) were given

code:

generator = uvr.uvr_wrapper(
    model_name="VR-DeEchoNormal.pth",
    audio_path=main_voc_file,
    save_vocal_path=cwd + "\\noecho",
    save_ins_path=cwd + "\\echo",
    agg=5,
    export_format="wav",
    temp_path=cwd + "\\tmp")

for item in generator:
    print(item)
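
The TypeError above has the shape Python produces when a function that accepts only its first parameter positionally is called with several positional arguments. As far as I know, recent librosa releases (0.10 and later) declare load this way, so the call in vr.py would need keyword arguments or an older librosa; that reading is an assumption and is not confirmed in this thread. A minimal sketch of the mechanism, using a stand-in load function (hypothetical, not librosa's code):

```python
# Stand-in with the calling convention the traceback suggests:
# one positional parameter, everything else keyword-only (hypothetical body).
def load(path, *, sr=22050, mono=True, dtype=float, res_type="soxr_hq"):
    return path, sr, mono, dtype, res_type


def call_positionally():
    # Mirrors the older call style: load(path, sr, mono, ...).
    try:
        load("audio.wav", 44100, False, dtype=float, res_type="kaiser_fast")
    except TypeError as exc:
        return str(exc)
    return None


def call_with_keywords():
    # Same values passed by name; this form works with a keyword-only signature.
    return load("audio.wav", sr=44100, mono=False, dtype=float, res_type="kaiser_fast")


print(call_positionally())
print(call_with_keywords())
```

If this is the cause, pinning an older librosa or patching the call to use keywords are the two obvious directions to try.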

UVR not working: output is the same as the input, and it freezes when using the CPU

Hello again! I have one issue with UVR.
My code is:

from pathlib import Path
from scipy.io import wavfile
from rvc.modules.uvr5.vr import AudioPreprocess
import os
import sys
import platform

currentCWD = os.getcwd()
path = sys.prefix
system = platform.system().lower()

if (system == "windows"):
    path = path + "\\Lib\\site-packages"
else:
    pythonVersion = sys.version_info
    path = path + "/lib/python" + str(pythonVersion[0]) + "." + str(pythonVersion[1]) + "/site-packages"

os.chdir(path)

os.environ["TEMP"] = currentCWD
os.environ["weight_uvr5_root"] = currentCWD + "/uvr_assets"

model_path: str = "9_HP2-UVR.pth"
audio_path: str = "audio.wav"
agg: int = 10
uvr: AudioPreprocess = AudioPreprocess(model_path, agg, False)

uvr.config.use_cuda()
uvr.model.to("cuda")

print("Model loaded!")

inst, vocals, sr, _ = uvr.process(music_file = currentCWD + "/" + audio_path)
os.chdir(currentCWD)

wavfile.write("vocals.wav", sr, vocals)
wavfile.write("inst.wav", sr, inst)

print("Done!")

The output is:

Model loaded!
  0%|                                                    | 0/19 [00:00<?, ?it/s]/home/alcoft/Projects/Tests_I4.0/LibI4/Python_AI/.env/lib/python3.12/site-packages/torch/nn/modules/conv.py:456: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return F.conv2d(input, weight, bias, self.stride,
100%|███████████████████████████████████████████| 19/19 [00:01<00:00, 10.89it/s]
Done!

The audio outputs (both vocal and instrumental) are the same as the input.

Running python -c "import torch; print(torch.backends.cudnn.is_available())" prints True.
Also, when I try to use the CPU for inference, the code freezes here:

Model loaded!
  0%|                                                    | 0/19 [00:00<?, ?it/s]

When this happens, the code does not use my CPU at all. The code does not print any error message.

My CPU is not very powerful, but it should be enough for inference.
My GPU is an NVIDIA RTX 3050 and my OS is Arch Linux.

I have CUDA, cuDNN, and the NVIDIA drivers installed on my OS.
My Python version is 3.12.3.

The UVR model I'm using is 9_HP2-UVR.pth.

If the problem is related to the UVR model I'm using, please recommend one that works.
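
The script above builds the site-packages path by hand for each operating system. A possibly more portable alternative is to ask the import system where an installed package lives. The sketch below uses the standard-library json package as a stand-in so that it runs without rvc; substituting "rvc" assumes it is installed as a regular package, and package_parent_dir is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path


def package_parent_dir(package_name: str) -> Path:
    """Return the directory that contains the given installed package."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {package_name!r}")
    # spec.origin points at <parent>/<package>/__init__.py for a regular package.
    return Path(spec.origin).resolve().parent.parent


# "json" is only a stand-in so the sketch runs without rvc installed.
root = package_parent_dir("json")
print(root)
```

The returned directory could then be passed to os.chdir in place of the hand-built path.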

Outdated RVC

I have an old version of RVC. Where can I download the latest release?

Will this library be the base repo?

Hello! Wondering if this library will end up being the code without any of the gui stuff or other optional features? I was considering a fork of the GUI library just to support inferencing. But if a smaller repo is coming here maybe I shouldn't, and I can contribute to this.

support for python ^3.8

Hi, is there any reason why it's restricted to Python 3.11.2? Is there anything I could do so it could run on Python 3.8, 3.9 or 3.10?

What does this mean?

I always get this message:

INFO:xx:Train Epoch: 478 [92%]
INFO:xx:[110200, 9.421142503636453e-05]
INFO:xx:loss_disc=3.598, loss_gen=3.174, loss_fm=8.765,loss_mel=19.367, loss_kl=1.520

Is this normal?
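
For context, the fields in the last log line look like the loss terms of VITS-style adversarial training (discriminator, generator, feature matching, mel spectrogram, and KL), and the bracketed pair looks like a global step and a learning rate. That interpretation is an assumption; the log itself does not say so. To watch how the values change over time, the lines can be parsed with a small helper such as this sketch (parse_losses is a hypothetical name):

```python
import re

# Matches pairs such as "loss_mel=19.367".
LOSS_PATTERN = re.compile(r"(loss_\w+)=([0-9.]+)")


def parse_losses(line: str) -> dict[str, float]:
    """Extract name=value loss pairs from one training log line."""
    return {name: float(value) for name, value in LOSS_PATTERN.findall(line)}


line = "INFO:xx:loss_disc=3.598, loss_gen=3.174, loss_fm=8.765,loss_mel=19.367, loss_kl=1.520"
losses = parse_losses(line)
print(losses)
```

Collecting these dictionaries across epochs makes it easier to see whether the values are still falling or have flattened out.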

API speed slow vs CLI

I was able to get both the API and CLI options working on an Apple Silicon MacBook Air (M1) - very cool! However, the API seems exceptionally slow, and it asked for a lot of permissions for data and so on (through VS Code). Is this what you would expect? Inference with vocals on a single song took approximately 2:30 with the CLI and 5:30 through the API. The API hangs on:

DEBUG:rvc.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109

for over three minutes and 30 seconds before proceeding, so I think that's where the time is lost. Is there any way to speed this up?

Thank you in advance!

index_file implementation is currently broken

#25 & #14
Looks like RVC won't properly read an index_file.

/root/.cache/pypoetry/virtualenvs/rvc-9TtSrW0h-py3.10/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-05-28 19:13:17 | INFO | rvc.modules.vc.modules | Select index:
index_file: /rvc_models/added_IVF632_Flat_nprobe_1_IvanaAlawi_v2.index
2024-05-28 19:13:18 | INFO | fairseq.tasks.hubert_pretraining | current directory is /app
2024-05-28 19:13:18 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-05-28 19:13:18 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
  File "/app/rvc/modules/vc/pipeline.py", line 307, in pipeline
    index = faiss.read_index(file_index)
  File "/root/.cache/pypoetry/virtualenvs/rvc-9TtSrW0h-py3.10/lib/python3.10/site-packages/faiss/swigfaiss_avx2.py", line 10538, in read_index
    return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
  Possible C/C++ prototypes are:
    faiss::read_index(char const *,int)
    faiss::read_index(char const *)
    faiss::read_index(FILE *,int)
    faiss::read_index(FILE *)
    faiss::read_index(faiss::IOReader *,int)
    faiss::read_index(faiss::IOReader *)

I can confirm the .index file is also in the correct location.
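
The prototypes listed in the traceback accept only a C string, a FILE pointer, or an IOReader. One plausible cause (an assumption, not confirmed in this thread) is that file_index reaches faiss.read_index as a pathlib.Path or another non-string object. A minimal sketch of normalizing the value first; as_index_path is a hypothetical helper and the example path is made up:

```python
import os
from pathlib import Path


def as_index_path(file_index) -> str:
    """Convert a str or path-like index location to a plain str, or fail early."""
    if file_index is None or str(file_index).strip() == "":
        raise ValueError("no index file was given")
    # os.fspath turns pathlib.Path (and other path-like objects) into str.
    return os.fspath(file_index)


path = as_index_path(Path("/rvc_models/example.index"))
print(type(path).__name__, path)
# The result could then be passed on, e.g. faiss.read_index(path).
```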

How can I use RVC on my CPU?

I have an NVIDIA graphics card, but I don't want to use the GPU, I just want to use my CPU.
When I run my code, RVC automatically runs on my GPU.

How can I change the device?
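
One general PyTorch-level approach is to hide the CUDA devices through the CUDA_VISIBLE_DEVICES environment variable before CUDA is initialized, which makes torch.cuda.is_available() report False. Another snippet on this page assigns vc.config.device before calling get_vc; whether that alone is enough for this library is not confirmed here. A minimal sketch, with force_cpu as a hypothetical helper name:

```python
import os


def force_cpu() -> None:
    """Hide all CUDA devices from libraries that initialize CUDA after this call."""
    # An empty value means "no visible GPUs". The safest place to set it is
    # before torch is imported at all.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


force_cpu()

# Imports that may initialize CUDA go after force_cpu(), for example:
# from rvc.modules.vc.modules import VC
# vc = VC()
# vc.config.device = "cpu"  # attribute used in another snippet on this page
print(os.environ["CUDA_VISIBLE_DEVICES"] == "")
```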

AttributeError: 'VC' object has no attribute 'vc_infer'.

I copied and pasted the code sample from the README, but it does not work.

It turns out that the code in the package is not the same as the code in this repo.
This is the command I used to install the package:
pip install rvc

Comparison below:
In the GitHub version, the function used for inference is vc_infer.

rvc github version

But in the package the code looks like this (vc_single instead of vc_infer):

rvc package version

This raises an AttributeError for anyone who follows the guide in the README, because the package has nothing called vc_infer.
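
Until the package and the repository agree, a calling script can look up whichever method name the installed version provides. The sketch below uses the names that appear on this page (vc_infer, vc_inference, vc_single); FakeVC is a stand-in class, and resolve_method is a hypothetical helper. The argument lists may also differ between versions, which this sketch does not handle.

```python
def resolve_method(obj, candidates):
    """Return the first callable attribute of obj whose name is in candidates."""
    for name in candidates:
        method = getattr(obj, name, None)
        if callable(method):
            return method
    raise AttributeError(f"none of {candidates} exist on {type(obj).__name__}")


class FakeVC:
    # Stand-in for the installed package, which only exposes vc_single.
    def vc_single(self, sid, path):
        return f"converted {path} for speaker {sid}"


infer = resolve_method(FakeVC(), ["vc_infer", "vc_inference", "vc_single"])
print(infer(0, "input.wav"))
```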

From Pydub: [FileNotFoundError: [WinError 2] The system cannot find the file specified] OR [RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work]

For developers using the RVC library:

C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

...

Traceback (most recent call last):
  File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 35, in <module>
    for i in result:
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 77, in uvr_wrapper
    AudioSegment.from_file(process_path).export(
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\pydub\audio_segment.py", line 963, in export
    p = subprocess.Popen(conversion_command, stdin=devnull, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\jeje9\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\jeje9\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

If you have this error then fear not, it can be fixed pretty easily:

  • Download a binary executable from https://ffmpeg.org/download.html
  • Add the executable to your project
  • Add these lines of code in your script (before any calls to the rvc libraries):
import os
import pydub

pydub.AudioSegment.converter = os.getcwd() + "\\ffmpeg.exe" # or any other path to ffmpeg, as long as it is absolute and not relative.
  • You're good to go.
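
The steps above hard-code a Windows path. A slightly more general sketch first checks whether ffmpeg is already on PATH and only then falls back to a copy bundled with the project. find_ffmpeg is a hypothetical helper, and the bundled layout (an executable placed directly in the given folder) is an assumption:

```python
import os
import shutil


def find_ffmpeg(fallback_dir=None):
    """Return an absolute path to ffmpeg, or None when it cannot be found."""
    # First look on PATH (covers system-wide installs).
    found = shutil.which("ffmpeg")
    if found:
        return os.path.abspath(found)
    # Then look for a bundled copy in the given folder (hypothetical layout).
    if fallback_dir is not None:
        for name in ("ffmpeg.exe", "ffmpeg"):
            candidate = os.path.join(os.path.abspath(fallback_dir), name)
            if os.path.isfile(candidate):
                return candidate
    return None


ffmpeg_path = find_ffmpeg(os.getcwd())
print(ffmpeg_path)
# If a path was found, it can be assigned as in the fix above:
# pydub.AudioSegment.converter = ffmpeg_path
```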

FileNotFoundError: [Errno 2] No such file or directory: 'rvc/lib/uvr5_pack/lib_v5/modelparams/4band_v2.json' when using UVR.uvr_wrapper()

Hi, getting this error when using the UVR.uvr_wrapper() function:

Loading UVR
Extracting vocals...
Traceback (most recent call last):
  File "C:\Users\jeje9\Desktop\rvc_test\rvc_test.py", line 27, in <module>
    for i in result:
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\modules.py", line 49, in uvr_wrapper
    pre_fun = func(
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\modules\uvr5\vr.py", line 31, in __init__
    mp = ModelParameters("rvc/lib/uvr5_pack/lib_v5/modelparams/4band_v2.json")
  File "C:\Users\jeje9\Desktop\rvc_test\lib\site-packages\rvc\lib\uvr5_pack\lib_v5\model_param_init.py", line 55, in __init__
    with open(config_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'rvc/lib/uvr5_pack/lib_v5/modelparams/4band_v2.json'

Here is my code:

import os

from dotenv import load_dotenv
from rvc.modules.uvr5.modules import UVR

# downloaded uvr model from:
# https://github.com/TRvlvr/model_repo/releases/

cwd = os.getcwd()
load_dotenv(".env")

print("Loading UVR")
uvr = UVR()

print("Extracting vocals...")

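result = uvr.uvr_wrapper(
    model_name="2_HP-UVR.pth",
    # os.path.join inserts the separator that plain string concatenation omits.
    audio_path=os.path.join(cwd, "audio.wav"),
    save_vocal_path=os.path.join(cwd, "vocal.wav"),
    save_ins_path=os.path.join(cwd, "inst.wav"),
    agg=10,
    export_format="wav",
    temp_path=os.path.join(cwd, "tmp.wav"))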

for i in result:
    print(i)

I made sure to look at the path, and the file does exist, so I'm thinking it might be an issue with the expected CWD since the path is relative.
For more info: I'm running python from a venv and using this command to run: ./Scripts/python.exe rvc_test.py from the C:\Users\jeje9\Desktop\rvc_test directory where the rvc_test.py file is
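
The path in the traceback is relative, so it is resolved against the current working directory of the process. Other snippets on this page work around this by calling os.chdir into site-packages. A sketch of doing that temporarily, so the original directory is restored afterwards, is shown below. working_directory is a hypothetical helper, and the temporary directory only stands in for the folder that contains the rvc package:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


start = os.getcwd()
with working_directory(tempfile.gettempdir()):
    # Relative paths such as "rvc/lib/..." would be resolved from here.
    inside = os.getcwd()
print(inside, os.getcwd() == start)
```

Input and output paths passed to the library should stay absolute, since they would otherwise also be resolved from the temporary working directory.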

ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory

I have this error when trying to execute my code.

My code is:

from pathlib import Path
from dotenv import load_dotenv
from scipy.io import wavfile
from rvc.modules.vc.modules import VC
import os
import torch
import json
import ai_config as cfg

vc: VC = VC()

def __load_model__(model_path: str, device: str) -> None:
    vc.config.device = device
    vc.get_vc(model_path)

def LoadModel() -> None:
    if (not cfg.current_data.prompt_order.__contains__("rvc")):
        raise Exception("Model is not in 'prompt_order'.")

    if (vc != None or len(cfg.current_data.rvc_model_path.strip()) == 0):
        return

    device = "cuda" if (torch.cuda.is_available() and cfg.current_data.use_gpu_if_available and cfg.current_data.move_to_gpu.count("rvc") > 0) else "cpu"

    load_dotenv("rvc_env")
    __load_model__(cfg.current_data.rvc_model_path, device)

def __make_rvc__(audio_name: str | Path, protect: float = 0.33, filter_radius: int = 3, method: str = "rmvpe") -> bytes:
    LoadModel()

    if (type(audio_name) == str):
        audio_name = Path(audio_name)
    
    if (len(cfg.current_data.rvc_index_path.strip()) == 0):
        index_file = None
    else:
        index_file = Path(cfg.current_data.rvc_index_path)
    
    if (method != "rmvpe" and method != "pm" and method != "harvest" and method != "crepe"):
        raise Exception("RMV method must be 'rmvpe', 'pm', 'harvest' or 'crepe'.")

    tgt_sr, audio_opt, _, _ = vc.vc_single(sid = 0, input_audio_path = audio_name, f0_method = method, index_file = index_file, filter_radius = filter_radius, protect = protect)
    output_file = "tmp_rvc_audio_"
    output_file_id = 0

    while (os.path.exists(output_file + str(output_file_id) + ".wav")):
        output_file_id += 1

    wavfile.write(output_file + str(output_file_id) + ".wav", tgt_sr, audio_opt)
    audio_bytes = b""

    # Open for reading in binary mode; "wb" would truncate the file and make read() fail.
    with open(output_file + str(output_file_id) + ".wav", "rb") as f:
        audio_bytes = f.read()
    
    os.remove(output_file + str(output_file_id) + ".wav")
    return audio_bytes

def MakeRVC(data: str | dict[str]) -> bytes:
    if (type(data) == str):
        try:
            data = json.loads(data)
        except Exception as ex:
            raise Exception("[RVC] Data must be a dictionary or a JSON code. ERROR: " + str(ex))
    
    ddata = {
        "input": "",
        "protect": 0.33,
        "filter_radius": 3,
        "method": "rmvpe"
    }

    try:
        ddata["input"] = data["input"]
    except:
        raise Exception("Unable to get audio path.")
    
    try:
        ddata["protect"] = float(data["protect"])
    except:
        pass

    try:
        ddata["filter_radius"] = int(data["filter_radius"])
    except:
        pass

    try:
        ddata["method"] = data["method"]
    except:
        pass

    return __make_rvc__(ddata["input"], ddata["protect"], ddata["filter_radius"], ddata["method"])

The traceback is:

Traceback (most recent call last):
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server_all.py", line 1, in <module>
    import ai_server
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/ai_server.py", line 9, in <module>
    import chatbot_all as cb
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/chatbot_all.py", line 14, in <module>
    import Inference.RVC_inference as rvc
  File "/home/alcoft/Projects/Multilang/TAO_I4.0/LibI4/Python_AI/Inference/RVC_inference.py", line 4, in <module>
    from rvc.modules.vc.modules import VC
  File "/home/alcoft/.local/lib/python3.11/site-packages/rvc/modules/vc/modules.py", line 21, in <module>
    from rvc.modules.vc.utils import *
  File "/home/alcoft/.local/lib/python3.11/site-packages/rvc/modules/vc/utils.py", line 3, in <module>
    from fairseq import checkpoint_utils
  File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/__init__.py", line 20, in <module>
    from fairseq.distributed import utils as distributed_utils
  File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/distributed/__init__.py", line 7, in <module>
    from .fully_sharded_data_parallel import (
  File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
    from fairseq.dataclass.configs import DistributedTrainingConfig
  File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/dataclass/__init__.py", line 6, in <module>
    from .configs import FairseqDataclass
  File "/home/alcoft/.local/lib/python3.11/site-packages/fairseq/dataclass/configs.py", line 1104, in <module>
    @dataclass
     ^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 1230, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory

Can anyone help me fix this error?
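
The exception is raised by the standard-library dataclasses module while fairseq is being imported, not by rvc itself. As far as I know, Python 3.11 widened this check from list, dict, and set defaults to any unhashable default, which would explain why a dataclass instance used as a default in fairseq's configs.py fails on 3.11; running on Python 3.10, or using a fairseq build that switches to default_factory, are the workarounds I would try (an assumption, not confirmed in this thread). A minimal sketch of the error and the fix, using a list so it behaves the same on older Python versions:

```python
from dataclasses import dataclass, field


def define_with_mutable_default():
    """Try to declare a dataclass with a mutable default; return the error text."""
    try:
        @dataclass
        class Broken:
            items: list = []  # rejected by the dataclasses module
    except ValueError as exc:
        return str(exc)
    return None


@dataclass
class Fixed:
    # default_factory builds a fresh list for every instance.
    items: list = field(default_factory=list)


print(define_with_mutable_default())
a, b = Fixed(), Fixed()
a.items.append(1)
print(a.items, b.items)
```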

Index file argument broken

When I set the index rate to 0.75 or 0.7 (I haven't tried with other values) this message appears.

Traceback (most recent call last):
  File "D:\Biblioteca\Documentos\RVC Project\Retrieval-based-Voice-Conversion-develop\rvc\modules\vc\pipeline.py", line 307, in pipeline
    index = faiss.read_index(file_index)
  File "C:\Users\Guilherme\AppData\Local\Programs\Python\Python310\lib\site-packages\faiss\swigfaiss_avx2.py", line 10409, in read_index
    return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
  Possible C/C++ prototypes are:
    faiss::read_index(char const *,int)
    faiss::read_index(char const *)
    faiss::read_index(FILE *,int)
    faiss::read_index(FILE *)
    faiss::read_index(faiss::IOReader *,int)
    faiss::read_index(faiss::IOReader *)

API not releasing memory after inference

Hi there,

I believe I almost have this all figured out and it's working great. One issue I'm having is that after inferring once using the API, memory usage stays very high (12.7 GB out of 16 GB), even though no processing is happening. This is happening on a MacBook Air M1 with 16 GB of RAM. Is there a way to force RVC to release that memory after each API call? Thanks in advance!
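
A generic thing to try after each call is to run Python's garbage collector and empty PyTorch's cached allocations. This only frees memory that nothing references any more; if the API server keeps the model loaded between calls (which I have not verified), that memory stays in use. A minimal sketch, with release_cached_memory as a hypothetical helper:

```python
import gc


def release_cached_memory() -> list[str]:
    """Run garbage collection and empty PyTorch caches where they exist."""
    steps = ["gc"]
    gc.collect()
    try:
        import torch
    except ImportError:
        return steps  # nothing more to do without PyTorch
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        steps.append("cuda")
    # torch.mps only exists in newer PyTorch releases, so look it up defensively.
    mps = getattr(torch, "mps", None)
    if mps is not None and torch.backends.mps.is_available():
        mps.empty_cache()
        steps.append("mps")
    return steps


print(release_cached_memory())
```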

AttributeError: 'NoneType' object has no attribute 'dtype'

When I run inference with a file name as the output, I get this error.

Traceback (most recent call last):
  File "/opt/conda/bin/rvc", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/rvc/utils/cli/cli.py", line 29, in main
    cli()
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/rvc/utils/cli/handler/infer.py", line 130, in infer
    wavfile.write(outputpath, tgt_sr, audio_opt)
  File "/opt/conda/lib/python3.10/site-packages/scipy/io/wavfile.py", line 771, in write
    dkind = data.dtype.kind
AttributeError: 'NoneType' object has no attribute 'dtype'

And when I pass a folder to "-o", it gives an error saying it is a directory.
Sorry for the bad English, I used Google Translate
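
The traceback shows wavfile.write receiving None as the audio data, which suggests that inference failed earlier and returned no audio (an assumption based on the traceback alone). A calling script could guard against both symptoms as sketched below. Both helper names are hypothetical and are not part of the rvc CLI:

```python
from pathlib import Path


def resolve_output_path(output, input_path) -> Path:
    """If output is an existing directory, place a .wav named after the input in it."""
    output, input_path = Path(output), Path(input_path)
    if output.is_dir():
        return output / (input_path.stem + ".wav")
    return output


def require_audio(audio_opt):
    """Fail with a readable message instead of letting None reach the writer."""
    if audio_opt is None:
        raise RuntimeError("inference returned no audio; check the earlier log output")
    return audio_opt


print(resolve_output_path(".", "song.mp3"))
```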

onnx infer problem

image

I have a problem with ONNX inference. I use the demo file in the tools directory (onnx_inference_demo.py), but I can't run inference on the data. Could you help me fix this problem?
Thank you so much.

Some errors are thrown during inference, and the generated output file is of bad quality

MacBook Pro Intel i9 8-Core / AMD Radeon Pro 5300M / 32GB DDR4 RAM / macOS Sonoma 14.2
Python 3.10.13
Poetry 1.7.1
CLI command:

PYTORCH_ENABLE_MPS_FALLBACK=1  rvc infer -rmr 1 -p 0 -ir 0.75  -m weights/Peter/model.pth -if weights/Peter/index.index -i input.mp3 -o output1.mp3

command output:

INFO:rvc.configs.config:No supported Nvidia GPU found
INFO:rvc.configs.config:overwrite configs.json
INFO:rvc.configs.config:Use mps instead
INFO:rvc.configs.config:is_half:False, device:mps
UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
DEBUG:rvc.lib.infer_pack.models:gin_channels: 256, self.spk_embed_dim: 109
INFO:rvc.modules.vc.modules:Select index: 
INFO:fairseq.tasks.hubert_pretraining:current directory is /Retrieval-based-Voice-Conversion
INFO:fairseq.tasks.hubert_pretraining:HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
INFO:fairseq.models.hubert.hubert:HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'conv_pos_batch_norm': False, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
  File "/Retrieval-based-Voice-Conversion/rvc/modules/vc/pipeline.py", line 307, in pipeline
    index = faiss.read_index(file_index)
  File "/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/faiss/swigfaiss_avx2.py", line 9924, in read_index
    return _swigfaiss_avx2.read_index(*args)
TypeError: Wrong number or type of arguments for overloaded function 'read_index'.
  Possible C/C++ prototypes are:
    faiss::read_index(char const *,int)
    faiss::read_index(char const *)
    faiss::read_index(FILE *,int)
    faiss::read_index(FILE *)
    faiss::read_index(faiss::IOReader *,int)
    faiss::read_index(faiss::IOReader *)
    
    
INFO:rvc.modules.vc.pipeline:Loading rmvpe model,assets/rmvpe/rmvpe.pt
/Retrieval-based-Voice-Conversion/.venv/lib/python3.10/site-packages/torch/functional.py:650: UserWarning: The operator 'aten::_fft_r2c' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/Retrieval-based-Voice-Conversion/rvc/lib/infer_pack/attentions.py:334: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
  x = F.pad(
{'npy': 6.011045217514038, 'f0': 135.6644949913025, 'infer': 27.060052633285522}
Finish inference. Check output1.mp3

Although I get an output file, the sound has a lot of artefacts and noise and is not smooth at all. I see some warnings and an error (the faiss read_index TypeError) in the console output. Are they the cause, or is it the models I am using?

Also, how can I get the output combined with the instrumental when the input is a music track?

Thanks,
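
On the traceback above: faiss's SWIG wrapper raises this TypeError when read_index receives something that is not a plain str path (the listed prototypes only accept char const *, FILE * or IOReader *). The log also shows "Select index:" with nothing after it, so one plausible cause, not confirmed here, is that the index path reached faiss as None or as a pathlib.Path. The run continues past the traceback, so the conversion appears to have been done without the retrieval index. Below is a minimal sketch of a guard around the call. load_index_safely and its reader parameter are hypothetical names for illustration and are not part of rvc.

```python
import os
from pathlib import Path
from typing import Any, Callable, Optional, Union


def load_index_safely(
    file_index: Optional[Union[str, os.PathLike]],
    reader: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Return a loaded index, or None when no usable index file is given.

    Hypothetical helper: `reader` defaults to faiss.read_index and can be
    swapped out so the guard can be exercised without faiss installed.
    """
    if file_index is None:
        return None

    # Path objects are converted to str, which is what the wrapper accepts.
    path = os.fspath(file_index)

    # An empty or missing path means "no index"; skip retrieval instead of failing.
    if not path.strip() or not os.path.isfile(path):
        return None

    if reader is None:
        import faiss  # imported lazily so the guard itself has no hard dependency

        reader = faiss.read_index

    return reader(path)


if __name__ == "__main__":
    # No index configured: the helper returns None rather than raising.
    print(load_index_safely(None))
    print(load_index_safely(Path("does-not-exist.index")))
```

If the helper returns None for your setup, the index path in your configuration is the first thing to check. Whether the missing index explains the artefacts is a separate question that this sketch does not answer.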

Solved: RVC being slower than the WebUI version

Hi, posting this here since I encountered the issue on my end and managed to solve it.

When I installed RVC as a Python package with pip install git+https://github.com/RVC-Project/Retrieval-based-Voice-Conversion, pip pulled in the CPU build of torch instead of the CUDA build, which slowed processing by at least 10x.

To fix this on a machine with an NVIDIA GPU, reinstall torch from the CUDA 11.8 wheel index: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
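
To confirm which build you ended up with, a quick check along these lines may help. It is a sketch and not part of rvc: describe_torch_build and check_installed_torch are hypothetical helper names, while torch.version.cuda and torch.cuda.is_available() are standard PyTorch attributes. torch.version.cuda is None on builds without CUDA support, which includes the CPU-only wheels.

```python
from typing import Optional


def describe_torch_build(cuda_version: Optional[str], cuda_available: bool) -> str:
    """Classify a torch install from its CUDA version string and GPU availability."""
    if cuda_version is None:
        # Builds without CUDA support report no CUDA version at all.
        return "build without CUDA support"
    if not cuda_available:
        # CUDA wheel is installed, but no working GPU or driver was found.
        return f"CUDA {cuda_version} build, but no usable GPU"
    return f"CUDA {cuda_version} build, GPU usable"


def check_installed_torch() -> str:
    """Inspect the torch package in the current environment, if any."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    return describe_torch_build(torch.version.cuda, torch.cuda.is_available())


if __name__ == "__main__":
    print(check_installed_torch())
```

If this reports a build without CUDA support on a machine with an NVIDIA GPU, the reinstall command above is the likely remedy.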
