
sd-webui-text2video's Introduction

text2video Extension for AUTOMATIC1111's StableDiffusion WebUI

Warning: as of 2023-11-21 this extension is not maintained. If you'd like to continue developing or remaking it, please contact me on Discord @kabachuha (you can also find me in the text2video channel on camenduru's server) and we'll figure it out.

Maintained starting on 2023-11-21 by Deforum-art

Maintained by me again

An Auto1111 extension implementing various text2video models, such as ModelScope and VideoCrafter, using only the Auto1111 webui dependencies and downloadable models (so no logins are required anywhere).

Requirements

ModelScope

6 GB of VRAM should be enough to run on GPU with the low-VRAM VAE enabled at 256x256 (and we are already getting reports of people launching 192x192 videos with 4 GB of VRAM). A 24-frame 256x256 video definitely fits into the 12 GB of an NVIDIA GeForce RTX 2080 Ti, and if you have a video card that supports the Torch2 attention optimization, you can fit a whopping 125-frame (8-second) video into the same 12 GB of VRAM! 250 frames (16 seconds) under the same conditions take 20 GB.

Prompt: best quality, anime girl dancing

exampleUntitled.mp4

We appreciate any help with this extension, especially pull requests.

LoRA Support

Currently, there is support for LoRAs trained with the Text-To-Video-Finetuning repository. Please follow the instructions there on how to train them: https://github.com/ExponentialML/Text-To-Video-Finetuning#updates

After training, simply place them into your default LoRA directory defined by your webui installation.

VideoCrafter (WIP, needs more devs to maintain properly as well)

VideoCrafter runs with around 9.2 GBs of VRAM with the settings set on Default.

Major changes between versions

Update 2023-03-26: prompt weights implemented! (ModelScope only for now, as of 2023-04-05)

Update 2023-03-27: VAE settings and "Keep model in VRAM" moved to the general webui settings under the 'ModelScopeTxt2Vid' section.

Update 2023-04-05: added VideoCrafter support, renamed the extension to simply 'sd-webui-text2video'.

Update 2023-04-13: in-framing/in-painting support: lets you 'animate' an existing picture or even seamlessly loop videos!

Update 2023-04-15: MEGA-UPDATE: Torch2/xformers optimizations; it is now possible to make a 125-frame-long video on 12 GB of VRAM. CPU offloading no longer happens if keep_pipe_in_vram is checked.

Update 2023-04-16: WebAPI is available!

Update 2023-07-02: Alternate samplers, model hotswitch.

Test examples:

ModelScope

Prompt: cinematic explosion by greg rutkowski

vid.mp4

Prompt: really attractive anime girl skating, by makoto shinkai, cinematic lighting

gosh.mp4

'Continuing' an existing image

Prompt: best quality, astronaut dog

egUntitled.mp4

Prompt: explosion

expl.mp4

In-painting and looping back the videos

Prompt: nuclear explosion

galaxybrain.mp4

Prompt: best quality, lots of cheese

matcheeseUntitled.mp4

VideoCrafter

Prompt: anime 1girl reimu touhou

working.mp4

Where to get the weights

ModelScope

Download the following files from the original HuggingFace repository. Alternatively, download the half-precision fp16 pruned weights (they are smaller and use less VRAM on loading):

  • VQGAN_autoencoder.pth
  • configuration.json
  • open_clip_pytorch_model.bin
  • text2video_pytorch_model.pth

And put them in stable-diffusion-webui/models/ModelScope/t2v. Create those 2 folders if they are missing.
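
A minimal sketch of scripting that download; the HuggingFace repo id used here (damo-vilab/modelscope-damo-text-to-video-synthesis) and the target path are assumptions, so point it at whichever repository the links above resolve to (or the fp16 pruned one for smaller weights):

import os
import urllib.request

# assumed repository; swap in the fp16 pruned repo for the smaller weights
REPO = "https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis/resolve/main"
TARGET = os.path.join("stable-diffusion-webui", "models", "ModelScope", "t2v")
FILES = [
    "VQGAN_autoencoder.pth",
    "configuration.json",
    "open_clip_pytorch_model.bin",
    "text2video_pytorch_model.pth",
]

os.makedirs(TARGET, exist_ok=True)  # creates the 2 folders if they are missing
for name in FILES:
    dest = os.path.join(TARGET, name)
    if not os.path.exists(dest):
        print(f"downloading {name}...")
        urllib.request.urlretrieve(f"{REPO}/{name}", dest)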

VideoCrafter

Download the pretrained T2V models either via this link or download the pruned half-precision weights, and put model.ckpt in models/VideoCrafter/model.ckpt.

Fine-tunes and how to use them

Thanks to https://github.com/ExponentialML/Text-To-Video-Finetuning you can fine-tune your models!

To utilize a fine-tuned model here, use this script which will convert the Diffusers-formatted model that repo outputs into the original weights format.

Prominent Fine-tunes

ZeroScope v2

Trained by @cerspense on high-quality YouTube videos. Download the files from the folder named zs2_XL at cerspense/zeroscope_v2_XL, and then add the missing VQGAN_autoencoder.pth and configuration.json from any other ModelScope model.

paradot.mp4

Potat1

Potat1 is a ModelScope-based model trained by @camenduru on 2197 clips at a resolution of 1024x576, which makes it the first open-source hi-res text2video model.

vid.2.mp4

To download plug-and-play weights for the extension, use this link: https://huggingface.co/kabachuha/potat1-with-text-encoder-original-format.

Animov-0.1

Animov-0.1 by strangeman3107. The converted weights for this model reside here.

w.mp4

Screenshots

txt2vid with img2vid

Screenshot 2023-04-15 at 17-53-36 Stable Diffusion

vid2vid

Screenshot 2023-04-15 at 17-33-32 Stable Diffusion

Dev resources

ModelScope

HuggingFace space:

https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis

The model PyTorch implementation from ModelScope:

https://github.com/modelscope/modelscope/tree/master/modelscope/models/multi_modal/video_synthesis

Google Colab from the devs:

https://colab.research.google.com/drive/1uW1ZqswkQ9Z9bp5Nbo5z59cAn7I0hE6R?usp=sharing

VideoCrafter

Github:

https://github.com/VideoCrafter/VideoCrafter

sd-webui-text2video's People

Contributors

diontimmer, dotsimulate, dvschultz, exponentialml, hithereai, kabachuha, nagolinc, netux, rbfussell, saltaccount, thesloppiestofjoes, xmyx


sd-webui-text2video's Issues

[Feature Request]: WebAPI

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

The extension definitely needs its own REST API so that other programs can interact with it; that would make it useful as part of video-generating services, such as Discord bots.
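
A minimal sketch of such a client, assuming a hypothetical /text2video/run route and illustrative JSON fields; the actual route and payload are whatever the extension's WebAPI ends up exposing:

import requests

payload = {
    "prompt": "best quality, astronaut dog",
    "n_prompt": "text, watermark, copyright, blurry",
    "steps": 30,
    "frames": 24,
    "width": 256,
    "height": 256,
    "seed": -1,
}

# hypothetical endpoint on a locally running webui started with --api
resp = requests.post("http://127.0.0.1:7860/text2video/run", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())  # e.g. a path to the resulting mp4, or an error message on failure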

Proposed workflow

  1. Make an app which is able to send REST API requests
  2. Send a request
  3. It's processed by the auto plugin
  4. The result is sent back, or if it fails, an error message is sent instead

Additional information

No response

[Feature Request]: Add batch generation from file or a textbox

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

When producing many videos for a large-scale project, generating them one at a time can be quite tedious. Please consider incorporating the "Prompts from File or Textbox" feature, which is located under the "Scripts" section in the default A1111 build.

Here is an example of its implementation:

import copy
import math
import os
import random
import sys
import traceback
import shlex

import modules.scripts as scripts
import gradio as gr

from modules import sd_samplers
from modules.processing import Processed, process_images
from PIL import Image
from modules.shared import opts, cmd_opts, state


def process_string_tag(tag):
    return tag


def process_int_tag(tag):
    return int(tag)


def process_float_tag(tag):
    return float(tag)


def process_boolean_tag(tag):
    return True if (tag == "true") else False


prompt_tags = {
    "sd_model": None,
    "outpath_samples": process_string_tag,
    "outpath_grids": process_string_tag,
    "prompt_for_display": process_string_tag,
    "prompt": process_string_tag,
    "negative_prompt": process_string_tag,
    "styles": process_string_tag,
    "seed": process_int_tag,
    "subseed_strength": process_float_tag,
    "subseed": process_int_tag,
    "seed_resize_from_h": process_int_tag,
    "seed_resize_from_w": process_int_tag,
    "sampler_index": process_int_tag,
    "sampler_name": process_string_tag,
    "batch_size": process_int_tag,
    "n_iter": process_int_tag,
    "steps": process_int_tag,
    "cfg_scale": process_float_tag,
    "width": process_int_tag,
    "height": process_int_tag,
    "restore_faces": process_boolean_tag,
    "tiling": process_boolean_tag,
    "do_not_save_samples": process_boolean_tag,
    "do_not_save_grid": process_boolean_tag
}


def cmdargs(line):
    args = shlex.split(line)
    pos = 0
    res = {}

    while pos < len(args):
        arg = args[pos]

        assert arg.startswith("--"), f'must start with "--": {arg}'
        assert pos+1 < len(args), f'missing argument for command line option {arg}'

        tag = arg[2:]

        if tag == "prompt" or tag == "negative_prompt":
            pos += 1
            prompt = args[pos]
            pos += 1
            while pos < len(args) and not args[pos].startswith("--"):
                prompt += " "
                prompt += args[pos]
                pos += 1
            res[tag] = prompt
            continue


        func = prompt_tags.get(tag, None)
        assert func, f'unknown commandline option: {arg}'

        val = args[pos+1]
        if tag == "sampler_name":
            val = sd_samplers.samplers_map.get(val.lower(), None)

        res[tag] = func(val)

        pos += 2

    return res


def load_prompt_file(file):
    if file is None:
        lines = []
    else:
        lines = [x.strip() for x in file.decode('utf8', errors='ignore').split("\n")]

    return None, "\n".join(lines), gr.update(lines=7)


class Script(scripts.Script):
    def title(self):
        return "Prompts from file or textbox"

    def ui(self, is_img2img):       
        checkbox_iterate = gr.Checkbox(label="Iterate seed every line", value=False, elem_id=self.elem_id("checkbox_iterate"))
        checkbox_iterate_batch = gr.Checkbox(label="Use same random seed for all lines", value=False, elem_id=self.elem_id("checkbox_iterate_batch"))

        prompt_txt = gr.Textbox(label="List of prompt inputs", lines=1, elem_id=self.elem_id("prompt_txt"))
        file = gr.File(label="Upload prompt inputs", type='binary', elem_id=self.elem_id("file"))

        file.change(fn=load_prompt_file, inputs=[file], outputs=[file, prompt_txt, prompt_txt])

        # We start at one line. When the text changes, we jump to seven lines, or two lines if no \n.
        # We don't shrink back to 1, because that causes the control to ignore [enter], and it may
        # be unclear to the user that shift-enter is needed.
        prompt_txt.change(lambda tb: gr.update(lines=7) if ("\n" in tb) else gr.update(lines=2), inputs=[prompt_txt], outputs=[prompt_txt])
        return [checkbox_iterate, checkbox_iterate_batch, prompt_txt]

    def run(self, p, checkbox_iterate, checkbox_iterate_batch, prompt_txt: str):
        lines = [x.strip() for x in prompt_txt.splitlines()]
        lines = [x for x in lines if len(x) > 0]

        p.do_not_save_grid = True

        job_count = 0
        jobs = []

        for line in lines:
            if "--" in line:
                try:
                    args = cmdargs(line)
                except Exception:
                    print(f"Error parsing line {line} as commandline:", file=sys.stderr)
                    print(traceback.format_exc(), file=sys.stderr)
                    args = {"prompt": line}
            else:
                args = {"prompt": line}

            job_count += args.get("n_iter", p.n_iter)

            jobs.append(args)

        print(f"Will process {len(lines)} lines in {job_count} jobs.")
        if (checkbox_iterate or checkbox_iterate_batch) and p.seed == -1:
            p.seed = int(random.randrange(4294967294))

        state.job_count = job_count

        images = []
        all_prompts = []
        infotexts = []
        for n, args in enumerate(jobs):
            state.job = f"{state.job_no + 1} out of {state.job_count}"

            copy_p = copy.copy(p)
            for k, v in args.items():
                setattr(copy_p, k, v)

            proc = process_images(copy_p)
            images += proc.images
            
            if checkbox_iterate:
                p.seed = p.seed + (p.batch_size * p.n_iter)
            all_prompts += proc.all_prompts
            infotexts += proc.infotexts

        return Processed(p, images, p.seed, "", all_prompts=all_prompts, infotexts=infotexts)

Proposed workflow

  1. Go to the txt2vid tab
  2. Press the checkbox "prompts from a textbox"
  3. The main text2vid prompt field will then treat each new line as a command for batch prompt processing (one generation per line, or multiple if the user sets a non-standard batch count).

Additional information

No response

[Bug]: Every video I generate has a shutterstock watermark?

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

Not sure why this is happening. Followed the install instructions, have all the models, and I'm using the standard SD 1.5 model (though I have tried others). For some reason no matter what I do, everything I generate has a shutterstock watermark.

vid.mp4

Steps to reproduce the problem

  1. Install as normal
  2. Use a prompt such as 'car driving down the freeway at night' with the negatives 'text, watermark, copyright, blurry'
  3. Profit?

What should have happened?

Ideally the watermark wouldn't be there

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c3
txt2vid commit id - ab1c4e7

What GPU were you using for launching?

RTX 4080

On which platform are you launching the webui backend with the extension?

No response

Settings

image

Console logs

Restoring base VAE
Applying xformers cross attention optimization.
VAE weights loaded.
ModelScope text2video extension for auto1111 webui
Git commit: ab1c4e74 (Mon Mar 20 22:22:46 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Starting text2video
False
DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 36/36 [00:24<00:00,  1.48it/s]
STARTING VAE ON GPU. 1 CHUNKS TO PROCESS
DECODING FRAMES
VAE FINISHED
torch.Size([30, 3, 256, 256])
output/mp4s/20230320_212917337065.mp4
text2video finished, saving frames to /home/watzon/Pictures/generated/stable-diffusion/img2img-images/text2video-modelscope/20230320212832
Got a request to stitch frames to video using FFmpeg.
Frames:
/home/watzon/Pictures/generated/stable-diffusion/img2img-images/text2video-modelscope/20230320212832/%06d.png
To Video:
/home/watzon/Pictures/generated/stable-diffusion/img2img-images/text2video-modelscope/20230320212832/vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.12 seconds!
t2v complete, result saved at /home/watzon/Pictures/generated/stable-diffusion/img2img-images/text2video-modelscope/20230320212832

[Feature Request]: Memory switcher CUDA / CPU / MPS

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Enable the ability to change the device map used for model storage (CUDA / CPU / MPS), to fix issues on Mac / M1 machines.
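
A minimal sketch of what such a switcher could do under the hood, using plain PyTorch device checks; the option name and how it would be wired into the extension are assumptions:

import torch

def pick_device(preference: str = "auto") -> torch.device:
    # 'preference' stands in for the proposed UI option: auto / cuda / mps / cpu
    if preference == "cuda" or (preference == "auto" and torch.cuda.is_available()):
        return torch.device("cuda")
    if preference == "mps" or (preference == "auto" and torch.backends.mps.is_available()):
        return torch.device("mps")  # Apple Silicon (M1/M2)
    return torch.device("cpu")

device = pick_device()
# the model weights and latents would then be loaded with map_location=device / .to(device)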

Proposed workflow

(Something)

Additional information

See discussion thread for more info

[Feature Request]: Set the seed manually before generating a video

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

As described in the title: simply set the seed before generating a video so that it can be reproduced, and so that multiple versions of the same video can be generated by slightly changing the prompt while keeping the seed the same. This greatly helps when 'stitching' videos together into a longer one.
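
A minimal sketch of what a fixed seed means in PyTorch terms, assuming the pipeline draws its initial latent noise from a seeded generator; the latent shape and the -1 convention mirror what the rest of this page shows:

import random
import torch

def make_generator(seed: int, device: str = "cpu") -> torch.Generator:
    if seed == -1:
        seed = random.randrange(4294967294)  # -1 means "pick a random seed first"
    print(f"using seed {seed}")
    return torch.Generator(device=device).manual_seed(seed)

# the same seed + prompt reproduces the same initial noise, hence the same clip
latents = torch.randn((1, 4, 24, 32, 32), generator=make_generator(42))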

Proposed workflow

  1. Go to ui to the 'seed' input
  2. Set the seed to -1 for a random one or specify it manually, the same as with all other SD generations.

Additional information

No response

[Bug]: generate button unresponsive in webUI then ^C OOM after a while

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

Set to 128x128, 12 frames, cfg 7, steps 20.
The Generate button doesn't seem to work, but Colab shows something happening:

Git commit: d16d9d47 (Thu Mar 30 09:18:56 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis'.....

Not particularly high RAM
modelscoperam

Steps to reproduce the problem

  1. Go to .... modelscope tab
  2. Press .... generate
  3. ... wait for error message

What should have happened?

produced short video

WebUI and Deforum extension Commit IDs

webui commit id - a9eab236d7e8afa4d6205127904a385b2c43bb24
txt2vid commit id - d16d9d4

What GPU were you using for launching?

Tesla T4, 15360 MiB, 15101 MiB

On which platform are you launching the webui backend with the extension?

Google Colab (Other)

Settings

...............

Console logs

.........

Additional information

No response

Images won't be converted to a video

The image generation seems to be fine, but the video won't be generated. Any ideas?

Traceback (most recent call last):
  File "F:\UI\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "F:\UI\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "F:\UI\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 52, in process
    lowvram.setup_for_low_vram(sd_model, cmd_opts.medvram)
  File "F:\UI\modules\lowvram.py", line 42, in setup_for_low_vram
    first_stage_model = sd_model.first_stage_model
AttributeError: 'NoneType' object has no attribute 'first_stage_model'

Traceback (most recent call last):
  File "F:\UI\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\UI\venv\lib\site-packages\gradio\blocks.py", line 1018, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "F:\UI\venv\lib\site-packages\gradio\blocks.py", line 956, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "F:\UI\venv\lib\site-packages\gradio\components.py", line 1860, in postprocess
    returned_format = y.split(".")[-1].lower()
AttributeError: 'tuple' object has no attribute 'split'

[Bug]: invalid load key, '<'

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

When attempting to start the script using all defaults and a simple prompt, the error below is produced.

Stack trace:
Traceback (most recent call last):
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 73, in process
    pipe = setup_pipeline()
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 105, in __init__
    torch.load(
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/modules/safe.py", line 106, in load
    return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/modules/safe.py", line 151, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

Steps to reproduce the problem

  1. Go to Modelscope tab
  2. Enter prompt
  3. Press Generate

What should have happened?

Video should be produced.

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
txt2vid commit id - b1b8b78 (HEAD -> main, origin/main, origin/HEAD)

ModelScope text2video extension for auto1111 — version 1.0b.
python: 3.10.7 • torch: 1.13.1+cu117 • xformers: 0.0.17.dev476 • gradio: 3.16.2 • commit: a9fed7c3 • checkpoint:

What GPU were you using for launching?

EC2 G5.xlarge

On which platform are you launching the webui backend with the extension?

Cloud server (Linux)

Settings

All settings are at default

Console logs

Launching launch.py...
################################################################
Python 3.10.7 (main, Mar 16 2023, 07:26:52) [GCC 9.4.0]
Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
Installing requirements for Web UI






Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:

Python revision: 3.10.7 (main, Mar 16 2023, 07:26:52) [GCC 9.4.0]
Dreambooth revision: da2e40415f1cb63cc4de46d6dc97eb8676c6e30c
SD-WebUI revision: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2

Successfully installed accelerate-0.17.1
Successfully installed requests-2.28.2
Successfully installed fastapi-0.90.1 starlette-0.23.1
Successfully installed gitpython-3.1.31
Successfully installed transformers-4.27.2

[+] torch version 1.13.1+cu117 installed.
[+] torchvision version 0.14.1+cu117 installed.
[+] xformers version 0.0.17.dev476 installed.
[+] accelerate version 0.17.1 installed.
[+] diffusers version 0.14.0 installed.
[+] transformers version 4.27.2 installed.
[+] bitsandbytes version 0.35.4 installed.


Model loaded

Installing scikit-learn


Launching Web UI with arguments: --api --listen --no-progressbar-hiding --enable-insecure-extension-access --opt-sub-quad-attention --no-hashing --xformers --opt-channelslast --disable-safe-unpickle --no-hashing --xformers-flash-attention
[AddNet] Updating model hashes...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 1540.32it/s][AddNet] Updating model hashes...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 1850.16it/s]2023-03-23 19:16:41.043581: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-23 19:16:41.772670: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/../../lib64:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/usr/local/cuda/efa/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/usr/local/cuda/efa/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:
2023-03-23 19:16:41.772779: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/venv/lib/python3.10/site-packages/cv2/../../lib64:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/usr/local/cuda/efa/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/usr/local/cuda/efa/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:
2023-03-23 19:16:41.772798: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 73, in process
    pipe = setup_pipeline()
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts/t2v_pipeline.py", line 105, in __init__
    torch.load(
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/modules/safe.py", line 106, in load
    return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/modules/safe.py", line 151, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/ubuntu/stable-diffusion/apps/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

Additional information

No response

[Feature Request]: Add upscaling, frame interpolation from Deforum + Face restore

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Since the resulting videos are far smaller than the usual StableDiffusion-generated ones (and sometimes flickery as well), they would benefit greatly from upscaling and frame interpolation, just like it is done in vanilla Deforum:

https://github.com/deforum-art/deforum-for-automatic1111-webui/blob/automatic1111-webui/scripts/deforum_helpers/upscaling.py

https://github.com/deforum-art/deforum-for-automatic1111-webui/blob/automatic1111-webui/scripts/deforum_helpers/frame_interpolation.py

Proposed workflow

  1. Go to text2video tab
  2. Generate animation
  3. Have it upscaled/interpolated automatically or manually in the output tab

Additional information

No response

[Bug]: When I upload the video in Vid2VId tab it works as txt2vid not vid2vid.

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

When I upload a video in the Vid2Vid tab, it works as txt2vid, not vid2vid. I also tried 'Input video path', but it doesn't work.
image

Steps to reproduce the problem

Go to the Vid2vid tab, upload a single video file and press the Generate button.

What should have happened?

Vid2vid, not txt2vid, should work.

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
txt2vid commit id - 44a8286

What GPU were you using for launching?

3090ti

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

image

Console logs

Git commit: 44a82864 (Fri Mar 24 19:57:10 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0031, device='cuda:0') tensor(1.0040, device='cuda:0')
DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:22<00:00,  1.35it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230325_072326329309.mp4
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0048, device='cuda:0') tensor(0.9962, device='cuda:0')
DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:15<00:00,  2.06it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESSβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:15<00:00,  2.03it/s]
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230325_072346565309.mp4
text2video finished, saving frames to F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237
Got a request to stitch frames to video using FFmpeg.
Frames:
F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237\%06d.png
To Video:
F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.46 seconds!
t2v complete, result saved at F:\\Download\\SDWEB_OUTPUT\\img2img-images\text2video-modelscope\20230325072237


Additional information

No response

[Bug]: GPU Half Precision options removed ?

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

The option has been removed in the latest update and I can't render videos anymore. I run SD with these commands:

set COMMANDLINE_ARGS=--opt-split-attention --xformers --medvram --no-half-vae

Steps to reproduce the problem

Launch a render

What should have happened?

No response

WebUI and Deforum extension Commit IDs

Latest update

What GPU were you using for launching?

Rtx 2060m, 6gb Vram

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

SD Auto's Ui

Console logs

if 'CPU' in cpu_vae:
TypeError: argument of type 'NoneType' is not iterable
Exception occurred: argument of type 'NoneType' is not iterable

Additional information

No response

[Feature Request]: Add the interrupt/skip buttons and progress bars from other webui modules

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Have some sort of visual indicator of the job progress/status in the interface.
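
A minimal sketch of how this could hook into the webui's shared job state (the same modules.shared.state object the bundled "Prompts from file" script above already uses); exactly where to poll it inside the sampler is an assumption:

from modules import shared

def sample_with_progress(total_steps: int):
    shared.state.job_count = 1
    shared.state.sampling_steps = total_steps
    for step in range(total_steps):
        if shared.state.interrupted or shared.state.skipped:
            break  # honor the Interrupt / Skip buttons
        shared.state.sampling_step = step  # drives the webui progress bar
        # ... run one denoising step here ...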

Proposed workflow

  1. Standard orange "generate" button is clicked
  2. It then shows two grey buttons "interrupt" and "skip". There is also a progress bar.
  3. When the job is complete, the button returns to its original state.

Additional information

No response

exception occurred

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 16/16 [01:34<00:00, 5.91s/it]
DECODING FRAMES
torch.Size([1, 4, 17, 40, 40])
STARTING VAE ON CPU
Exception occured
No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(17, 1600, 1, 512) (torch.float32)
     key         : shape=(17, 1600, 1, 512) (torch.float32)
     value       : shape=(17, 1600, 1, 512) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`cutlassF` is not supported because:
    device=cpu (supported: {'cuda'})
`flshattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
`tritonflashattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
    triton is not available
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512

Steps to reproduce the problem

  1. Go to ....
  2. Press ....
  3. ...
    .

What should have happened?

.

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
txt2vid commit id -1cf09f3109c9379513a1fff93c240a6eb90cbd60

What GPU were you using for launching?

nvidia 980

On which platform are you launching the webui backend with the extension?

No response

Settings

Screenshot 2023-03-21 171820

Console logs

Git commit: 1cf09f31 (Tue Mar 21 21:56:03 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Starting text2video
DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 16/16 [01:21<00:00,  5.12s/it]
DECODING FRAMES
torch.Size([1, 4, 17, 40, 40])
STARTING VAE ON CPU
Exception occured
No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(17, 1600, 1, 512) (torch.float32)
     key         : shape=(17, 1600, 1, 512) (torch.float32)
     value       : shape=(17, 1600, 1, 512) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`cutlassF` is not supported because:
    device=cpu (supported: {'cuda'})
`flshattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
`tritonflashattF` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    max(query.shape[-1] != value.shape[-1]) > 128
    triton is not available
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512

Additional information

It worked earlier today on those settings and I have a couple of examples (xupernova and flower GIF attachments).

[Bug]: vid2vid throws an exception 'need at least one array to stack'

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

I was trying to use vid2vid but kept getting an exception.

Steps to reproduce the problem

  1. Go to vid2vid and upload a 1 minute video downloaded from YouTube
  2. input the prompt
  3. keep all other settings as default
  4. click generate

What should have happened?

It should have generated a video based on my prompt and the input video.

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c3
txt2vid commit id - 066a9e1

What GPU were you using for launching?

RTX 4090 24GB

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

Windows 11, python: 3.10.10 • torch: 2.0.0+cu118 • xformers: 0.0.17+b6be33a.d20230315 • gradio: 3.16.2

Steps: 30
Frames: 30
cfg_scale: 7
width/height: 256
seed: -1
eta: 0
denoising strength: 0.75
vid2vid start frame: tried both 1 and 200, but same result
batch count: 1
VAE Mode: tried both GPU (half precision) and GPU, but same result

Console logs

ModelScope text2video extension for auto1111 webui
Git commit: 066a9e13 (Sun Mar 26 15:10:21 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
got a request to *vid2vid* an existing video.
Trying to extract frames from video with input FPS of 23.976023976023978. Please wait patiently.
Successfully extracted 2244.0 frames from video.
Loading frames: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "C:\source\stable-diffusion-webui_clean\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 125, in process
    images=np.stack(images)# f h w c
  File "<__array_function__ internals>", line 180, in stack
  File "C:\source\stable-diffusion-webui_clean\venv\lib\site-packages\numpy\core\shape_base.py", line 422, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
Exception occurred: need at least one array to stack

Additional information

https://www.youtube.com/watch?v=75rRs6fraUI&t=1s&ab_channel=VICENews was the video.

It appears it worked for a different video I input, but not this one. Maybe it's just allergic to BS?

[Feature Request]: Need an option to store generated videos somewhere

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

This option would let the user define a location to automatically store generated video.

Proposed workflow

  1. Click on Advanced Options
  2. In the section "Save To", enter a location on your storage device.
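
A minimal sketch of registering such an option through the webui settings mechanism already used for the 'ModelScopeTxt2Vid' section; the option key and label here are hypothetical:

from modules import script_callbacks, shared

def on_ui_settings():
    shared.opts.add_option(
        "modelscope_save_to",  # hypothetical option key
        shared.OptionInfo(
            "",  # empty string = keep the default output folder
            "Save generated videos to this folder",
            section=("modelscope_txt2vid", "ModelScopeTxt2Vid"),
        ),
    )

script_callbacks.on_ui_settings(on_ui_settings)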

Additional information

No response

Add prompt weighting

At the moment the script just passes the prompt and the negative prompt into the diffusion model. To harness the full power of attention control, we need to use the same mechanism Auto uses with the regular CLIP embedder in the hijack section.
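
A minimal sketch of the parsing half of that mechanism, assuming the webui's modules.prompt_parser is importable from the extension; how the per-chunk weights then get applied to the CLIP token embeddings is left out:

from modules import prompt_parser

# parse_prompt_attention splits a prompt into (text, weight) chunks, e.g.
# "a (red:1.2) car on a [rainy] street" ->
# [['a ', 1.0], ['red', 1.2], [' car on a ', 1.0], ['rainy', 0.909...], [' street', 1.0]]
chunks = prompt_parser.parse_prompt_attention("a (red:1.2) car on a [rainy] street")
for text, weight in chunks:
    print(f"{weight:.3f}  {text!r}")

# the weights would then be broadcast onto the corresponding CLIP tokens and used
# to scale their embeddings before the conditioning is passed to the video UNet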

AttributeError: 'NoneType' object has no attribute 'first_stage_model'

I get this on pressing generate, and I'm not certain if it's because I'm currently running torch2.

Arguments: ('test1', '', 20, 24, 7, 256, 256, 0.0, False) {}
Traceback (most recent call last):
  File "C:\stable-diffusion\a1-sd-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\stable-diffusion\a1-sd-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion\a1-sd-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 52, in process
    lowvram.setup_for_low_vram(sd_model, cmd_opts.medvram)
  File "C:\stable-diffusion\a1-sd-webui\modules\lowvram.py", line 42, in setup_for_low_vram
    first_stage_model = sd_model.first_stage_model
AttributeError: 'NoneType' object has no attribute 'first_stage_model'

Traceback (most recent call last):
  File "C:\stable-diffusion\a1-sd-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\stable-diffusion\a1-sd-webui\venv\lib\site-packages\gradio\blocks.py", line 1018, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\stable-diffusion\a1-sd-webui\venv\lib\site-packages\gradio\blocks.py", line 956, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "C:\stable-diffusion\a1-sd-webui\venv\lib\site-packages\gradio\components.py", line 1860, in postprocess
    returned_format = y.split(".")[-1].lower()
AttributeError: 'tuple' object has no attribute 'split'

[Bug]: Exception occurred: 'unet_dim'

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

I get this error when I click Generate:

Git commit: 066a9e1 (Sun Mar 26 15:10:21 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
  File "C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 74, in process
    pipe = setup_pipeline()
  File "C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 71, in __init__
    in_dim=cfg['unet_in_dim'],
KeyError: 'unet_in_dim'
Exception occurred: 'unet_in_dim'

It seems something is missing in the configuration.json, but it's identical to the version on HuggingFace.

Steps to reproduce the problem

  1. Go to ....
  2. Press ....
  3. ...

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id -
txt2vid commit id -

What GPU were you using for launching?

3060

On which platform are you launching the webui backend with the extension?

No response

Settings

default

Console logs

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Loading A111 WebUI Launcher
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   Settings file found, loading
 β†’   Updating Settings File  βœ“
 i   Launcher Version 1.7.0
 i   Found a custom WebUI Config
 i   No Launcher launch options
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†’   Checking requirements :
 i   Python 3.10.6150.1013 found in registry:  C:\Users\User\AppData\Local\Programs\Python\Python310\
 i   Clearing PATH of any mention of Python
 β†’   Adding python 3.10 to path  βœ“
 i   Git found and already in PATH:  C:\Program Files\Git\cmd\git.exe
 i   Automatic1111 SD WebUI found:  C:\A1111 Web UI Autoinstaller\stable-diffusion-webui
 i   One or more checkpoint models were found
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Loading Complete, opening launcher
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   No arguments set
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Webui
remote: Enumerating objects: 314, done.
remote: Counting objects: 100% (246/246), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 314 (delta 212), reused 211 (delta 191), pack-reused 68Receiving objects:  90% (283/314)
Receiving objects: 100% (314/314), 127.80 KiB | 25.56 MiB/s, done.
Resolving deltas: 100% (215/215), completed with 68 local objects.
From https://github.com/AUTOMATIC1111/stable-diffusion-webui
   c1294d84..955df775  master       -> origin/master
 * [new branch]        lora_inplace -> origin/lora_inplace
 βœ“   Done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-modelscope-text2video
remote: Enumerating objects: 227, done.
remote: Counting objects: 100% (215/215), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 227 (delta 139), reused 206 (delta 138), pack-reused 12
Receiving objects: 100% (227/227), 101.65 KiB | 8.47 MiB/s, done.
Resolving deltas: 100% (143/143), completed with 9 local objects.
From https://github.com/deforum-art/sd-webui-modelscope-text2video
   fddb4e8..066a9e1  main       -> origin/main
   02a35bf..28e5b56  extras     -> origin/extras
 βœ“   Done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-modelscope-text2video
 βœ“   Done
 i   No arguments set
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  WEBUI LAUNCHING VIA EMS LAUNCHER, EXIT THIS WINDOW TO STOP THE WEBUI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 !   Any error happening after 'commit hash : XXXX' is not related to the launcher. Please report them on Automatic1111's github instead :
 ☁   https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/new/choose
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cancel
venv "C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 955df7751eef11bb7697e2d77f6b8a6226b21e13
Installing requirements for Web UI

Launching Web UI with arguments: --autolaunch
No module 'xformers'. Proceeding without it.
Loading weights [27a4ac756c] from C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\SD15NewVAEpruned.ckpt
Creating model from config: C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 3.9s (load weights from disk: 1.1s, create model: 0.3s, apply weights to model: 0.4s, apply half(): 0.5s, move model to device: 0.5s, load textual inversion embeddings: 1.2s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 12.3s (import torch: 2.5s, import gradio: 1.9s, import ldm: 0.7s, other imports: 1.9s, load scripts: 0.8s, load SD checkpoint: 3.9s, create ui: 0.3s, gradio launch: 0.3s).
ModelScope text2video extension for auto1111 webui
Git commit: 066a9e13 (Sun Mar 26 15:10:21 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
  File "C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 74, in process
    pipe = setup_pipeline()
  File "C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "C:\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 71, in __init__
    in_dim=cfg['unet_in_dim'],
KeyError: 'unet_in_dim'
Exception occurred: 'unet_in_dim'

Additional information

No response

[Bug]: Txt2vid stuck, loading pipeline forever

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the lastest version and I still have the issue.

What happened?

Hello! I tried installing the extension. I copied the models (text2video_pytorch_model.pth, open_clip_pytorch_model.bin, VQGAN_autoencoder.pth) and the config JSON into a folder I created under the main folder of my UI installation (this is on Google Colab): Models > ModelScope > t2v.

But when I try to run it, even with the default settings, I get stuck on

Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})

for 10-15 minutes with nothing happening.

Steps to reproduce the problem

Just installing and trying to run anything in the extension.

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id -
txt2vid commit id -

What GPU were you using for launching?

Standard Default GPU of the free google colab tier

On which platform are you launching the webui backend with the extension?

No response

Settings

Standard Setting

Console logs

Loading Unprompted v7.9.1 by Therefore Games
(SETUP) Initializing Unprompted object...
(SETUP) Loading configuration files...
(SETUP) Debug mode is False
Loading weights [92970aa785] from /content/gdrive/MyDrive/sd-backup/stable-diffusion-webui/models/Stable-diffusion/dreamlike-photoreal-2.0.safetensors
Creating model from config: /content/gdrive/MyDrive/sd/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0): 
Model loaded in 22.0s (load weights from disk: 16.5s, create model: 1.0s, apply weights to model: 2.9s, apply half(): 0.9s, move model to device: 0.7s).
Panorama Viewer: enable file-drag-and-drop into txt2img gallery...
Panorama_Viewer: adding sendto button in parent_elem_id: image_buttons_txt2img
Panorama_Viewer: adding sendto button in parent_elem_id: image_buttons_img2img
Panorama_Viewer: adding sendto button in parent_elem_id: image_buttons_extras
Running on public URL: https://5dade70d-2bd9-45bc.gradio.live/
✔ Connected
Startup time: 46.8s (import gradio: 3.8s, import ldm: 8.5s, other imports: 4.8s, list extensions: 1.6s, load scripts: 2.4s, load SD checkpoint: 22.0s, create ui: 1.1s, gradio launch: 2.4s, scripts app_started_callback: 0.1s).
ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})

Additional information

No response

[Bug]:

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331045111\vid.mp4
Stitching *video*...
Stitching *video*...
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py", line 147, in ffmpeg_stitch_video
    process = subprocess.Popen(
  File "C:\Python310\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 174, in process
ffmpeg_stitch_video(ffmpeg_location=ffmpeg_location, fps=fps, outmp4_path=outdir_current + os.path.sep + f"vid.mp4", imgs_path=os.path.join(outdir_current,
File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py", line 158, in ffmpeg_stitch_video
raise Exception(
Exception: Error stitching frames to video. Actual runtime error:[WinError 5] Access is denied
Exception occurred: Error stitching frames to video. Actual runtime error:[WinError 5] Access is denied

ERROR: Exception in ASGI application
Traceback (most recent call last):
File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 94, in receive
return self.receive_nowait()
File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 89, in receive_nowait
raise WouldBlock
anyio.WouldBlock

Steps to reproduce the problem

Simply running the app

What should have happened?

The video should have been stitched together and exported into the output folder.
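
Since the PermissionError is raised inside _winapi.CreateProcess, Windows is most likely refusing to execute whatever ffmpeg_location points at (a blocked, missing, or directory path). A rough way to reproduce the same invocation outside the webui, with example values taken from the log above and an assumed ffmpeg path and frame rate:

import subprocess

ffmpeg_location = r"C:\ffmpeg\bin\ffmpeg.exe"  # assumption: wherever your ffmpeg.exe actually lives
frames = r"E:\Documents\AI\stable-diffusion-webui\outputs\img2img-images\text2video-modelscope\20230331045111\%06d.png"
out_mp4 = r"E:\Documents\AI\stable-diffusion-webui\outputs\img2img-images\text2video-modelscope\20230331045111\vid.mp4"

# If this raises the same [WinError 5], the problem is the ffmpeg binary or its permissions,
# not the extension's stitching code.
subprocess.run([ffmpeg_location, "-y", "-framerate", "15", "-i", frames,
                "-c:v", "libx264", "-pix_fmt", "yuv420p", out_mp4], check=True)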

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c3
txt2vid commit id - ModelScope text2video extension for auto1111, version 1.0b.

What GPU were you using for launching?

RTX 3070

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

I'm not sure what to put here

Console logs

error: Your local changes to the following files would be overwritten by merge:
        requirements_versions.txt
Please commit your changes or stash them before you merge.
Aborting
Updating a9fed7c3..22bcc7be
venv "E:\Documents\AI\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
Installing requirements for Web UI

Installing requirements for scikit_learn

current transparent-background 1.2.3


Installing requirements for Prompt Gallery

Installing sd-dynamic-prompts requirements.txt



Initializing Riffusion



Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:

Python revision: 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Dreambooth revision: da2e40415f1cb63cc4de46d6dc97eb8676c6e30c
SD-WebUI revision: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2


[+] torch version 2.0.0+cu118 installed.
[+] torchvision version 0.15.1+cu118 installed.
[+] xformers version 0.0.17rc482 installed.
[+] accelerate version 0.17.1 installed.
[+] diffusers version 0.14.0 installed.
[+] transformers version 4.27.2 installed.
[+] bitsandbytes version 0.35.4 installed.

loading Smart Crop reqs from E:\Documents\AI\stable-diffusion-webui\extensions\sd_smartprocess\requirements.txt
Checking Smart Crop requirements.

Installing imageio-ffmpeg requirement for depthmap script
Installing pyqt5 requirement for depthmap script


Installing requirements for Unprompted - img2pez
Installing requirements for Unprompted - pix2pix_zero

Installing video2video requirement: sk-video

Launching Web UI with arguments: --xformers --api --no-half-vae
E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
Error loading script: simple_depthmap.py
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\scripts.py", line 248, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
  File "E:\Documents\AI\stable-diffusion-webui\modules\script_loading.py", line 11, in load_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "E:\Documents\AI\stable-diffusion-webui\extensions\multi-subject-render\scripts\simple_depthmap.py", line 11, in <module>
    from repositories.midas.midas.dpt_depth import DPTDepthModel
  File "E:\Documents\AI\stable-diffusion-webui\repositories\midas\midas\dpt_depth.py", line 5, in <module>
    from .blocks import (
  File "E:\Documents\AI\stable-diffusion-webui\repositories\midas\midas\blocks.py", line 4, in <module>
    from .backbones.beit import (
  File "E:\Documents\AI\stable-diffusion-webui\repositories\midas\midas\backbones\beit.py", line 9, in <module>
    from timm.models.beit import gen_relative_position_index
ModuleNotFoundError: No module named 'timm.models.beit'

[AddNet] Updating model hashes...
0it [00:00, ?it/s]
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
Hypernetwork-MonkeyPatch-Extension not found
Error loading script: riffusion.py
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\scripts.py", line 248, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
  File "E:\Documents\AI\stable-diffusion-webui\modules\script_loading.py", line 11, in load_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-riffusion\scripts\riffusion.py", line 11, in <module>    import torchaudio
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torchaudio\_extension.py", line 135, in <module>
    _init_extension()
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torchaudio\_extension.py", line 105, in _init_extension
    _load_lib("libtorchaudio")
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torchaudio\_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 643, in load_library
    ctypes.CDLL(path)
  File "C:\Python310\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'E:\Documents\AI\stable-diffusion-webui\venv\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

Error loading script: patch_fixer.py
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\scripts.py", line 248, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
  File "E:\Documents\AI\stable-diffusion-webui\modules\script_loading.py", line 11, in load_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd_auto_fix\scripts\patch_fixer.py", line 16, in <module>
    from modules.sd_hijack_inpainting import do_inpainting_hijack, should_hijack_inpainting
ImportError: cannot import name 'should_hijack_inpainting' from 'modules.sd_hijack_inpainting' (E:\Documents\AI\stable-diffusion-webui\modules\sd_hijack_inpainting.py)

[text2prompt] Following databases are available:
    all-mpnet-base-v2 : danbooru_strict
Loading Unprompted v7.6.0 by Therefore Games
(SETUP) Initializing Unprompted object...
(SETUP) Loading configuration files...
(SETUP) Debug mode is False
Loading weights [76b00ee812] from E:\Documents\AI\stable-diffusion-webui\models\Stable-diffusion\icomix_V02Pruned.safetensors
Creating model from config: E:\Documents\AI\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Error loading embedding aivazovsky.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Error loading embedding cloudcore.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Error loading embedding fantasy.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Error loading embedding flower_plant.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Error loading embedding gloomcore.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Error loading embedding glowwave.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Error loading embedding laion_7plus.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Error loading embedding sac_8plus.pt:
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 206, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "E:\Documents\AI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 164, in load_from_file
    if 'string_to_param' in data:
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 999, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Textual inversion embeddings loaded(58): 7dirtywords, advntr, angry512, arcan3, arcan3v2, art by Smoose2, bad-artist-anime, bad-artist, bad-hands-5, bad_prompt, bad_prompt_version2, cardstyle15, charturnerv2, clrs, corneo_spitroast, darkskin_style, defiance512, easynegative, eonn, flame_surge_style, fs2023, ghst-3000, gigaschizonegs, grin512, gustavedore, happy512, hoppagames, laugh512, magicalinterior, nervous512, ng_deepnegative_v1_75t, PlanIt, rfktr_bwmnga, RFKTR_plastic, rosalinenobodysd15, sad512, shock512, smile512, Style-Autumn, style-empire-neg, style-empire, style-hamunaptra, Style-Moana-neg, Style-Moana, Style-NebMagic, Style-Necromancy, Style-Petal-neg, Style-Petal, Style-Psycho-neg, Style-Renaissance-neg, Style-Renaissance, style-rustmagic, Style-Winter-neg, Style-Winter, tarot512, wholesomegrandpas, wholesomegrannies, _stardeaf-greenmageddon_
Textual inversion embeddings skipped(6): 21charturnerv2, InkPunk768, inksketchcolour1subtle, SDA768, UlukInkSketch2, Zootopiav4
Model loaded in 8.4s (create model: 0.4s, apply weights to model: 1.1s, apply half(): 0.6s, move model to device: 0.9s, load textual inversion embeddings: 5.3s).
INFO:     Started server process [13156]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
ERROR:    [Errno 10048] error while attempting to bind on address ('127.0.0.1', 5173): only one usage of each socket address (protocol/network address/port) is normally permitted
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
no config file: E:\Documents\AI\stable-diffusion-webui\extensions\Stable-Diffusion-Webui-Prompt-Translator\prompt_translator.cfg
CUDA SETUP: Loading binary E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
[text2prompt] Loading database with name "all-mpnet-base-v2 : danbooru_strict"...
[text2prompt] Database loaded
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 94, in receive
    return self.receive_nowait()
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 89, in receive_nowait
    raise WouldBlock
anyio.WouldBlock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 77, in call_next
    message = await recv_stream.receive()
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\anyio\streams\memory.py", line 114, in receive
    raise EndOfStream
anyio.EndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\fastapi\applications.py", line 271, in __call__
    await super().__call__(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\applications.py", line 125, in __call__
    await self.middleware_stack(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\errors.py", line 184, in __call__
    raise exc
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 104, in __call__
    response = await self.dispatch_func(request, call_next)
  File "E:\Documents\AI\stable-diffusion-webui\modules\api\api.py", line 96, in log_and_time
    res: Response = await call_next(req)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 80, in call_next
    raise app_exc
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\base.py", line 69, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\gzip.py", line 24, in __call__
    await responder(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\gzip.py", line 44, in __call__
    await self.app(scope, receive, self.send_with_gzip)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\routing.py", line 69, in app
    await response(scope, receive, send)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\starlette\responses.py", line 334, in __call__
    raise RuntimeError(f"File at path {self.path} does not exist.")
RuntimeError: File at path E:\Documents\AI\stable-diffusion-webui\static\background.png does not exist.
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1018, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 935, in postprocess_data
    if predictions[i] is components._Keywords.FINISHED_ITERATING:
IndexError: tuple index out of range
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1018, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "E:\Documents\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 935, in postprocess_data
    if predictions[i] is components._Keywords.FINISHED_ITERATING:
IndexError: tuple index out of range
ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0032, device='cuda:0') tensor(1.0001, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:18<00:00,  1.64it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_044231808812.mp4
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0048, device='cuda:0') tensor(1.0027, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:17<00:00,  1.77it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_044258693495.mp4
text2video finished, saving frames to E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044134
Got a request to stitch frames to video using FFmpeg.
Frames:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044134\%06d.png
To Video:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044134\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.60 seconds!
t2v complete, result saved at E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044134
ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0035, device='cuda:0') tensor(0.9993, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:17<00:00,  1.79it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_044453768088.mp4
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0033, device='cuda:0') tensor(0.9998, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:17<00:00,  1.79it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_044520326138.mp4
text2video finished, saving frames to E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044402
Got a request to stitch frames to video using FFmpeg.
Frames:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044402\%06d.png
To Video:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044402\vid.mp4
Stitching *video*...
Stitching *video*...
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py", line 147, in ffmpeg_stitch_video
    process = subprocess.Popen(
  File "C:\Python310\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 174, in process
    ffmpeg_stitch_video(ffmpeg_location=ffmpeg_location, fps=fps, outmp4_path=outdir_current + os.path.sep + f"vid.mp4", imgs_path=os.path.join(outdir_current,
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py", line 158, in ffmpeg_stitch_video
    raise Exception(
Exception: Error stitching frames to video. Actual runtime error:[WinError 5] Access is denied
Exception occurred: Error stitching frames to video. Actual runtime error:[WinError 5] Access is denied
ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0019, device='cuda:0') tensor(0.9985, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:17<00:00,  1.78it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_044710418252.mp4
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0051, device='cuda:0') tensor(1.0006, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:17<00:00,  1.77it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_044737439069.mp4
text2video finished, saving frames to E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044618
Got a request to stitch frames to video using FFmpeg.
Frames:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044618\%06d.png
To Video:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331044618\vid.mp4
Stitching *video*...
Stitching *video*...
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py", line 147, in ffmpeg_stitch_video
    process = subprocess.Popen(
  File "C:\Python310\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 174, in process
    ffmpeg_stitch_video(ffmpeg_location=ffmpeg_location, fps=fps, outmp4_path=outdir_current + os.path.sep + f"vid.mp4", imgs_path=os.path.join(outdir_current,
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py", line 158, in ffmpeg_stitch_video
    raise Exception(
Exception: Error stitching frames to video. Actual runtime error:[WinError 5] Access is denied
Exception occurred: Error stitching frames to video. Actual runtime error:[WinError 5] Access is denied
ModelScope text2video extension for auto1111 webui
Git commit: 9f9bd657 (Fri Mar 24 22:49:32 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0024, device='cuda:0') tensor(0.9942, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:17<00:00,  1.80it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_045203363720.mp4
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0020, device='cuda:0') tensor(1.0070, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:17<00:00,  1.78it/s]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230331_045229893721.mp4
text2video finished, saving frames to E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331045111
Got a request to stitch frames to video using FFmpeg.
Frames:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331045111\%06d.png
To Video:
E:\Documents\AI\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230331045111\vid.mp4
Stitching *video*...
Stitching *video*...
Traceback (most recent call last):
  File "E:\Documents\AI\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py", line 147, in ffmpeg_stitch_video
    process = subprocess.Popen(
  File "C:\Python310\lib\subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python310\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

Additional information

No response

Did anyone get it running?

I have an RTX 3080 16GB, but I am still unable to run it.

I am getting this error:

Traceback (most recent call last):
  File "C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\venv\lib\site-packages\gradio\blocks.py", line 1018, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\venv\lib\site-packages\gradio\blocks.py", line 956, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
  File "C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\venv\lib\site-packages\gradio\components.py", line 1860, in postprocess
    returned_format = y.split(".")[-1].lower()
AttributeError: 'tuple' object has no attribute 'split'

[Bug]: Exception occured [Errno 2] No such file or directory

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

I have placed the files in the correct folder, but the extension is not able to read them, and I noticed the path formatting looks a bit odd:

Git commit: 092aa42 (Wed Mar 22 19:28:16 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'E:\Automatic1111\stable-diffusion-webui\models/ModelScope/t2v/configuration.json'

The files are in E:\Automatic1111\stable-diffusion-webui\models\Stable-diffusion\ModelScope\t2v; not sure what is going on.
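
For reference, the [Errno 2] message above shows the exact location the extension resolves: models/ModelScope/t2v directly under the webui models folder, not under models/Stable-diffusion. A small check of that layout (drive letter and folder names follow the error message, adjust to your install):

import os

base = r"E:\Automatic1111\stable-diffusion-webui\models\ModelScope\t2v"  # path from the error above
for name in ("configuration.json", "VQGAN_autoencoder.pth",
             "open_clip_pytorch_model.bin", "text2video_pytorch_model.pth"):
    print(name, "->", "OK" if os.path.isfile(os.path.join(base, name)) else "MISSING")

If everything prints MISSING, moving the four files out of models\Stable-diffusion\ModelScope\t2v into models\ModelScope\t2v should line things up with what the extension expects.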

Steps to reproduce the problem

  1. download the files
  2. place them in the folder

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id -
txt2vid commit id -

What GPU were you using for launching?

3090ti

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

Git commit: 092aa42 (Wed Mar 22 19:28:16 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'E:\Automatic1111\stable-diffusion-webui\models/ModelScope/t2v/configuration.json'

Console logs

Git commit: 092aa423 (Wed Mar 22 19:28:16 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'E:\\Automatic1111\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'

Additional information

No response

[Bug]:

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

All text2video generations result in:

Exception occurred: argument of type 'NoneType' is not iterable

I have triple-checked the installation instructions and re-downloaded the models to rule out a simple oversight.

Steps to reproduce the problem

  1. Click Generate using the following settings (or any other combination of settings that I have tried):

(settings screenshot)

  2. Pipeline launches successfully.
  3. DDIM sampling tensor(1) completes successfully.
  4. The following fault occurs:

ModelScope text2video extension for auto1111 webui
Git commit: c8335de (Mon Mar 27 19:49:45 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0008, device='cuda:0') tensor(1.0006, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:43<00:00, 1.40s/it]
Traceback (most recent call last):
File "C:\Users\XXX\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 160, in process
samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
File "C:\Users\XXX\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 255, in infer
if 'CPU' in cpu_vae:
TypeError: argument of type 'NoneType' is not iterable
Exception occurred: argument of type 'NoneType' is not iterable
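
The TypeError itself is just the cpu_vae argument reaching the pipeline as None before the 'CPU' in cpu_vae check. A defensive guard along these lines would avoid the crash; this is an illustrative sketch, not the extension's actual code, and the default label is an assumption:

def normalize_cpu_vae(cpu_vae):
    # Sketch only: treat a missing UI value as the GPU default so that
    # "'CPU' in cpu_vae" never sees None.
    return cpu_vae if cpu_vae is not None else "GPU (half precision)"  # assumed default label

print('CPU' in normalize_cpu_vae(None))   # False instead of TypeError
print('CPU' in normalize_cpu_vae("CPU"))  # True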

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id - 955df77
txt2vid commit id - c8335de9

What GPU were you using for launching?

3060 -12 gb

On which platform are you launching the webui backend with the extension?

No response

Settings

(settings screenshot)

Console logs

ModelScope text2video extension for auto1111 webui
Git commit: c8335de9 (Mon Mar 27 19:49:45 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
  0%|                                                                                                                     | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0008, device='cuda:0') tensor(1.0006, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:43<00:00,  1.40s/it]
Traceback (most recent call last):
  File "C:\Users\xxx\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 160, in process
    samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
  File "C:\Users\xxx\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 255, in infer
    if 'CPU' in cpu_vae:
TypeError: argument of type 'NoneType' is not iterable
Exception occurred: argument of type 'NoneType' is not iterable

Additional information

Windows 10 running locally.

(script) code for model unloading

This is a way to unload the currently loaded model with a button; please feel free to copy whatever you need for the extension!

I learned this exists in the supermerger extension. It's useful for freeing enough VRAM for GPU upscaling tasks and when you are using interrogators.

import gradio as gr
import gc
from modules import scripts, shared, sd_hijack, devices

class Script(scripts.Script):
    def title(self):
        return "Unload Button"

    def show(self, is_img2img):
        # Show the button in both txt2img and img2img
        return True

    def ui(self, is_img2img):
        unloadmodel = gr.Button(value="unload model", variant='primary')

        def unload():
            # Undo the webui's model hijack, drop the checkpoint reference,
            # then force garbage collection so the VRAM is actually freed
            sd_hijack.model_hijack.undo_hijack(shared.sd_model)
            shared.sd_model = None
            gc.collect()
            devices.torch_gc()

        unloadmodel.click(fn=unload)

[Bug]: AssertionError: function with index 495 not defined.

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

I just updated the extension and now I'm getting this error. Worked fine 5 hours ago.

Steps to reproduce the problem

  1. Open webui-user.bat
  2. Under ModelScope tab input any prompt with stock settings
  3. Press Generate

What should have happened?

It should generate the video, not return an error.

WebUI and Deforum extension Commit IDs

webui commit id - a9fed7c3

txt2vid commit id - 092aa42

What GPU were you using for launching?

RTX 3080 10GB

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

(settings screenshots)

Console logs

venv "A:\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
Installing requirements for Web UI



Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [f94d96ebdc] from A:\stable-diffusion-webui\models\Stable-diffusion\HassanBlend1.5.1.2-pruned.safetensors
Creating model from config: A:\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights found near the checkpoint: A:\stable-diffusion-webui\models\Stable-diffusion\HassanBlend1.5.1.2-pruned.vae.pt
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(5): donut, donut2, donut3, test2, test3
Textual inversion embeddings skipped(2): donut4, test
Model loaded in 16.6s (load weights from disk: 0.3s, create model: 0.3s, apply weights to model: 13.0s, apply half(): 0.6s, load VAE: 0.9s, move model to device: 0.6s, load textual inversion embeddings: 0.8s).
*Deforum ControlNet support: enabled*
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 26.2s (import gradio: 2.3s, import ldm: 1.3s, other imports: 2.4s, list extensions: 0.8s, setup codeformer: 0.1s, load scripts: 1.8s, load SD checkpoint: 16.7s, create ui: 0.6s).
Traceback (most recent call last):
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 795, in call_function
    assert block_fn.fn, f"function with index {fn_index} not defined."
AssertionError: function with index 495 not defined.
Traceback (most recent call last):
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 795, in call_function
    assert block_fn.fn, f"function with index {fn_index} not defined."
AssertionError: function with index 495 not defined.
Traceback (most recent call last):
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 795, in call_function
    assert block_fn.fn, f"function with index {fn_index} not defined."
AssertionError: function with index 495 not defined.
Traceback (most recent call last):
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 795, in call_function
    assert block_fn.fn, f"function with index {fn_index} not defined."
AssertionError: function with index 495 not defined.
Traceback (most recent call last):
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "A:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 795, in call_function
    assert block_fn.fn, f"function with index {fn_index} not defined."
AssertionError: function with index 495 not defined.

Additional information

I tried to generate 3 times with each setting (GPU, GPU half precision, and CPU), but they all gave the same error.

[Bug]: It seems it can't create a folder

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

I'm trying to generate a text-to-video; it seems the interpreter can't create a folder.

Steps to reproduce the problem

Click the Generate button.

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id - 3715ece0
txt2vid commit id - d16d9d4

What GPU were you using for launching?

NVIDIA RTX 2060 Super

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

(settings screenshots)

Console logs

venv "C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 3715ece0adce7bf7c5e9c5ab3710b2fdc3848f39
Installing requirements for Web UI
Installing sd-dynamic-prompts requirements.txt

current transparent-background 1.2.3


Installing imageio-ffmpeg requirement for depthmap script
Installing pyqt5 requirement for depthmap script

Installing video2video requirement: sk-video

Launching Web UI with arguments: --xformers --no-half-vae
2023-04-01 17:56:25.0364228 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1641 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
Loading weights [fdffd3c312] from C:\Users\kolya\OneDrive\Рабочий стол\stablediff\ai\models\Stable-diffusion\ambientmixAnAnime_v10.safetensors
Creating model from config: C:\Users\kolya\OneDrive\Рабочий стол\stablediff\ai\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: C:\Users\kolya\OneDrive\Рабочий стол\stablediff\ai\models\VAE\orangemix.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(2): badhandv4, easynegative
Model loaded in 98.5s (load weights from disk: 12.1s, load config: 0.1s, create model: 1.2s, apply weights to model: 63.5s, apply half(): 0.8s, load VAE: 1.3s, move model to device: 1.1s, load textual inversion embeddings: 18.4s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Closing server running on port: 7860
Restarting UI...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
ModelScope text2video extension for auto1111 webui
Git commit: Unknown
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 24, 24]) tensor(0.0002, device='cuda:0') tensor(1.0018, device='cuda:0')
DDIM sampling tensor(1): 100%|██████████████████████████████████████████████████| 31/31 [00:50<00:00,  1.63s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 192, 192])
output/mp4s/20230401_191007514020.mp4
text2video finished, saving frames to C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401190401
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401190401\%06d.png
To Video:
C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401190401\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 4.14 seconds!
t2v complete, result saved at C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401190401
Traceback (most recent call last):
  File "C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 182, in process
    mp4 = open(outdir_current + os.path.sep + f"vid.mp4", 'rb').read()
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\kolya\\OneDrive\\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\\stablediff\\ai\\outputs/img2img-images\\text2video-modelscope\\20230401190401\\vid.mp4'
Exception occurred: [Errno 2] No such file or directory: 'C:\\Users\\kolya\\OneDrive\\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\\stablediff\\ai\\outputs/img2img-images\\text2video-modelscope\\20230401190401\\vid.mp4'
ModelScope text2video extension for auto1111 webui
Git commit: Unknown
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 24, 24]) tensor(-0.0033, device='cuda:0') tensor(1.0038, device='cuda:0')
DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:34<00:00,  1.11s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESSβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:34<00:00,  1.38it/s]
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 192, 192])
output/mp4s/20230401_191850411983.mp4
text2video finished, saving frames to C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401191135
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401191135\%06d.png
To Video:
C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401191135\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 31.17 seconds!
t2v complete, result saved at C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\outputs/img2img-images\text2video-modelscope\20230401191135
Traceback (most recent call last):
  File "C:\Users\kolya\OneDrive\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\stablediff\ai\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 182, in process
    mp4 = open(outdir_current + os.path.sep + f"vid.mp4", 'rb').read()
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\kolya\\OneDrive\\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\\stablediff\\ai\\outputs/img2img-images\\text2video-modelscope\\20230401191135\\vid.mp4'
Exception occurred: [Errno 2] No such file or directory: 'C:\\Users\\kolya\\OneDrive\\Π Π°Π±ΠΎΡ‡ΠΈΠΉ стол\\stablediff\\ai\\outputs/img2img-images\\text2video-modelscope\\20230401191135\\vid.mp4'

Additional information

No response
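
Note: in the log above the stitcher reports success, yet the final read of vid.mp4 fails, which suggests FFmpeg never actually wrote the file (the mixed / and \ separators in the output path and OneDrive-synced folders are common suspects). A minimal, hypothetical guard around the read in modelscope-text2vid.py could surface a clearer message instead of a crash; this is a sketch, not the extension's actual fix, reusing the outdir_current variable from the traceback:

import os

# Sketch only: check that FFmpeg actually produced the stitched file before
# reading it back, and normalize the mixed path separators seen in the log.
mp4_path = os.path.normpath(os.path.join(outdir_current, "vid.mp4"))
if os.path.isfile(mp4_path):
    with open(mp4_path, "rb") as f:
        mp4 = f.read()
else:
    print(f"FFmpeg did not produce {mp4_path}; check the ffmpeg binary and output folder permissions.")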

[Bug]: Getting "TextToVideoSynthesis.infer() takes from 6 to 11 positional arguments but 12 were given"

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

Wrote "A girl on a swing" in the prompt, kept everything default. I have the models and config under the specified path in "How to install..." tab. When I clicked generate, something started happening, my console started putting out info but stopped at:

Starting text2video
Exception occured
TextToVideoSynthesis.infer() takes from 6 to 11 positional arguments but 12 were given

After that nothing works, and nothing is generated in the background.

Steps to reproduce the problem

  1. Go to Modelscope text2video tab (latest A1111 commit and latest MS commit)
  2. Write "A girl on a swing" in prompt
  3. Click Generate
  4. Get the error and move on with your life.

What should have happened?

It should have generated some kind of images/frames/video.

WebUI and Deforum extension Commit IDs

webui commit id - Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
txt2vid commit id - Git commit: a1aa328 (Tue Mar 21 10:54:55 2023)

What GPU were you using for launching?

3090 24gb vram

On which platform are you launching the webui backend with the extension?

No response

Settings

All default.

Console logs

https://pastebin.com/41Ld1b9r

Additional information

No response
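
Note: this error usually means the installed UI script and the pipeline code are from different versions, so process() passes more arguments than TextToVideoSynthesis.infer() accepts. A quick, generic way to confirm the mismatch (a hypothetical debugging sketch, assuming you can reach the pipe object from a debug session) is to compare signatures:

import inspect

# Hypothetical snippet: print what the installed pipeline's infer() actually
# accepts, to compare against the call site in modelscope-text2vid.py.
print(inspect.signature(pipe.infer))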

[Bug]: cannot import name 'get_quick_vid_info'

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

I get this error when trying to run the latest version: ImportError: cannot import name 'get_quick_vid_info' from 'scripts.video_audio_utils' (C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py)

Steps to reproduce the problem

  1. Install the latest version of the extension
  2. Launch the webui and try to run it

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id - b1b8b78
txt2vid commit id -

What GPU were you using for launching?

Rtx 3080 16GB

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

No changes to the default settings

Console logs

Traceback (most recent call last):
  File "C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\modules\scripts.py", line 248, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
  File "C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\modules\script_loading.py", line 11, in load_module
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 22, in <module>        
    from scripts.video_audio_utils import ffmpeg_stitch_video, find_ffmpeg_binary, get_quick_vid_info, vid2frames, duplicate_pngs_from_folder, clean_folder_name   
ImportError: cannot import name 'get_quick_vid_info' from 'scripts.video_audio_utils' (C:\Users\pavan\Desktop\Projects\Deep-Learning\webui\extensions\sd-webui-modelscope-text2video\scripts\video_audio_utils.py)

Additional information

No response

[Feature Request]: Add batch count

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Add a batch count so you could generate 10-20 videos in an automated fashion like with txt2img

Proposed workflow

See above

Additional information

No response

[Bug]: On generation, It samples the tensors twice and vid2vid doesn't work.

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

Every time I click "Generate" it samples the tensors twice instead of once, so I have to wait double the time for a single video. I understand vid2vid was added, but even so, when I try a vid2vid generation it only outputs from the text2video tab, even though it still samples the tensors twice.

Steps to reproduce the problem

For text2video and vid2vid

  1. Go to ModelScope text2video
  2. Add a prompt, for example "sunrise from tokyo, by makoto shinkai"
  3. Click the yellow "Generate" button.
  4. Wait twice as long.
    For vid2vid
  5. Add a video generated by text2video
  6. Add a prompt, for example "a boy with sunglasses"
  7. Click Generate and still wait twice as long, because it samples the tensors twice.
  8. Find out that vid2vid doesn't generate from the vid2vid tab but from text2video (which I left blank, so it outputs something else, like a tortoise underwater).

What should have happened?

It should sample the tensors only once if using only text2video
It should sample the tensors twice if I add prompts for both text2video and vid2vid.
It should sample the tensors once if only vid2vid was selected.

WebUI and Deforum extension Commit IDs

webui commit id - commit: a9fed7c3
txt2vid commit id - https://github.com/deforum-art/sd-webui-modelscope-text2video.git | 8402005 (Fri Mar 24 14:49:52 2023)

What GPU were you using for launching?

RTX 3060 12GB VRAM, 16 GB Ram.

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

--xformers --no-half-vae --api
I didn't change anything; I just added the prompts. Everything is at default, with fp16 enabled for the GPU.

Console logs

ModelScope text2video extension for auto1111 webui
Git commit: 84020058 (Fri Mar 24 14:49:52 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0012, device='cuda:0') tensor(1.0001, device='cuda:0')
DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:41<00:00,  1.33s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESS
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230324_215112403414.mp4
  0%|                                                                          | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(-0.0007, device='cuda:0') tensor(1.0037, device='cuda:0')DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:41<00:00,  1.34s/it]
STARTING VAE ON GPU. 24 CHUNKS TO PROCESSβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:41<00:00,  1.34s/it]
VAE HALVED
DECODING FRAMES
VAE FINISHED
torch.Size([24, 3, 256, 256])
output/mp4s/20230324_215201616361.mp4
text2video finished, saving frames to C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000
Got a request to stitch frames to video using FFmpeg.
Frames:
C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000\%06d.png
To Video:
C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000\vid.mp4
Stitching *video*...
Stitching *video*...
Video stitching done in 0.26 seconds!
t2v complete, result saved at C:\Stable Diffusion\stable-diffusion-webui\outputs/img2img-images\text2video-modelscope\20230324215000

Additional information

No response

Repos for Training and Finetuning (1 already available!)

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Is there any released training code or published paper mentioning the training methods used for this model?

Proposed workflow

N/A

Additional information

No response

[Bug]:

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

File "C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 160, in process
samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
File "C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 255, in infer
if 'CPU' in cpu_vae:
TypeError: argument of type 'NoneType' is not iterable
Exception occurred: argument of type 'NoneType' is not iterable

Steps to reproduce the problem

  1. Enter a prompt
  2. Press Generate
  3. See the error

What should have happened?

It should have generated a video.

WebUI and Deforum extension Commit IDs

webui commit id - commit: [3715ece0]
txt2vid commit id -not sure where to find it

What GPU were you using for launching?

nvidia 2060 super 8gb

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

settings1

Console logs

Loading A111 WebUI Launcher
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   Settings file found, loading
 β†’   Updating Settings File  βœ“
 i   Launcher Version 1.7.0
 i   Found a custom WebUI Config
 i   No Launcher launch options
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†’   Checking requirements :
 i   Python 3.10.7150.1013 found in registry:  C:\Users\howv\AppData\Local\Programs\Python\Python310\
 !   This is not the recommended version of Python and will probably cause errors
 i   Clearing PATH of any mention of Python
 β†’   Adding python 3.10 to path  βœ“
 i   Git found and already in PATH:  C:\Program Files\Git\cmd\git.exe
 i   Automatic1111 SD WebUI found:  C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui
 i   One or more checkpoint models were found
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Loading Complete, opening launcher
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   Arguments are now: --xformers
 i   Arguments are now: --xformers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  WEBUI LAUNCHING VIA EMS LAUNCHER, EXIT THIS WINDOW TO STOP THE WEBUI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 !   Any error happening after 'commit hash : XXXX' is not related to the launcher. Please report them on Automatic1111's github instead :
 ☁   https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/new/choose
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cancel
venv "C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Commit hash: 3715ece0adce7bf7c5e9c5ab3710b2fdc3848f39
Installing requirements for Web UI
Installing requirement for sd-webui-controlnet


Launching Web UI with arguments: --autolaunch --xformers
Loading weights [27a4ac756c] from C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui\models\Stable-diffusion\SD15NewVAEpruned.ckpt
Creating model from config: C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0):
Model loaded in 3.7s (load weights from disk: 1.2s, create model: 0.3s, apply weights to model: 0.3s, apply half(): 0.6s, move model to device: 0.5s, load textual inversion embeddings: 0.7s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
ModelScope text2video extension for auto1111 webui
Git commit: Unknown
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
Working in txt2vid mode
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]latents torch.Size([1, 4, 24, 32, 32]) tensor(0.0010, device='cuda:0') tensor(0.9977, device='cuda:0')
DDIM sampling tensor(1): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:41<00:00,  1.35s/it]
Traceback (most recent call last):β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 31/31 [00:41<00:00,  1.24s/it]
  File "C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 160, in process
    samples, _ = pipe.infer(prompt, n_prompt, steps, frames, seed + batch if seed != -1 else -1, cfg_scale,
  File "C:\Users\howv\Documents\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 255, in infer
    if 'CPU' in cpu_vae:
TypeError: argument of type 'NoneType' is not iterable
Exception occurred: argument of type 'NoneType' is not iterable

Additional information

No response
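
Note: the crash happens because the cpu_vae setting reaches infer() as None (for example, when the field is left unset or an older caller omits it), and 'CPU' in None is not valid Python. A defensive sketch of the check in t2v_pipeline.py follows; the fallback string 'GPU (half precision)' is an assumption, not necessarily the project's actual default:

# Sketch of a defensive check; the fallback value is a guess.
cpu_vae = cpu_vae or 'GPU (half precision)'
if 'CPU' in cpu_vae:
    # ... run the VAE on the CPU ...
    pass
else:
    # ... run the VAE on the GPU ...
    pass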

[Bug]: Expecting value: line 1 column 1 (char 0)

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

After I push the generate button, nothing happens. I get this message from the log:

ModelScope text2video extension for auto1111 webui
Git commit: d16d9d4 (Thu Mar 30 09:18:56 2023)
Starting text2video
Pipeline setup
Traceback (most recent call last):
File "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 76, in process
pipe = setup_pipeline()
File "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 30, in setup_pipeline
return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
File "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 59, in init
config_dict = json.load(f)
File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json_init_.py", line 293, in load
return loads(fp.read(),
File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json_init_.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Exception occurred: Expecting value: line 1 column 1 (char 0)

Steps to reproduce the problem

  1. After installation, with the HuggingFace models added
  2. Switch to the ModelScope tab
  3. Write a prompt into the text field
  4. Push the Generate button
  5. Push the Show video button
  6. Nothing happens; the video player shows the error message
  7. The log shows the error

What should have happened?

I would like to get a short video instead of the error video.

WebUI and Deforum extension Commit IDs

webui commit id -
txt2vid commit id -

What GPU were you using for launching?

3060 12 GB ram

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

Normal settings, no medvram, no xformers (it does not work with medvram and/or xformers either).

Console logs

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Loading A111 WebUI Launcher
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   Settings file found, loading
 β†’   Updating Settings File  βœ“
 i   Launcher Version 1.7.0
 i   Found a custom WebUI Config
 i   No Launcher launch options
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†’   Checking requirements :
 i   Python 3.10.6150.1013 found in registry:  C:\Users\PC\AppData\Local\Programs\Python\Python310\
 i   Clearing PATH of any mention of Python
 β†’   Adding python 3.10 to path  βœ“
 i   Git found and already in PATH:  C:\Program Files\Git\cmd\git.exe
 i   Automatic1111 SD WebUI found:  C:\Stable Diffusion\stable-diffusion-webui
 i   One or more checkpoint models were found
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Loading Complete, opening launcher
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   No arguments set
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Webui
 βœ“   Done
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: deforum-for-automatic1111-webui
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-modelscope-text2video
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-images-browser
 βœ“   Done
 i   No arguments set
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  WEBUI LAUNCHING VIA EMS LAUNCHER, EXIT THIS WINDOW TO STOP THE WEBUI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 !   Any error happening after 'commit hash : XXXX' is not related to the launcher. Please report them on Automatic1111's github instead :
 ☁   https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/new/choose
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cancel
venv "C:\Stable Diffusion\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
Installing requirements for Web UI



Launching Web UI with arguments: --autolaunch
No module 'xformers'. Proceeding without it.
Loading weights [27a4ac756c] from C:\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\SD15NewVAEpruned.ckpt
Creating model from config: C:\Stable Diffusion\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 4.8s (load weights from disk: 1.0s, create model: 0.5s, apply weights to model: 0.7s, apply half(): 1.0s, move model to device: 0.7s, load textual inversion embeddings: 0.9s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 12.1s (import torch: 1.7s, import gradio: 1.1s, import ldm: 0.5s, other imports: 0.9s, load scripts: 1.4s, load SD checkpoint: 4.9s, create ui: 1.2s, gradio launch: 0.3s).
ModelScope text2video extension for auto1111 webui
Git commit: d16d9d47 (Thu Mar 30 09:18:56 2023)
Starting text2video
Pipeline setup
Traceback (most recent call last):
  File "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 76, in process
    pipe = setup_pipeline()
  File "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "C:\Stable Diffusion\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 59, in __init__
    config_dict = json.load(f)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Exception occurred: Expecting value: line 1 column 1 (char 0)

Additional information

No response
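
Note: "Expecting value: line 1 column 1 (char 0)" means configuration.json is not valid JSON at all; most often the file was saved from the HuggingFace web page (an HTML document) instead of the raw file, or it is empty. A small check, run from the webui root and assuming the default models/ModelScope/t2v layout, can confirm this before relaunching (a sketch, not part of the extension):

import json
from pathlib import Path

# Sketch: verify configuration.json is real JSON and not a saved HTML page.
cfg = Path("models/ModelScope/t2v/configuration.json")
text = cfg.read_text(encoding="utf-8") if cfg.exists() else ""
if not text.strip():
    print("configuration.json is missing or empty - re-download the raw file.")
elif text.lstrip().startswith("<"):
    print("configuration.json looks like an HTML page - download the raw file instead.")
else:
    json.loads(text)  # raises if still malformed
    print("configuration.json parses fine.")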

[Feature request] Add video2video mode (with in-painting and outpainting analogues for making vid from keyframes and AI-continuing vids)

Just as Stable Diffusion transforms one picture into another (or starts from noise if no input is specified), this model should in theory be able to transform one video into another, guided by text, if we initialize the latents with the input video frames.

https://github.com/deforum-art/sd-webui-modelscope-text2video/blob/857594d61ea776794296ffa6d256bf93eaa7fcd2/scripts/t2v_pipeline.py#L153


The proposed scheme (like img2img, but for videos; a rough sketch follows the list):

  • Prepare input videos for the input mode (rescaling, cutting to the input length)
  • Encode videos to the latent representation by running the VAE
  • Configure the DDIM scheduler to use Denoising strength
  • Pass the latents to the pipeline and test it
  • Configure denoising strength influence
  • Bonus: in-framing: pass an input video, add a few keyframes, mask them, fill the rest with latent noise (or keep the original), and vid2vid-diffuse the unmasked area. Just like in-painting, but for vid2vid.
  • Bonus 2: video continuation: extend the video with latent-noise frames and move a 'window' that performs the aforementioned 'in-framing', allowing the video to extend beyond VRAM bounds at the cost of some temporal coherence.
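
A minimal sketch of the core idea, assuming a pipeline object with a VAE encoder and a DDIM-style scheduler like the ModelScope pipeline; the names encode_frames, denoise_from and the scheduler call are hypothetical placeholders, not the real API:

import torch

def vid2vid_sketch(pipe, frames, prompt, strength=0.6, steps=30, device="cuda"):
    """Hypothetical img2img-style vid2vid: encode, noise up to an intermediate
    timestep picked from the denoising strength, then denoise with the text."""
    latents = pipe.encode_frames(frames.to(device))        # (b, c, f, h, w) latent volume
    start_step = int(steps * strength)                     # how much of the schedule to re-noise
    noise = torch.randn_like(latents)
    noised = pipe.scheduler.add_noise(latents, noise, start_step)
    return pipe.denoise_from(noised, prompt, start_step=start_step, steps=steps)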

[Bug]: Error while processing rearrange-reduction pattern "b c f h w -> (b f) c h w". Input tensor shape: torch.Size([1, 4, 32, 32]). Additional info: {}. Expected 5 dimensions, got 4

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

Running on a 3060 Ti card with 8 GB of VRAM.

"CUDA is out of Memory " error is displayed when selecting the GPU to run.
And if I selected the CPU to run, another error message came out.
Here is the error information below.

Steps to reproduce the problem

① I have a 3060 Ti (8 GB VRAM) graphics card. After installing the T2V plug-in as required and keeping the default settings (24 frames, 256 pixels), the system still reports a CUDA out-of-memory error; I don't know what is wrong with the settings.

② When I select the CPU running mode instead, the system gets through the DDIM sampling, but the following error is displayed:
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Starting text2video
False

DECODING FRAMES
torch.Size([24, 4, 32, 32])
STARTING VAE ON CPU
Exception occured

Error while processing rearrange-reduction pattern "b c f h w -> (b f) c h w".
Input tensor shape: torch.Size([1, 4, 32, 32]). Additional info: {}.
Expected 5 dimensions, got 4

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id - [a9fed7c3]
txt2vid commit id -version 1.0b

What GPU were you using for launching?

3060Ti 8G VRAM

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

WIN 10 + Edge Browser

Console logs

You are running torch 1.12.1+cu113.
The program is tested to work with torch 1.13.1.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
=================================================================================
You are running xformers 0.0.14.dev.
The program is tested to work with xformers 0.0.16rc425.
To reinstall the desired version, run with commandline flag --reinstall-xformers.

Use --skip-version-check commandline argument to disable this check.
=================================================================================
Loading weights [4e704d22c3] from D:\AI_WebUI_SD\models\Stable-diffusion\SunshineMixοΌ†SunlightMix.safetensors
Creating model from config: D:\AI_WebUI_SD\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(7): easynegative, emb-rrf2, gelatapuella, ghst-3000, opt-6000, PureErosFace_V1, ulzzang-6500-v1.1
Textual inversion embeddings skipped(1): DaveSpaceFour
Model loaded in 1.4s (create model: 0.3s, apply weights to model: 0.6s, apply half(): 0.5s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 19.4s (import gradio: 2.6s, import ldm: 0.7s, other imports: 1.3s, list extensions: 1.4s, load scripts: 1.6s, load SD checkpoint: 1.5s, create ui: 10.1s, gradio launch: 0.2s).
ModelScope text2video extension for auto1111 webui
Git commit: ab1c4e74 (Mon Mar 20 22:22:46 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Starting text2video
False

DECODING FRAMES
torch.Size([24, 4, 32, 32])
STARTING VAE ON CPU
Exception occured
Hint: Python raised an exception at runtime. Please check the troubleshooting page.
 Error while processing rearrange-reduction pattern "b c f h w -> (b f) c h w".
 Input tensor shape: torch.Size([1, 4, 32, 32]). Additional info: {}.
 Expected 5 dimensions, got 4

Additional information

No response
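
Note: the pattern "b c f h w -> (b f) c h w" needs a 5-D latent volume, but on the CPU-VAE path a 4-D tensor apparently reaches it. A standalone illustration of the mismatch (a sketch, not the extension's code):

import torch
from einops import rearrange

good = torch.randn(1, 4, 24, 32, 32)   # (batch, channels, frames, height, width)
print(rearrange(good, 'b c f h w -> (b f) c h w').shape)   # torch.Size([24, 4, 32, 32])

bad = torch.randn(1, 4, 32, 32)        # the 4-D shape reported in the error
# rearrange(bad, 'b c f h w -> (b f) c h w')  # raises: expected 5 dimensions, got 4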

[Feature Request]: Model switcher

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Now that an updated model is out, it'd be nice to be able to manage multiple models without having to overwrite the existing one.

Proposed workflow

In the extension tab, choose the model in the dropdown from available models.

Additional information

No response

[Feature Request]: img2vid

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

take an existing image, and make a video that starts with it or incorporates it in some frame

Proposed workflow

  1. Go to ....
  2. Press ....
  3. ...

Additional information

We already have a lot of methods for vid2vid, including using img2img on each image in the video, or pix2pix, or even scripts that do it. We don't have an img2vid yet though.
Not sure if this is already in the works.

[Feature Request]: xformers support

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Add support for xformers memory effecient attention

Proposed workflow

This is something that I am working on right now. Still ironing out some bugs, but once I get it working I will make a PR

Additional information

No response

[Bug]: "[Errno 2] No such file or directory: 'C:\\REDACTED\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'"

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

I used Install from URL, then closed and re-launched the WebUI. I downloaded the additional models, but I had to create the directory they are supposed to go into myself. I have attempted to update the extension several times, but it says it is already up to date. I have searched everywhere and cannot find any results for this error message.

Steps to reproduce the problem

  1. Go to extention's tab
  2. Type any prompt
  3. Press Generate
  4. Error produced in console

ModelScope text2video extension for auto1111 webui
Git commit: eed8524 (Mon Mar 20 15:00:12 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'C:\REDACTED\stable-diffusion-webui\models/ModelScope/t2v/configuration.json'

What should have happened?

Well... make a video, right?

WebUI and Deforum extension Commit IDs

webui commit id -
commit: [a9fed7c3] (AUTOMATIC1111/stable-diffusion-webui@a9fed7c)
txt2vid commit id -
sd-webui-modelscope-text2video | https://github.com/deforum-art/sd-webui-modelscope-text2video | eed8524 (Mon Mar 20 15:00:12 2023)

What GPU were you using for launching?

Torch active/reserved: 5781/6304 MiB, Sys VRAM: 8173/8192 MiB (99.77%)
Model hash: c633845498
Model: [!UmiAI][Ani]Macross_v2

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

image

Console logs

i   Settings file found, loading
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†’   Updating Settings File
 βœ“   SUCCESS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   Launcher Version 1.5.10
 i   Found a custom WebUI Config
 i   No Launcher launch options
 i   Python 3.10 found : [REDACTED]
 i   Python is in PATH
 i   Git found and already in PATH at C:\Program Files\Git\cmd\git.exe
 i   Automatic1111 SD WebUI found at C:\[REDACTED]\stable-diffusion-webui
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Opening A1111 WebUI Launcher
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   Arguments are now: --medvram --xformers --listen --port 6969 --api --opt-split-attention --enable-insecure-extension-access --gradio-auth [REDACTED] --no-half-vae --cloudflared
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Webui
 βœ“   SUCCESS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: a1111-sd-webui-locon
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: a1111-sd-webui-tagcomplete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: a1111-stable-diffusion-webui-vram-estimator
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: gif2gif
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: novelai-2-local-prompt
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: openpose-editor
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: posex
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-dynamic-thresholding
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: SD-latent-mirroring
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-additional-networks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-ar
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-aspect-ratio-helper
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-controlnet
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-cutoff
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-depth-lib
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-llul
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-lora-block-weight
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-modelscope-text2video
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0
Unpacking objects: 100% (4/4), 550 bytes | 26.00 KiB/s, done.
From https://github.com/deforum-art/sd-webui-modelscope-text2video
   acf7842..92a7131  main           -> origin/main
 * [new branch]      version-getter -> origin/version-getter
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: sd-webui-tunnels
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: SDAtom-WebUi-client-queue-ext
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: Stable-Diffusion-Webui-Civitai-Helper
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-composable-lora
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-images-browser
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-model-toolkit
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-prompt-travel
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-rembg
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-state
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: stable-diffusion-webui-two-shot
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 β†Ί   Updating Extension: UnivAICharGen
 βœ“   SUCCESS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 i   Arguments are now: --medvram --xformers --listen --port 6969 --api --opt-split-attention --enable-insecure-extension-access --gradio-auth [REDACTED] --no-half-vae --cloudflared
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  WEBUI LAUNCHING VIA EMS LAUNCHER, EXIT THIS WINDOW TO STOP THE WEBUI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 !   Any error happening after 'commit hash : XXXX' is not related to the launcher. Please report them on Automatic1111's github instead :
 ☁   https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/new/choose
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Cancel
venv "C:\[REDACTED]\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2
Installing requirements for Web UI





Launching Web UI with arguments: --autolaunch --medvram --xformers --listen --port 6969 --api --opt-split-attention --enable-insecure-extension-access --gradio-auth [REDACTED] --no-half-vae --cloudflared
Civitai Helper: Get Custom Model Folder
Civitai Helper: Load setting from: C:\[REDACTED]\stable-diffusion-webui\extensions\Stable-Diffusion-Webui-Civitai-Helper\setting.json
Civitai Helper: No setting file, use default
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
[AddNet] Updating model hashes...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 271/271 [00:00<00:00, 33970.60it/s]
[AddNet] Updating model hashes...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 271/271 [00:00<00:00, 38821.56it/s]
cloudflared detected, trying to connect...
 * Running on [REDACTED]
 * Traffic stats available on http://127.0.0.1:54868/metrics
Loading weights [c633845498] from C:\PANDORA\stable-diffusion-webui\models\Stable-diffusion\[!UmiAI][Ani]Macross_v2.safetensors
Creating model from config: C:\PANDORA\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: C:\[REDACTED]\stable-diffusion-webui\models\VAE\SD-default-vae.safetensors
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(924): [REDACTED]
Model loaded in 3.3s (create model: 0.4s, apply weights to model: 0.6s, apply half(): 0.6s, load VAE: 0.1s, load textual inversion embeddings: 1.4s).
Running on local URL:  [REDACTED]

To create a public link, set `share=True` in `launch()`.
Startup time: 37.6s (import gradio: 2.1s, import ldm: 1.1s, other imports: 2.2s, list extensions: 3.3s, setup codeformer: 0.1s, load scripts: 5.9s, load SD checkpoint: 3.5s, create ui: 14.9s, gradio launch: 4.3s).
Consuming a byte in the end state
Consuming a byte in the end state
ModelScope text2video extension for auto1111 webui
Git commit: 92a71316 (Mon Mar 20 14:40:23 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'C:\\[REDACTED]\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'

ModelScope text2video extension for auto1111 webui
Git commit: 92a71316 (Mon Mar 20 14:40:23 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'C:\\[REDACTED]\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'
Closing server running on port: 6969
Restarting UI...
Civitai Helper: Get Custom Model Folder
Civitai Helper: Load setting from: C:\PANDORA\stable-diffusion-webui\extensions\Stable-Diffusion-Webui-Civitai-Helper\setting.json
Civitai Helper: No setting file, use default
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
[AddNet] Updating model hashes...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 271/271 [00:00<00:00, 30191.68it/s]
cloudflared detected, trying to connect...
 * Running on [REDACTED]
 * Traffic stats available on http://127.0.0.1:54868/metrics
Running on local URL:  [REDACTED]

To create a public link, set `share=True` in `launch()`.
Startup time: 11.9s (list extensions: 3.5s, load scripts: 1.4s, create ui: 2.7s, gradio launch: 4.1s).
Consuming a byte in the end state
Consuming a byte in the end state
ModelScope text2video extension for auto1111 webui
Git commit: eed8524d (Mon Mar 20 15:00:12 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'C:\\[REDACTED]\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'

Additional information

I might be tech savvy but this stuff is still new to me. I have followed the directions presented to the best of my ability but am encountering an error about a missing file. I have redacted identifiable information that is not relevant to the issue at hand.
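
Note: the extension does not create models/ModelScope/t2v or download the weights for you; all four files have to be placed there by hand. A small check, run from the webui root (a sketch, not part of the extension), creates the folder and reports which of the required files are actually in place:

from pathlib import Path

# Sketch: create the expected folder and list which ModelScope files are present.
t2v_dir = Path("models/ModelScope/t2v")
t2v_dir.mkdir(parents=True, exist_ok=True)
for name in ["configuration.json", "VQGAN_autoencoder.pth",
             "open_clip_pytorch_model.bin", "text2video_pytorch_model.pth"]:
    print(name, "OK" if (t2v_dir / name).exists() else "MISSING")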

[Bug]: I run it on an RTX 3060 Ti 8 GB; it produces results, but the pictures and videos are black

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

Same as the title: I run it on an RTX 3060 Ti 8 GB; it produces results, but the pictures and videos are black.

Steps to reproduce the problem

  1. Go to ....
  2. Press ....
  3. ...

What should have happened?

A panda eating?

WebUI and Deforum extension Commit IDs

webui commit id - latest
txt2vid commit id -latest

What GPU were you using for launching?

rtx3060ti 8g

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

nothing

Console logs

don't know

Additional information

No response

[Feature Request]: Add option to store the model in RAM/VRAM between runs

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

  • 1. Keep in VRAM
  • 2. Keep in RAM

Model I/O takes the bulk of the runtime, so if the user has plenty of VRAM it would be nice to keep the core model there between runs.

Proposed workflow

  1. Select store model in RAM in extension settings
  2. Do runs without waiting for the model to load/unload from disk

Additional information

No response
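
A minimal caching sketch of the idea (hypothetical, not the extension's implementation): keep the pipeline object alive between runs so the checkpoints are not re-read from disk every time. setup_fn stands in for whatever function builds the pipeline:

# Hypothetical module-level cache; assumes the pipeline behaves like a torch module.
_cached_pipe = None

def get_pipeline(setup_fn, keep_in_vram=True):
    global _cached_pipe
    if _cached_pipe is None:
        _cached_pipe = setup_fn()      # expensive: loads weights from disk
    if not keep_in_vram:
        _cached_pipe.to("cpu")         # park it in system RAM instead of VRAM
    return _cached_pipe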

[Bug]: Missing configuration.json

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

I just wanted to generate a video and I got this error. The folder did not exist in the requested location, but neither did the json.

[Errno 2] No such file or directory: 'H:\\Stable-Diffusion-Automatic\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'

Steps to reproduce the problem

Generate a video.

What should have happened?

Make folder and json file by default.

WebUI and Deforum extension Commit IDs

webui commit id: a9fed7c3
txt2vid commit id: eed8524

What GPU were you using for launching?

NVIDIA RTX 3060 12 GB

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

image

Console logs

ModelScope text2video extension for auto1111 webui
Git commit: eed8524d (Mon Mar 20 15:00:12 2023)
Starting text2video
Pipeline setup
Exception occured
[Errno 2] No such file or directory: 'H:\\Stable-Diffusion-Automatic\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'

Additional information

No response

[Feature Request]: Add attention and VAE slicing

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

"Long Video Generation

You can optimize for memory usage by enabling attention and VAE slicing and using Torch 2.0. This should allow you to generate videos up to 25 seconds on less than 16GB of GPU VRAM."

https://twitter.com/Norod78/status/1638841615384031233

Proposed workflow

  1. Go to vid2vid tab
  2. Frames = 1024
  3. run

Additional information

No response
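
For reference, this is what the request maps to in the Hugging Face diffusers text-to-video pipeline (a sketch of the diffusers-side API under the ModelScope weights, not this extension's code):

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()   # stream submodules to the GPU only when needed
pipe.enable_attention_slicing()   # compute attention in slices to cut peak VRAM
pipe.enable_vae_slicing()         # decode the video frames in slices
frames = pipe("a panda playing guitar", num_frames=64).frames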


TODO:

error using text2vid with CPU

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Are you using the latest version of the extension?

  • I have the modelscope text2video extension updated to the latest version and I still have the issue.

What happened?

raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I get this error every time I use the extension. Can anyone help me?
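
Note: the workaround the error message itself points at is loading the CUDA-saved checkpoint with a CPU map_location; the extension's loader would need something along these lines (a sketch, with an illustrative path):

import torch

# Sketch: map a checkpoint that was saved from a CUDA device onto the CPU.
state = torch.load(
    "models/ModelScope/t2v/text2video_pytorch_model.pth",
    map_location=torch.device("cpu"),
)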

Steps to reproduce the problem

  1. open the web UI (with no Nvidia card - CPU)
  2. go to text2vid tab
  3. try to generate a video

What should have happened?

No response

WebUI and Deforum extension Commit IDs

webui commit id - 41d5450a0a559469b354fd56cac0fe1ab3cf2f40
txt2vid commit id - 6e2f2a8

What GPU were you using for launching?

CPU (I only have an AMD card), 12 GB of RAM

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

image

Console logs

C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml>git pull
Already up to date.
venv "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 41d5450a0a559469b354fd56cac0fe1ab3cf2f40
Installing requirements for Web UI
Installing None
Installing onnxruntime-gpu...
Installing None
Installing opencv-python...
Installing None
Installing Pillow...







Launching Web UI with arguments: --opt-sub-quad-attention --medvram --api --disable-safe-unpickle --disable-nan-check --lora-dir D:\Lora --ckpt-dir E:\Modelos IA --no-half --precision autocast --administrator
Interrogations are fallen back to cpu. This doesn't affect on image generation. But if you want to use interrogate (CLIP or DeepBooru), check out this issue: https://github.com/lshqqytiger/stable-diffusion-webui-directml/issues/10
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
2023-03-28 20:16:59.1400392 [E:onnxruntime:Default, provider_bridge_ort.cc:1304 onnxruntime::TryGetProviderInfo_CUDA] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1106 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

2023-03-28 20:16:59.1471301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:541 onnxruntime::python::CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
Hypernetwork-MonkeyPatch-Extension not found
[text2prompt] Following databases are available:
    all-mpnet-base-v2 : danbooru_strict
Loading weights [546d287d2f] from C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\models\Stable-diffusion\(ANIME - USE ESSE - incrivel - NSFW) fandermixPlus_v14.safetensors
Creating model from config: C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying sub-quadratic cross attention optimization.
Textual inversion embeddings loaded(32): (anime background) anime-background-style-v2, (adds flames around the character) flame_surge_style, (adds goblins) Style-Goblinmode, (adds erotic tentacles) corneo_tentacle_sex, (adds a nuclear bomb) emb-nuke2, (low poly) poly-hd, (creates coats of arms with a shield) logo-with-face-on-shield, (makes cum look more realistic - NSFW) realcumAI, (renders everything in an old, thick-lined drawing style) durer-style, (makes everything magical) style-sylvamagic, (old hand-drawn style - AMAZING) sd15_journalSketch, (abstract style for mystical stories) fairy-tale-painting-style, (abstract animation style) SamDoesArt1, (winter style) Style-Winter, (black-and-white manga style) rfktr_bwmnga, (mystical, colorful style) Style-NebMagic, (undead style - necromancy) Style-Necromancy, (dark style) tarot512, (The Last of Us fungi) tloustyle, (infographics) Style-Info, (traditional Chinese painting) _shuimo_, (incredibly detailed pixel art) art by EMB_skstest3, (turns characters into psychopaths) Style-Psycho, (turns anime characters into vampires) vmpr, (turns everything into POP anime - AMAZING) MakeItPopVA, (turns everyone into heroes) hro, (turns everything into bosses) bsft, (everything in GTA5 style) gta5-artwork, (everything as old porn magazines) pervpulp15, (useful for adventure settings) advntr, (useful for concept art) concept-art, (useful for anal sex) corneo_anal
Textual inversion embeddings skipped(14): (creates spaceships) DaveSpaceFour, (makes everything look hand-drawn) InkPunk768, (makes everything look like a sketch) UlukInkSketch2, (gore apocalypse style) Apocofy, (painting style) PaintStyle3, (midjourney-like abstract style - AMAZING) rzminjourney, (large futuristic buildings) Chadeisson, (old books and sinister art) ParchArt, (very good for monsters and horror) ScaryMonstersV2, (turns everything into pixel art) pixelart-1, (everything inside stained glass) kc16-v1-4000, (everything in an apocalyptic state) Apoc768, (everything inside a heart) HeartArt, (gore and horror zombies) hellscape768
Model loaded in 8.7s (load weights from disk: 0.5s, create model: 0.6s, apply weights to model: 6.8s, load VAE: 0.2s, load textual inversion embeddings: 0.4s).
[text2prompt] Loading database with name "all-mpnet-base-v2 : danbooru_strict"...
[text2prompt] Database loaded
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 40.8s (import torch: 5.6s, import gradio: 0.9s, import ldm: 0.9s, other imports: 1.3s, list extensions: 3.6s, list SD models: 5.6s, setup codeformer: 0.1s, load scripts: 5.1s, load SD checkpoint: 9.0s, create ui: 8.4s, gradio launch: 0.2s, scripts app_started_callback: 0.1s).
ModelScope text2video extension for auto1111 webui
Git commit: 6e2f2a81 (Tue Mar 28 13:23:42 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 75, in process
    pipe = setup_pipeline()
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 30, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\extensions\sd-webui-modelscope-text2video\scripts\t2v_pipeline.py", line 84, in __init__
    torch.load(
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\modules\safe.py", line 106, in load
    return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\modules\safe.py", line 151, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\serialization.py", line 1131, in _load
    result = unpickler.load()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\pickle.py", line 1254, in load_binpersid
    self.append(self.persistent_load(pid))
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\serialization.py", line 1101, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\serialization.py", line 1083, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\serialization.py", line 215, in default_restore_location
    result = fn(storage, location)
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\serialization.py", line 182, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "C:\Users\Administrator\Desktop\SD\stable-diffusion-webui-directml\venv\lib\site-packages\torch\serialization.py", line 166, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Exception occurred: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Additional information

I'm using CPU mode; the only GPU I have is an AMD one, and as far as I know it isn't supported by text2vid.
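
For reference, the RuntimeError above comes from `torch.load` trying to restore tensors that were saved on a CUDA device while `torch.cuda.is_available()` is False. Below is a minimal sketch of the workaround the error message itself suggests; the checkpoint path and the idea of applying this to the extension's load call in `t2v_pipeline.py` are assumptions for illustration, not the extension's actual fix:

```python
import torch

# Hypothetical path to one of the ModelScope checkpoints used by this extension
# (adjust to your own stable-diffusion-webui install location).
ckpt_path = "models/ModelScope/t2v/text2video_pytorch_model.pth"

# On a CPU-only (or AMD/DirectML) machine, map every CUDA-saved storage to the
# CPU so deserialization no longer requires torch.cuda.is_available() to be True.
state_dict = torch.load(ckpt_path, map_location=torch.device("cpu"))

print(f"Loaded {len(state_dict)} top-level entries on CPU")
```

Mapping the storages to CPU only gets past deserialization; whether the pipeline can then actually run inference on a CPU or an AMD/DirectML backend is a separate question.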
