
controlnet-v1-1-nightly's Introduction

ControlNet 1.1

This is the official release of ControlNet 1.1.

ControlNet 1.1 has exactly the same architecture as ControlNet 1.0.

We promise that we will not change the neural network architecture before ControlNet 1.5 (at least, and hopefully we will never change the network architecture). Perhaps this is the best news in ControlNet 1.1.

ControlNet 1.1 includes all previous models with improved robustness and result quality. Several new models are added.

Note that we are still working on updating this to A1111.

This repo will be merged into ControlNet after we make sure that everything is OK.

Note that we are actively editing this page now. The information on this page will be more detailed and finalized when ControlNet 1.1 is ready.

This Github Repo is NOT an A1111 Extension

Please do not copy the URL of this repo into your A1111.

If you want to use ControlNet 1.1 in A1111, you only need to install https://github.com/Mikubill/sd-webui-controlnet and follow the instructions on that page.

This project is for research use and academic experiments. Again, do NOT install "ControlNet-v1-1-nightly" into your A1111.

How to use ControlNet 1.1 in A1111?

The beta test for A1111 has started.

The A1111 plugin is: https://github.com/Mikubill/sd-webui-controlnet

Note that if you use A1111, you only need to follow the instructions in the above link. (You can ignore all installation steps in this page if you use A1111.)

For researchers who are not familiar with A1111: the A1111 plugin supports arbitrary combinations of any number of ControlNets, arbitrary community models, arbitrary LoRAs, and arbitrary sampling methods. We should definitely try it!

Note that our official support for “Multi-ControlNet” is A1111-only. Please use Automatic1111 with Multi-ControlNet if you want to use multiple ControlNets at the same time. The ControlNet project perfectly supports combining multiple ControlNets, and all production-ready ControlNets are extensively tested with multiple ControlNets combined.

Model Specification

Starting from ControlNet 1.1, we begin to use the Standard ControlNet Naming Rules (SCNNRs) to name all models. We hope that this naming rule can improve the user experience.

img

ControlNet 1.1 includes 14 models (11 production-ready models and 3 experimental models):

control_v11p_sd15_canny
control_v11p_sd15_mlsd
control_v11f1p_sd15_depth
control_v11p_sd15_normalbae
control_v11p_sd15_seg
control_v11p_sd15_inpaint
control_v11p_sd15_lineart
control_v11p_sd15s2_lineart_anime
control_v11p_sd15_openpose
control_v11p_sd15_scribble
control_v11p_sd15_softedge
control_v11e_sd15_shuffle
control_v11e_sd15_ip2p
control_v11f1e_sd15_tile
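
The names above decompose mechanically under the SCNNR scheme. Here is an illustrative parser (not part of the repo); the field meanings are taken from this page: "v11" is the ControlNet version, "f1" means 1st bug fix, "p"/"e" mark production-ready/experimental (a "u" for unfinished appears in the Tile section), and "sd15" is the Stable Diffusion 1.5 base.

```python
import re

# Illustrative SCNNR name parser; field meanings are taken from this page.
NAME_RE = re.compile(r"control_v(\d+)(f\d+)?([peu])_(sd15(?:s2)?)_(\w+)")

STATUS = {"p": "production", "e": "experimental", "u": "unfinished"}

def parse_model_name(name: str) -> dict:
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not an SCNNR model name: {name}")
    version, fix, status, base, method = m.groups()
    return {
        "version": version,        # e.g. "11"
        "bug_fix": fix,            # e.g. "f1", or None
        "status": STATUS[status],
        "base_model": base,        # "sd15" (or "sd15s2" for anime lineart)
        "method": method,          # e.g. "canny", "lineart_anime"
    }
```

For example, "control_v11f1e_sd15_tile" parses to version "11", bug fix "f1", experimental status, base "sd15", and method "tile".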

You can download all these models from our HuggingFace Model Page. All of them should be put in the folder "models".

You also need to download the Stable Diffusion 1.5 model "v1-5-pruned.ckpt" and put it in the folder "models".

Our Python scripts will automatically download other annotator models such as HED and OpenPose. Nevertheless, if you want to download these manually, you can download all other annotator models from here. They should be put in the folder "annotator/ckpts".

To install:

conda env create -f environment.yaml
conda activate control-v11

Note that if you use an 8GB GPU, you need to set "save_memory = True" in "config.py".

ControlNet 1.1 Depth

Control Stable Diffusion with Depth Maps.

Model file: control_v11f1p_sd15_depth.pth

Config file: control_v11f1p_sd15_depth.yaml

Training data: Midas depth (resolution 256/384/512) + Leres depth (resolution 256/384/512) + Zoe depth (resolution 256/384/512). Multiple depth map generators at multiple resolutions are used as data augmentation.

Acceptable Preprocessors: Depth_Midas, Depth_Leres, Depth_Zoe. This model is highly robust and can work on real depth maps from rendering engines.

python gradio_depth.py

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

Update

2023/04/14: 72 hours ago we uploaded a wrong model "control_v11p_sd15_depth" by mistake. That model was an intermediate checkpoint from training; it has not converged and may cause distortion in results. We have uploaded the correct depth model as "control_v11f1p_sd15_depth". The "f1" means bug fix 1. The incorrect model has been removed. Sorry for the inconvenience.

Improvements in Depth 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate greyscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrongly paired prompts caused by a mistake in our data processing scripts. The new model fixes all these problems of the training dataset and should be more reasonable in many cases.
  2. The new depth model is a relatively unbiased model. It is not trained with one specific type of depth map from one specific depth estimation method, and it is not over-fitted to one preprocessor. This means the model will work better with different depth estimators, different preprocessor resolutions, or even with real depth maps created by 3D engines.
  3. Some reasonable data augmentations are applied during training, like random left-right flipping.
  4. The model is resumed from Depth 1.0, and it should work well in all cases where Depth 1.0 works well. If not, please open an issue with an image, and we will take a look at your case. Depth 1.1 works well in many failure cases of Depth 1.0.
  5. If you use Midas depth (the "depth" option in the webui plugin) with 384 preprocessor resolution, the difference between Depth 1.0 and 1.1 should be minimal. However, if you try other preprocessor resolutions or other preprocessors (like Leres and Zoe), Depth 1.1 is expected to be a bit better than 1.0.
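
The left-right flip augmentation mentioned above has to be applied to the image and its control map jointly so the pair stays aligned. A minimal sketch (illustrative names, not the repo's actual training code):

```python
import numpy as np

# Sketch of left-right flip augmentation for a (image, control map) pair.
# Both arrays are flipped together so the pair stays spatially aligned.
def random_lr_flip(image: np.ndarray, control: np.ndarray, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        image = image[:, ::-1].copy()      # flip the width axis of (H, W, C)
        control = control[:, ::-1].copy()
    return image, control
```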

ControlNet 1.1 Normal

Control Stable Diffusion with Normal Maps.

Model file: control_v11p_sd15_normalbae.pth

Config file: control_v11p_sd15_normalbae.yaml

Training data: Bae's normal map estimation method.

Acceptable Preprocessors: Normal BAE. This model can accept normal maps from rendering engines as long as the normal map follows ScanNet's protocol. That is to say, the color of your normal map should look like the second column of this image.

Note that this method is much more reasonable than the normal-from-midas method in ControlNet 1.0. The previous method will be abandoned.

python gradio_normalbae.py

Non-cherry-picked batch test with random seed 12345 ("a man made of flowers"):

img

Non-cherry-picked batch test with random seed 12345 ("room"):

img

Improvements in Normal 1.1:

  1. The normal-from-midas method in Normal 1.0 is neither reasonable nor physically correct. That method does not work very well on many images, and the Normal 1.0 model cannot interpret real normal maps created by rendering engines.
  2. Normal 1.1 is much more reasonable because the preprocessor is trained to estimate normal maps with a relatively correct protocol (NYU-V2's visualization method). This means Normal 1.1 can interpret real normal maps from rendering engines as long as the colors are correct (blue is front, red is left, green is top).
  3. In our tests, this model is robust and can achieve performance similar to the depth model. In the previous cnet 1.0, Normal 1.0 was not used very frequently, but Normal 1.1 is much improved and has the potential to be used much more frequently.

ControlNet 1.1 Canny

Control Stable Diffusion with Canny Maps.

Model file: control_v11p_sd15_canny.pth

Config file: control_v11p_sd15_canny.yaml

Training data: Canny with random thresholds.

Acceptable Preprocessors: Canny.

We fixed several problems in previous training datasets.

python gradio_canny.py

Non-cherry-picked batch test with random seed 12345 ("dog in a room"):

img

Improvements in Canny 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate greyscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrongly paired prompts caused by a mistake in our data processing scripts. The new model fixes all these problems of the training dataset and should be more reasonable in many cases.
  2. Because the Canny model is one of the most important (perhaps the most frequently used) ControlNets, we used dedicated funds to train it on a machine with 8 Nvidia A100 80G GPUs at batch size 8×32=256 for 3 days, spending 72×30=2160 USD (8 A100 80G at 30 USD/hour). The model is resumed from Canny 1.0.
  3. Some reasonable data augmentations are applied during training, like random left-right flipping.
  4. Although it is difficult to evaluate a ControlNet, we find Canny 1.1 a bit more robust and of a bit higher visual quality than Canny 1.0.

ControlNet 1.1 MLSD

Control Stable Diffusion with M-LSD straight lines.

Model file: control_v11p_sd15_mlsd.pth

Config file: control_v11p_sd15_mlsd.yaml

Training data: M-LSD Lines.

Acceptable Preprocessors: MLSD.

We fixed several problems in previous training datasets. The model is resumed from ControlNet 1.0 and trained with 200 GPU hours of A100 80G.

python gradio_mlsd.py

Non-cherry-picked batch test with random seed 12345 ("room"):

img

Improvements in MLSD 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate greyscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrongly paired prompts caused by a mistake in our data processing scripts. The new model fixes all these problems of the training dataset and should be more reasonable in many cases.
  2. We enlarged the training dataset by adding 300K more images, using MLSD to find images with more than 16 straight lines in them.
  3. Some reasonable data augmentations are applied during training, like random left-right flipping.
  4. Resumed from MLSD 1.0 and continued training for 200 A100 80G GPU hours.

ControlNet 1.1 Scribble

Control Stable Diffusion with Scribbles.

Model file: control_v11p_sd15_scribble.pth

Config file: control_v11p_sd15_scribble.yaml

Training data: Synthesized scribbles.

Acceptable Preprocessors: Synthesized scribbles (Scribble_HED, Scribble_PIDI, etc.) or hand-drawn scribbles.

We fixed several problems in previous training datasets. The model is resumed from ControlNet 1.0 and trained with 200 GPU hours of A100 80G.

# To test synthesized scribbles
python gradio_scribble.py
# To test hand-drawn scribbles in an interactive demo
python gradio_interactive.py

Non-cherry-picked batch test with random seed 12345 ("man in library"):

img

Non-cherry-picked batch test with random seed 12345 (interactive, "the beautiful landscape"):

img

Improvements in Scribble 1.1:

  1. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate greyscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrongly paired prompts caused by a mistake in our data processing scripts. The new model fixes all these problems of the training dataset and should be more reasonable in many cases.
  2. We found that users sometimes like to draw very thick scribbles, so we used more aggressive random morphological transforms to synthesize the scribbles. This model should work well even when the scribbles are relatively thick (the maximum scribble width in the training data is 24 pixels on a 512 canvas, but it seems to work well even for somewhat wider scribbles; the minimum width is 1 pixel).
  3. Resumed from Scribble 1.0 and continued training for 200 A100 80G GPU hours.

ControlNet 1.1 Soft Edge

Control Stable Diffusion with Soft Edges.

Model file: control_v11p_sd15_softedge.pth

Config file: control_v11p_sd15_softedge.yaml

Training data: SoftEdge_PIDI, SoftEdge_PIDI_safe, SoftEdge_HED, SoftEdge_HED_safe.

Acceptable Preprocessors: SoftEdge_PIDI, SoftEdge_PIDI_safe, SoftEdge_HED, SoftEdge_HED_safe.

This model is significantly improved compared to the previous one. All users should update as soon as possible.

New in ControlNet 1.1: we added a new type of soft edge called "SoftEdge_safe". This is motivated by the fact that HED or PIDI tends to hide a corrupted greyscale version of the original image inside the soft estimation, and such hidden patterns can distract ControlNet, leading to bad results. The solution is a pre-processing step that quantizes the edge maps into several levels so that the hidden patterns are completely removed. The implementation is on line 78 of annotator/util.py.
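
One way to implement such quantization is the following sketch (the exact code lives on line 78 of annotator/util.py; the level count here is an illustrative choice, not necessarily the repo's value):

```python
import numpy as np

# Sketch of "safe" quantization: collapse a soft edge map (values in [0, 1])
# onto a few discrete levels so that fine greyscale detail hidden inside the
# map is destroyed while the coarse edge structure survives.
def safe_quantize(edge_map: np.ndarray, levels: int = 3) -> np.ndarray:
    x = edge_map.astype(np.float32) * float(levels + 1)
    x = np.floor(x).clip(0, levels)    # bucket into levels+1 bins
    return x / float(levels)           # map the buckets back into [0, 1]
```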

The performance can be roughly summarized as:

Robustness: SoftEdge_PIDI_safe > SoftEdge_HED_safe >> SoftEdge_PIDI > SoftEdge_HED

Maximum result quality: SoftEdge_HED > SoftEdge_PIDI > SoftEdge_HED_safe > SoftEdge_PIDI_safe

Considering the trade-off, we recommend using SoftEdge_PIDI by default. In most cases it works very well.

python gradio_softedge.py

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

Improvements in Soft Edge 1.1:

  1. Soft Edge 1.1 was called HED 1.0 in the previous ControlNet.
  2. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate greyscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrongly paired prompts caused by a mistake in our data processing scripts. The new model fixes all these problems of the training dataset and should be more reasonable in many cases.
  3. Soft Edge 1.1 is significantly (in nearly 100% of cases) better than HED 1.0. This is mainly because the HED or PIDI estimators tend to hide a corrupted greyscale version of the original image inside the soft edge map, and the previous model HED 1.0 was over-fitted to restore that hidden corrupted image rather than perform boundary-aware diffusion. The training of Soft Edge 1.1 used 75% "safe" filtering to remove such hidden corrupted greyscale images inside the control maps. This makes Soft Edge 1.1 very robust. In our tests, Soft Edge 1.1 is as usable as the depth model and has the potential to be used more frequently.

ControlNet 1.1 Segmentation

Control Stable Diffusion with Semantic Segmentation.

Model file: control_v11p_sd15_seg.pth

Config file: control_v11p_sd15_seg.yaml

Training data: COCO + ADE20K.

Acceptable Preprocessors: Seg_OFADE20K (Oneformer ADE20K), Seg_OFCOCO (Oneformer COCO), Seg_UFADE20K (Uniformer ADE20K), or manually created masks.

Now the model can receive both types of annotations, ADE20K and COCO. We find that recognizing the segmentation protocol is trivial for the ControlNet encoder and that training the model on multiple segmentation protocols leads to better performance.

python gradio_seg.py

Non-cherry-picked batch test with random seed 12345 (ADE20k protocol, "house"):

img

Non-cherry-picked batch test with random seed 12345 (COCO protocol, "house"):

img

Improvements in Segmentation 1.1:

  1. The COCO protocol is supported. The previous Segmentation 1.0 supported about 150 colors; Segmentation 1.1 supports another 182 colors from COCO.
  2. Resumed from Segmentation 1.0. All previous inputs should still work.

ControlNet 1.1 Openpose

Control Stable Diffusion with Openpose.

Model file: control_v11p_sd15_openpose.pth

Config file: control_v11p_sd15_openpose.yaml

The model is trained and can accept the following combinations:

  • Openpose body
  • Openpose hand
  • Openpose face
  • Openpose body + Openpose hand
  • Openpose body + Openpose face
  • Openpose hand + Openpose face
  • Openpose body + Openpose hand + Openpose face

However, providing all those combinations is too complicated. We recommend providing users with only two choices:

  • "Openpose" = Openpose body
  • "Openpose Full" = Openpose body + Openpose hand + Openpose face

You can try with the demo:

python gradio_openpose.py

Non-cherry-picked batch test with random seed 12345 ("man in suit"):

img

Non-cherry-picked batch test with random seed 12345 (multiple people in the wild, "handsome boys in the party"):

img

Improvements in Openpose 1.1:

  1. The improvement of this model is mainly based on our improved implementation of OpenPose. We carefully reviewed the differences between the PyTorch OpenPose and CMU's C++ OpenPose. The processor should now be more accurate, especially for hands. The improvement of the processor leads to the improvement of Openpose 1.1.
  2. More inputs are supported (hand and face).
  3. The training dataset of the previous cnet 1.0 had several problems, including (1) a small group of greyscale human images duplicated thousands of times (!!), making the previous model somewhat likely to generate greyscale human images; (2) some low-quality, very blurry images and images with significant JPEG artifacts; (3) a small group of images with wrongly paired prompts caused by a mistake in our data processing scripts. The new model fixes all these problems of the training dataset and should be more reasonable in many cases.

ControlNet 1.1 Lineart

Control Stable Diffusion with Linearts.

Model file: control_v11p_sd15_lineart.pth

Config file: control_v11p_sd15_lineart.yaml

This model is trained on awacke1/Image-to-Line-Drawings. The preprocessor can generate detailed or coarse linearts from images (Lineart and Lineart_Coarse). The model is trained with sufficient data augmentation and can receive manually drawn linearts.

python gradio_lineart.py

Non-cherry-picked batch test with random seed 12345 (detailed lineart extractor, "bag"):

img

Non-cherry-picked batch test with random seed 12345 (coarse lineart extractor, "Michael Jackson's concert"):

img

Non-cherry-picked batch test with random seed 12345 (use manually drawn linearts, "wolf"):

img

ControlNet 1.1 Anime Lineart

Control Stable Diffusion with Anime Linearts.

Model file: control_v11p_sd15s2_lineart_anime.pth

Config file: control_v11p_sd15s2_lineart_anime.yaml

Training data and implementation details: (description removed).

This model can take real anime line drawings or extracted line drawings as inputs.

Some important notes:

  1. You need a file "anything-v3-full.safetensors" to run the demo. We will not provide the file. Please find that file on the Internet on your own.
  2. This model is trained with 3x token length and clip skip 2.
  3. This is a long prompt model. Unless you use LoRAs, results are better with long prompts.
  4. This model does not support Guess Mode.

Demo:

python gradio_lineart_anime.py

Non-cherry-picked batch test with random seed 12345 ("1girl, in classroom, skirt, uniform, red hair, bag, green eyes"):

img

Non-cherry-picked batch test with random seed 12345 ("1girl, saber, at night, sword, green eyes, golden hair, stocking"):

img

Non-cherry-picked batch test with random seed 12345 (extracted line drawing, "1girl, Castle, silver hair, dress, Gemstone, cinematic lighting, mechanical hand, 4k, 8k, extremely detailed, Gothic, green eye"):

img

ControlNet 1.1 Shuffle

Control Stable Diffusion with Content Shuffle.

Model file: control_v11e_sd15_shuffle.pth

Config file: control_v11e_sd15_shuffle.yaml

Demo:

python gradio_shuffle.py

The model is trained to reorganize images. We use a random flow to shuffle the image and control Stable Diffusion to recompose the image.

Non-cherry-picked batch test with random seed 12345 ("hong kong"):

img

In the 6 images on the right, the top-left one is the "shuffled" image. All others are outputs.

In fact, since the ControlNet is trained to recompose images, we do not even need to shuffle the input - sometimes we can just use the original image as input.

In this way, this ControlNet can be guided by prompts or other ControlNets to change the image style.

Note that this method has nothing to do with CLIP vision or some other models.

This is a pure ControlNet.

Non-cherry-picked batch test with random seed 12345 ("iron man"):

img

Non-cherry-picked batch test with random seed 12345 ("spider man"):

img

Multi-ControlNets (A1111-only)

Source Image (not used):

Canny Image (Input):

Shuffle Image (Input):

Outputs:

image

(From: Mikubill/sd-webui-controlnet#736 (comment))

Important If You Implement Your Own Inference:

Note that this ControlNet requires adding a global average pooling "x = torch.mean(x, dim=(2, 3), keepdim=True)" between the ControlNet encoder outputs and the SD U-Net layers, and the ControlNet must be applied only on the conditional side of the CFG scale. We recommend using the "global_average_pooling" item in the yaml file to control this behavior.
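
To illustrate what the quoted pooling does, here is a NumPy sketch (the real implementation is the torch.mean one-liner above): each encoder output of shape (N, C, H, W) collapses to (N, C, 1, 1), so only channel-wise statistics, i.e. global style rather than spatial layout, are injected into the U-Net.

```python
import numpy as np

# NumPy illustration of the global average pooling described above;
# the actual code is the quoted torch.mean one-liner.
def global_average_pool(x: np.ndarray) -> np.ndarray:
    # Average over the spatial axes H and W, keeping (N, C, 1, 1).
    return x.mean(axis=(2, 3), keepdims=True)

feat = np.random.rand(2, 320, 8, 8)   # a hypothetical encoder output
print(global_average_pool(feat).shape)  # (2, 320, 1, 1)
```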

Note that this ControlNet Shuffle will be the one and only image stylization method that we maintain with long-term support for robustness. We have tested other CLIP image encoders, unCLIP, image tokenization, and image-based prompts, but it seems that those methods do not work very well with user prompts or additional/multiple U-Net injections. See also the evidence here, here, and some other related issues. After some more recent research/experiments, we plan to support more types of stylization methods in the future.

ControlNet 1.1 Instruct Pix2Pix

Control Stable Diffusion with Instruct Pix2Pix.

Model file: control_v11e_sd15_ip2p.pth

Config file: control_v11e_sd15_ip2p.yaml

Demo:

python gradio_ip2p.py

This is a controlnet trained on the Instruct Pix2Pix dataset.

Unlike the official Instruct Pix2Pix, this model is trained with 50% instruction prompts and 50% description prompts. For example, "a cute boy" is a description prompt, while "make the boy cute" is an instruction prompt.
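
The 50/50 mixing can be sketched as follows (illustrative field names; this is not the actual training code):

```python
import random

# Sketch of the 50/50 prompt mixing described above: each training sample
# is paired either with a description prompt or with an instruction prompt,
# with equal probability. The dict keys are illustrative.
def pick_prompt(sample: dict, rng: random.Random) -> str:
    if rng.random() < 0.5:
        return sample["description"]   # e.g. "a cute boy"
    return sample["instruction"]       # e.g. "make the boy cute"
```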

Because this is a ControlNet, you do not need to worry about the original IP2P's double CFG tuning. Moreover, this model can be applied to any base model.

Also, it seems that instructions like "make it into X" work better than "make Y into X".

Non-cherry-picked batch test with random seed 12345 ("make it on fire"):

img

Non-cherry-picked batch test with random seed 12345 ("make it winter"):

img

We mark this model as "experimental" because it sometimes needs cherry-picking. For example, here is a non-cherry-picked batch test with random seed 12345 ("make he iron man"):

img

ControlNet 1.1 Inpaint

Control Stable Diffusion with Inpaint.

Model file: control_v11p_sd15_inpaint.pth

Config file: control_v11p_sd15_inpaint.yaml

Demo:

python gradio_inpaint.py

Some notices:

  1. This inpainting ControlNet is trained with 50% random masks and 50% random optical flow occlusion masks. This means the model can not only support the inpainting application but also work on video optical flow warping. Perhaps we will provide some examples in the future (depending on our workload).
  2. We updated the gradio scripts (2023/5/11) so that the standalone gradio code in the main ControlNet repo also does not change unmasked areas. Automatic1111 users are not affected.

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

See also the Guidelines for Using ControlNet Inpaint in Automatic 1111.

ControlNet 1.1 Tile

Update 2023 April 25: The previously unfinished tile model is finished now. The new name is "control_v11f1e_sd15_tile". The "f1e" means 1st bug fix ("f1"), experimental ("e"). The previous "control_v11u_sd15_tile" is removed. Please update if your model name is "v11u".

Control Stable Diffusion with Tiles.

Model file: control_v11f1e_sd15_tile.pth

Config file: control_v11f1e_sd15_tile.yaml

Demo:

python gradio_tile.py

The model can be used in many ways. Overall, the model has two behaviors:

  • Ignore the details in an image and generate new details.
  • Ignore global prompts if local tile semantics and prompts mismatch, and guide diffusion with local context.

Because the model can generate new details and ignore existing image details, we can use it to remove bad details and add refined ones, for example to remove the blurring caused by image resizing.
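
A typical way to use this is to naively upscale a small image first and let the tile model regenerate the details. A sketch (nearest-neighbour repetition is an arbitrary choice here, since the model replaces the details anyway):

```python
import numpy as np

# Sketch of preparing a low-resolution input for the tile model: naively
# upscale a 64x64 image by 8x with nearest-neighbour repetition, then let
# the tile model replace the blocky/blurry details with new ones.
def upscale_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

small = np.zeros((64, 64, 3), dtype=np.uint8)   # a placeholder 64x64 image
print(upscale_nearest(small, 8).shape)  # (512, 512, 3)
```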

Below is an example of 8x super resolution. This is a 64x64 dog image.

p

Non-cherry-picked batch test with random seed 12345 ("dog on grassland"):

img

Note that this model is not a super resolution model. It ignores the details in an image and generates new details. This means you can use it to fix bad details in an image.

For example, below is a dog image corrupted by Real-ESRGAN. This is a typical example where super resolution methods fail to upscale an image because the source context is too small.

p

Non-cherry-picked batch test with random seed 12345 ("dog on grassland"):

img

If your image already has good details, you can still use this model to replace them. Note that Stable Diffusion's i2i can achieve similar effects, but this model makes it much easier to maintain the overall structure and change only the details, even with denoising strength 1.0.

Non-cherry-picked batch test with random seed 12345 ("Silver Armor"):

img

More and more people are thinking about different methods to diffuse in tiles so that images can be very big (4k or 8k).

The problem is that, in Stable Diffusion, your prompts will always influence every tile.

For example, if your prompt is "a beautiful girl" and you split an image into 4×4=16 blocks and do diffusion in each block, you will get 16 "beautiful girls" rather than "a beautiful girl". This is a well-known problem.
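
The 4×4=16-block split described above can be sketched as follows (illustrative, not a full tiled-diffusion loop):

```python
import numpy as np

# Sketch of the 4x4 tiling described above: split a square image into 16
# equally sized blocks, the unit at which tiled diffusion would run.
def split_tiles(img: np.ndarray, n: int) -> list:
    h, w = img.shape[:2]
    th, tw = h // n, w // n
    return [img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(n) for c in range(n)]

tiles = split_tiles(np.zeros((512, 512, 3)), 4)
print(len(tiles), tiles[0].shape)  # 16 (128, 128, 3)
```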

Right now people's solution is to use meaningless prompts like "clear, clear, super clear" to diffuse blocks. But you can expect the results to be bad if the denoising strength is high. And because the prompts are bad, the contents are pretty random.

ControlNet Tile can solve this problem. For a given tile, it recognizes what is inside the tile and increases the influence of that recognized semantics, and it decreases the influence of global prompts if the contents do not match.

Non-cherry-picked batch test with random seed 12345 ("a handsome man"):

img

You can see that the prompt is "a handsome man", but the model does not paint "a handsome man" on the tree leaves. Instead, it recognizes the tree leaves and paints accordingly.

In this way, ControlNet is able to change the behavior of any Stable Diffusion model to perform diffusion in tiles.

Gallery of ControlNet Tile

Note: Our official support for tiled image upscaling is A1111-only. The gradio example in this repo does not include tiled upscaling scripts. Please use the A1111 extension to perform tiled upscaling (with other tiling scripts like Ultimate SD Upscale or Tiled Diffusion/VAE).

From Mikubill/sd-webui-controlnet#1142 (comment)

(Output, Click image to see full resolution)

grannie-comp

(Zooming-in of outputs)

grannie-Comp_face

grannie-Comp_torso

grannie-Comp_torso2

From Mikubill/sd-webui-controlnet#1142 (comment)

(Input)

image

(Output, Click image to see full resolution) image

From: #50 (comment)

(Input)

image

(Output, Click image to see full resolution, note that this example is extremely challenging)

image

From Mikubill/sd-webui-controlnet#1142 (comment):

(before)

2600914554720735184649534855329348215514636378-166329422

(after, Click image to see full resolution) 2600914554720735184649534855329348215514636383-1549088886

Comparison to Midjourney V5/V5.1 coming soon.

Annotate Your Own Data

We provide simple python scripts to process images.

See a gradio example here.


controlnet-v1-1-nightly's Issues

Segmentation annotation

Is there a reference (a paper, a dataset, or anything) describing which color represents which concept? It would be very useful for building masks manually for ControlNet.

How to pass the mask for inpainting via diffusers library

Hi,
The gradio script for the inpainting ControlNet sets -1 for the masked areas.

How can we do this via the diffusers library, which takes a PIL image? I tried creating a PIL image of float32 type, but it does not work for RGB or RGBA images.

Style2Paints models?

Just curious -- now that there is an anime lineart cnet, were there plans to eventually make/release one of the scribble lineart models or the color scribble model previously referenced in Style2Paints?

https://github.com/lllyasviel/style2paints/tree/master/V5_preview#style2paints-v5-alice
https://github.com/lllyasviel/style2paints/tree/master/V5_preview#supporting-color-scribbles

Or, would your suggestion be: to replicate the lineart scribble, use the existing scribble model or the anime lineart model at a low strength, and then for color scribbles, use the anime lineart model in combination with img2img (perhaps that's essentially what it was already doing)?

ImportError: cannot import name 'safe_step' from 'annotator.util'

Trying out the new ControlNet 1.1 softedge, and I get this error on all the softedge preprocessors. Can't import safe_step?

Loading model from cache: control_v11p_sd15_softedge [a8575a2a]0:00,  1.30it/s]
Loading preprocessor: pidinet_safe
Error running process: F:\repos\auto111\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\controlnet.py
Traceback (most recent call last):
  File "F:\repos\auto111\stable-diffusion-webui\modules\scripts.py", line 417, in process
    script.process(p, *script_args)
  File "F:\repos\auto111\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\controlnet.py", line 870, in process
    detected_map, is_image = preprocessor(input_image, res=unit.processor_res, thr_a=unit.threshold_a, thr_b=unit.threshold_b)
  File "F:\repos\auto111\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\processor.py", line 281, in pidinet_safe
    from annotator.pidinet import apply_pidinet
  File "F:\repos\auto111\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\pidinet\__init__.py", line 6, in <module>
    from annotator.util import safe_step
ImportError: cannot import name 'safe_step' from 'annotator.util' (F:\repos\auto111\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\util.py)

I want to choose line_art_anime model other than anything_v3_full.safetensors.

Thank you for creating such a wonderful work.
I would like to request the ability to choose a different model for the "line_art_anime" demo instead of anything-v3-full.safetensors.
Of course, I understand that I can fork the gradio_lineart_anime.py file and modify the following line of code:

model.load_state_dict(load_state_dict('./models/anything-v3-full.safetensors', location='cuda'), strict=False)

to point to a Waifu model or another anime model.

However, the Anything-v3 model includes weights from the NovelAI leak, which may make it unusable in the future.
Therefore, it would be very helpful if the model for line-art anime could be selected from ./stable-diffusion-webui/models/Stable-diffusion (or similar) and configured in the Automatic1111 WebUI Settings.
Sorry for my lack of skill and poor English.
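A minimal sketch of making that checkpoint path configurable, assuming the standalone repo's gradio_lineart_anime.py; the `--sd-checkpoint` flag is hypothetical, and the actual load call is left commented because it depends on the repo's load_state_dict helper and a GPU:

```python
import argparse

# Hypothetical flag; not part of the original script.
parser = argparse.ArgumentParser()
parser.add_argument('--sd-checkpoint',
                    default='./models/anything-v3-full.safetensors',
                    help='SD base model to load for the lineart_anime demo')
args, _ = parser.parse_known_args([])

# The hard-coded line in gradio_lineart_anime.py would then become:
# model.load_state_dict(load_state_dict(args.sd_checkpoint, location='cuda'), strict=False)
```

Run with e.g. `python gradio_lineart_anime.py --sd-checkpoint ./models/my-anime-model.safetensors` once the flag is wired in.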

Inpainting Model adds green tint with each iteration

Every pass with the inpainting model causes the image to become more desaturated and gain an increasingly green tint. It's a shame because it works really well otherwise. I hope some kind of update can fix this.

Training details for inpaint

Thank you for your nice work and new contribution!

I don't know whether you plan to release a new version of your paper, but in the meantime I have some questions about the training procedure and details for the inpaint model.

My main question is about the "random optical flow occlusion masks". Could we have more details about them? Is it a mask of pixels where the optical flow between two video frames exceeds a threshold?

Also, any training details you can share would be appreciated, particularly the number of training steps, the batch size, and the datasets used.
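For what it's worth, one plausible reading of "optical flow occlusion mask" can be sketched with NumPy: threshold the per-pixel flow magnitude between two frames. This is only a guess at the procedure, not the authors' actual training code:

```python
import numpy as np

def flow_occlusion_mask(flow, threshold=1.0):
    """Guess at the idea: mark pixels whose optical-flow magnitude between
    two frames exceeds a threshold. `flow` has shape (H, W, 2) holding
    per-pixel (dx, dy) displacements."""
    magnitude = np.linalg.norm(flow, axis=-1)  # (H, W) flow magnitude
    return magnitude > threshold               # boolean occlusion mask

# Toy example: a 2x2 flow field where only one pixel moves noticeably.
flow = np.zeros((2, 2, 2))
flow[0, 0] = (3.0, 4.0)  # magnitude 5.0, above threshold
mask = flow_occlusion_mask(flow, threshold=1.0)
```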

what is the PIDI estimator?

Dear Lvmin,

Thank you for sharing these great models.

Could you tell me what the PIDI estimator used in the scribble model is, and could you provide a reference?

Thank you for your help.

Best Wishes,

Zongze

Trying to install from URL in the Automatic1111 Extensions tab and getting this error

The link in question: https://github.com/lllyasviel/ControlNet-v1-1-nightly
I tried running git restore --source=HEAD :/ from a command line, but nothing happened. I also deleted the folders and tried again; no luck.

GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git clone -v -- https://github.com/lllyasviel/ControlNet-v1-1-nightly E:\AI\stable-diffusion-webui\stable-diffusion-webui\tmp\ControlNet-v1-1-nightly
stderr: Cloning into 'E:\AI\stable-diffusion-webui\stable-diffusion-webui\tmp\ControlNet-v1-1-nightly'...
error: unable to create file annotator/zoe/zoedepth/models/base_models/midas_repo/mobile/android/lib_support/src/main/java/org/tensorflow/lite/examples/classification/tflite/ClassifierQuantizedEfficientNet.java: Filename too long
error: unable to create file annotator/zoe/zoedepth/models/base_models/midas_repo/mobile/android/lib_task_api/src/main/java/org/tensorflow/lite/examples/classification/tflite/ClassifierQuantizedEfficientNet.java: Filename too long
Updating files: 100% (1009/1009), done.
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed. You can inspect what was checked out with 'git status' and retry with 'git restore --source=HEAD :/'
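Note that, as the README states, this repo is not an A1111 extension; the extension to install is sd-webui-controlnet. The clone failure itself is Windows' 260-character path limit tripping on the long annotator paths. A standard Git workaround (not from this thread) is to enable long paths:

```shell
# Enable long paths for all repos on this machine (needs an elevated prompt):
git config --system core.longpaths true
# Or just for your user account:
git config --global core.longpaths true
```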

discord?

It would be great to have a Discord server so we can exchange ideas about ControlNet.

What are the differences and advantages between the v1.1 and v1.0?

Hi @lllyasviel,

I noticed that there are mainly two differences between v1.1 and v1.0:

  1. ControlNet v1.1 includes all previous models with improved robustness and result quality. Several new models are added.
  2. Standard ControlNet Naming Rules

What modifications did you make to get better results in v1.1? Did you re-train the models with better data, or are there other important changes?

LoRA + ControlNet?

Hey!
Do you think it's possible to adapt ControlNet onto a LoRA network?
I think it could be really powerful for real-world tasks where a very specific output is desired. So potentially anyone could take a LoRA model from https://civitai.com and append a ControlNet to it.

Let me know if you have any pointers and I'll take it from there.

Thanks!

Model license - same as ControlNet 1.0?

Hi,

I want to put some of the 1.1 models converted into Diffusers format on Hugging Face. Is the license for ControlNet 1.1 models the same as for 1.0 (The CreativeML OpenRAIL M license)?

ControlNet 1.1 Instruct Pix2Pix produces bad results

Dear Lvmin,

Thank you for sharing these good models.

I tried the ControlNet 1.1 Instruct Pix2Pix model, but the results are not very good. I used the default parameters.

image

image

image

Is there anything I can do to make the results look better?

Thank you for your help.

Best Wishes,

Zongze

Joint training of ControlNets with different conditions

Hey! Thanks for all the amazing work on this project @lllyasviel.

We at Virtual Staging AI are experimenting with modifying ControlNet for virtual staging (adding furniture to empty rooms). Here's an example input/output pair using a 3D reconstruction model.
Screenshot 2023-04-02 at 13 13 00

With ControlNet we've achieved fairly good results by superimposing M-LSD lines on top of the original room image.
Obviously you can get powerful results by combining multiple conditions, as described in the ControlNet article on Hugging Face, but does it also make sense to jointly train ControlNets with different conditions?

For example, train with two conditions as input:

  • RGB and
  • depth map

If so, is there code to train multiple ControlNets jointly? Or one ControlNet on multiple conditions?

Blog, discussion, or paper of ControlNet1.1

Hello, I wanted to express my gratitude for the incredible work you and your team have done on ControlNet1.1. It's truly fantastic and awe-inspiring.

I was wondering if there's any further information available on the training and implementation details of this update (especially the amazing 'tile' mode). Would there be a blog, discussion, or paper where I could learn more about it? I believe this would greatly benefit the research community and advance the progress in this field.

Thank you once again for your hard work and dedication to advancing the state-of-the-art.

How to finetune ?

Hi everyone, thank you @lllyasviel for this wonderful tool you made !

I've been trying to fine-tune one of the latest v1.1 models: openpose.

I managed to modify tutorial_train.py and tutorial_dataset.py from the original repo to make training start, but it doesn't seem to follow the openpose conditioning at all in the current early stages of training (< 10k steps).

I've tried to hack the tool_add_control and tool_transfer_control scripts to create a model with the latest v1.1 ControlNet openpose weights that I could feed to tutorial_train, but it seems to restart from scratch anyway.

Can you provide some guidelines or feedback on how to fine-tune an existing model, please?
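A hedged sketch of one way to avoid restarting from scratch: overlay the released v1.1 control weights onto the checkpoint produced by tool_add_control.py before feeding it to tutorial_train.py. The helper below works on any key-value mapping; the file names in the comment are illustrative, not the repo's documented workflow:

```python
def overlay_weights(state, released):
    """Copy every key from the released checkpoint into the training state
    dict, skipping keys the training model does not have (roughly what
    load_state_dict(strict=False) tolerates)."""
    for key, value in released.items():
        if key in state:
            state[key] = value
    return state

# Intended use (illustrative paths; requires torch and the actual files):
#   import torch
#   full = torch.load('./models/control_sd15_ini.ckpt', map_location='cpu')
#   ctrl = torch.load('./models/control_v11p_sd15_openpose.pth', map_location='cpu')
#   full['state_dict'] = overlay_weights(full.get('state_dict', full), ctrl)
#   torch.save(full, './models/control_openpose_resume.ckpt')
```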

Thanks

unable to load openpose

I updated my ControlNet yesterday and it's very good to use, thank you!
But every time I select openpose (face/faceonly/full) as the preprocessor it won't work and shows "AttributeError: 'NoneType' object has no attribute 'model'".
Is that a bug, or am I operating it incorrectly? (Other preprocessors work well, and three .pth files were saved in "extensions\sd-webui-controlnet\annotator\downloads\openpose".)

Shuffle model not working with deforum

Hi! I know using shuffle with Deforum is probably a bad idea anyway, but I sort of want to play with it and see if I can wrangle it into doing anything interesting. I get this error when I try to use it anyway (with no preprocessor; that setup works fine in regular img2img):

Error running process: D:\Users\Ben\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\controlnet.py
Traceback (most recent call last):
  File "D:\Users\Ben\stable-diffusion-webui\modules\scripts.py", line 409, in process
    script.process(p, *script_args)
  File "D:\Users\Ben\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\controlnet.py", line 719, in process
    model_net = self.load_control_model(p, unet, unit.model, unit.low_vram)
  File "D:\Users\Ben\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\controlnet.py", line 502, in load_control_model
    model_net = self.build_control_model(p, unet, model, lowvram)
  File "D:\Users\Ben\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\controlnet.py", line 544, in build_control_model
    assert os.path.exists(override_config), f'Error: The model config {override_config} is missing. ControlNet 1.1 must have configs.'
AssertionError: Error: The model config D:\Users\Ben\stable-diffusion-webui\models\ControlNet\controlnet11Models_shuffle.yaml is missing. ControlNet 1.1 must have configs.

Difference models

Hi, would it be possible to upload diff'd versions of control_v11p_sd15 models to HF? As a side concern, should we use a shared naming convention for difference models? I am unaware of how difference models are detected at the moment, so I don't know if it could help with user experience as well.

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

(control-v11) M:\ControlNet-v1-1-nightly>python gradio_lineart_anime.py
logging improved.
Enabled sliced_attention.
logging improved.
Enabled clip hacks.
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/control_v11p_sd15s2_lineart_anime.yaml]
Loaded state_dict from [./models/anything-v3-full.safetensors]
Loaded state_dict from [./models/control_v11p_sd15s2_lineart_anime.pth]
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Global seed set to 12345
Traceback (most recent call last):
  File "F:\2\envs\control-v11\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\2\envs\control-v11\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "F:\2\envs\control-v11\lib\site-packages\gradio\blocks.py", line 833, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "F:\2\envs\control-v11\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "F:\2\envs\control-v11\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "F:\2\envs\control-v11\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "gradio_lineart_anime.py", line 65, in process
    cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([prompt + ', ' + a_prompt] * num_samples)]}
  File "M:\ControlNet-v1-1-nightly\ldm\models\diffusion\ddpm.py", line 667, in get_learned_conditioning
    c = self.cond_stage_model.encode(c)
  File "M:\ControlNet-v1-1-nightly\ldm\modules\encoders\modules.py", line 131, in encode
    return self(text)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "M:\ControlNet-v1-1-nightly\cldm\hack.py", line 65, in _hacked_clip_forward
    y = transformer_encode(feed)
  File "M:\ControlNet-v1-1-nightly\cldm\hack.py", line 42, in transformer_encode
    rt = self.transformer(input_ids=t, output_hidden_states=True)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\transformers\models\clip\modeling_clip.py", line 722, in forward
    return self.text_model(
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\transformers\models\clip\modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\transformers\models\clip\modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\transformers\models\clip\modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\transformers\models\clip\modeling_clip.py", line 209, in forward
    query_states = self.q_proj(hidden_states) * self.scale
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Tile?

How do I use tile? It is super interesting, but I don't get it; I've tried everything.

Softedge-hed appears to vertically warp images

When running large images, I noticed the softedge HED model very slightly warps them. This causes problems when you use it with any system that combines 4 images into 1 for processing and then recombines them, because the warping makes the result jump around.

I have attached demonstration images; you can tell by overlaying them. From what I have observed this does not happen with openpose, and it happens with Pixel Perfect both on and off.

input0

About future most desired Condition Control

Hi, thanks for your good work!
Regarding the most desired future condition control, I have one suggestion that I hope you might implement in the future.
Midjourney is now a very famous tool and many people use it in the design area. What attracts me most is the way it uses reference images: it seems to use both the style and the structure of the reference image, but the structure is not exactly the same as the reference.
This way of using reference images is very useful for design, so I hope to have a similar condition control to achieve the same effect.

Inpaint + Multi control

Hi!

I saw that there is a control specifically for inpainting. Given that Multi-ControlNet exists, how does inpaint work with Multi-ControlNet? Is it possible to use the Hugging Face ControlNet pipeline with inpaint and pass a mask? Does this replace the need to use an inpainting model with ControlNet?

SD21 models?

Pardon me if I missed something, but the example for the filename format has sd21 and sd21 768 as hints that they exist, but they're not in the models folder. Is it just the config that sets the difference up? I would think the model has to be actually trained for 2.1.

Update documentation on origin of lineart annotator?

awacke1/Image-to-Line-Drawings looks like a fork with minimal changes. The origin of the lineart annotator appears to be https://huggingface.co/spaces/carolineec/informativedrawings

Evidence:

Interestingly, there's a third model linked from the github page, which might be a good alternative to the current "softedge + blur + threshold" approach for estimating sketches

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

(control-v11) M:\ControlNet-v1-1-nightly>python gradio_lineart_anime.py
logging improved.
Enabled sliced_attention.
logging improved.
Enabled clip hacks.
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/control_v11p_sd15s2_lineart_anime.yaml]
Loaded state_dict from [./models/anything-v3-full.safetensors]
Loaded state_dict from [./models/control_v11p_sd15s2_lineart_anime.pth]
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "F:\2\envs\control-v11\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\2\envs\control-v11\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "F:\2\envs\control-v11\lib\site-packages\gradio\blocks.py", line 833, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "F:\2\envs\control-v11\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "F:\2\envs\control-v11\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "F:\2\envs\control-v11\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "gradio_lineart_anime.py", line 46, in process
    detected_map = preprocessor(resize_image(input_image, detect_resolution))
  File "M:\ControlNet-v1-1-nightly\annotator\lineart_anime\__init__.py", line 141, in __call__
    line = self.model(image_feed)[0, 0] * 127.5 + 127.5
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "M:\ControlNet-v1-1-nightly\annotator\lineart_anime\__init__.py", line 40, in forward
    return self.model(input)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "M:\ControlNet-v1-1-nightly\annotator\lineart_anime\__init__.py", line 107, in forward
    return self.model(x)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "F:\2\envs\control-v11\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

[Feature Request] Please add a "mask" feature to ControlNet.

My use case is this: in txt2img I have already generated the image of the character's full unclothed pose, and I used CN's lineart_anime to control the composition and elements of the main image. The next step in my workflow is to generate the clothing. I tried erasing the control lines of the body in CN, but CN always manages to infer the erased parts, so it can only draw clothing that fits tightly to the body and never generates loose, flowing clothes.

So I hope CN can add a mask feature, so that within a mask region I specify, SD is not constrained by CN and can freely draw all kinds of clothing, letting me iterate.

PS: Using inpaint would add an extra performance cost, and the workflow would be much more complicated.
Thank you, Illya-sama.

specify GPU

I have 2 GPUs. When I want to use cuda:1, I try to modify gradio_*.py, but it throws an error like this:
Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0
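A common alternative to editing the scripts (standard CUDA behavior, not specific to this repo) is to hide the other GPU from the process, so that cuda:0 inside the script maps to the physical second card:

```shell
# Expose only physical GPU 1 to the process; inside the script it appears
# as cuda:0, so no gradio_*.py edits are needed.
CUDA_VISIBLE_DEVICES=1 python gradio_lineart_anime.py
```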

Is it possible to enhance the straight-line conditioning?

Hello, thank you for the great work. CN + SD really changed the design field a lot.

I'm from both an architecture and a computer science background, and I am currently investigating how far we can go in this direction for the conceptual design phase.

There's one issue that we've tried to improve for a while but cannot get past:

SD w/o CN
image

SD with CN
image

If you look at the image above, the mullions and window frames are not straight; the lines are wobbly.
We used a screenshot of a 3D model for the conditioning, but regardless of the preprocessor used, the generated images always have issues like this to some degree.

What we think the cause might be:

  1. The preprocessed image has only 512 resolution, which makes the processed lines wobbly to begin with (some lines are very light after processing)
  2. This is a shortcoming of SD itself.

We also tried to use volume screenshot without the mullions, but the results are similar:

SD with CN
image

Question:

At this point, we'd like to seek advice from the developers on how this issue could be improved:

  1. Should we fine-tune a diffusion model (DreamBooth or LoRA approach) on more architecture-related data (we've tried a few models from Civitai, but the improvements are limited)?
  2. Should we train our own CN (for instance, on pairs of "non-perfect canny-style" images and perfect architecture renderings, so the CN learns that those facades need straight mullions)?
  3. Or what should we do at this point?

Batch_size > 1 seems to break controlnet models

I'm getting the weirdest bug with ControlNet 1.1 (used to work on an earlier release, broken for a week or so). This is through the sdapi/v1/txt2img endpoint.

I'm trying to use 2 ControlNet units (the webui is configured for 2). The seed is fixed at 12345.

I'm using the following configuration (excerpt from the logs):

Loading preprocessor: scribble_xdog
Pixel Perfect Mode Enabled.
resize_mode = ResizeMode.INNER_FIT
raw_H = 512
raw_W = 512
target_H = 512
target_W = 512
estimation = 512.0
preprocessor resolution = 512
Loading model from cache: t2iadapter_style_sd14v1 [202e85cc]
Loading preprocessor: clip_vision
Pixel Perfect Mode Enabled.
resize_mode = ResizeMode.INNER_FIT
raw_H = 512
raw_W = 512
target_H = 512
target_W = 512
estimation = 512.0
preprocessor resolution = 512
Data shape for DDIM sampling is (1, 4, 64, 64), eta 0.0

Things work okay-ish when I set the batch_size to 1. Here's an example:
image

If I keep the exact same parameters but change the batch_size to 4, images come out more and more distorted on each iteration. Here's the outputs in that case:

image

It's as if batch_size is somehow re-feeding the outputs back into the batch?

Setting "save_memory = True" in "config.py".

The main instruction page has this recommendation:

Note that if you use 8GB GPU, you need to set "save_memory = True" in "config.py".

Like at least one other person on Reddit, I have (as the owner of a 2070S with 8GB memory) no idea how to follow this instruction. There are multiple config.py files in the ControlNet folder alone, and dozens in AUTO1111. Are we supposed to create this file and put the flag in there? If so, saved where?

Is it possible that this instruction could be made much clearer on the main page?
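For the standalone ControlNet repo (not A1111), the file in question appears to be the single config.py at the repository root; the A1111 extension does not use this flag at all. A sketch of the edit, assuming that layout:

```python
# config.py at the root of the standalone ControlNet / ControlNet-v1-1-nightly repo.
# Setting this to True enables the low-VRAM code paths used by the gradio_*.py demos.
save_memory = True
```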

Blue color blocks always generated with control_v11f1e_sd15_tile

Problem:
Blue color blocks always appear in the generated image when using control_v11f1e_sd15_tile (with or without the preprocessor).
I'm not sure whether it's caused by hardware: I'm using an M1 Max with 64 GB and running Stable Diffusion in CPU mode, and I've seen others on Windows who don't have this issue.

Sample images generated with control_v11f1e_sd15_tile:

https://ax.minicg.com/images/cnet/1-0.png (original)
https://ax.minicg.com/images/cnet/1-1.png
https://ax.minicg.com/images/cnet/1-2.png
https://ax.minicg.com/images/cnet/1-3.png
https://ax.minicg.com/images/cnet/1-4.png

https://ax.minicg.com/images/cnet/2-0.png (original)
https://ax.minicg.com/images/cnet/2-1.png
https://ax.minicg.com/images/cnet/2-2.png
https://ax.minicg.com/images/cnet/2-3.png
https://ax.minicg.com/images/cnet/2-4.png
https://ax.minicg.com/images/cnet/2-5.png

https://ax.minicg.com/images/cnet/3-0.png (original)
https://ax.minicg.com/images/cnet/3-1.png
https://ax.minicg.com/images/cnet/3-2.png
https://ax.minicg.com/images/cnet/3-3.png
https://ax.minicg.com/images/cnet/3-4.png

https://ax.minicg.com/images/cnet/settings.png (Down Sampling Rate from 1~8 tested)

Parameters are embedded in the PNG files and can be read using PNG Info in sd-webui.

[WIP] Colorization + ControlNet

Hello, we have been trying (at neural.love) to train a colorization model based on the ControlNet architecture.

The model was trained with different learning rates on a manually collected dataset of black-and-white / color image pairs.

Any recommendations regarding the training process are highly appreciated.

Current model problems:

  • adding colored spots to a homogeneous surface
  • coloring an object in a color that it cannot actually be

It is still heavily a work in progress, but it can already (sometimes) colorize quite well, which is why we decided to share it. I have created a pull request here, and here are some results of the first version:
01
02
03
04
05
06
07
08
