


FaceChain

Breaking News

  • 🚀🚀🚀 We are launching FACT on the main branch, offering impressive 10-second generation and seamless integration with standard ready-to-use LoRAs and ControlNets, along with improved instruction-following capabilities! The original train-based FaceChain has been moved to https://github.com/modelscope/facechain/tree/v3.0.0. (May 28th, 2024 UTC)
  • Our work FaceChain-ImagineID and FaceChain-SuDe were accepted to CVPR 2024! (February 27th, 2024 UTC)

Introduction

If you are familiar with Chinese, you can read the Chinese version of this README.

FaceChain is a novel framework for generating identity-preserved human portraits. In the newest FaceChain FACT (Face Adapter with deCoupled Training) version, with only one photo and 10 seconds, you can generate personal portraits in different settings (multiple styles now supported!). FaceChain offers both high controllability and authenticity in portrait generation, including text-to-image and inpainting-based pipelines, and is seamlessly compatible with ControlNet and LoRAs. You may generate portraits via FaceChain's Python scripts, via the familiar Gradio interface, or via the SD WebUI. FaceChain is powered by ModelScope.

ModelScope Studio 🤖 | API 🔥 | SD WebUI | HuggingFace Space 🤗


YouTube


News

  • 🚀🚀🚀 We are launching FACT, offering impressive 10-second generation and seamless integration with standard ready-to-use LoRAs and ControlNets, along with improved instruction-following capabilities! (May 28th, 2024 UTC)
  • Our work FaceChain-ImagineID and FaceChain-SuDe were accepted to CVPR 2024! (February 27th, 2024 UTC)
  • 🏆🏆🏆Alibaba Annual Outstanding Open Source Project, Alibaba Annual Open Source Pioneer (Yang Liu, Baigui Sun). (January 20th, 2024 UTC)
  • Our work InfoBatch co-authored with NUS team got accepted to ICLR 2024(Oral)! (January 16th, 2024 UTC)
  • 🏆OpenAtom's 2023 Rapidly Growing Open Source Projects Award. (December 20th, 2023 UTC)
  • Add SDXL pipeline 🔥🔥🔥, image detail is noticeably improved. (November 22nd, 2023 UTC)
  • Support super resolution 🔥🔥🔥, providing multiple resolution choices (512×512, 768×768, 1024×1024, 2048×2048). (November 13th, 2023 UTC)
  • 🏆FaceChain has been selected in the BenchCouncil Open100 (2022-2023) annual ranking. (November 8th, 2023 UTC)
  • Add virtual try-on module. (October 27th, 2023 UTC)
  • Add wanx version online free app. (October 26th, 2023 UTC)
  • 🏆1024 Programmer's Day AIGC Application Tool Most Valuable Business Award. (October 24th, 2023 UTC)
  • Support FaceChain in stable-diffusion-webui🔥🔥🔥. (October 13th, 2023 UTC)
  • High-performance inpainting for single & double person, simplified user interface. (September 9th, 2023 UTC)
  • More technical details can be found in the paper. (August 30th, 2023 UTC)
  • Add validation & ensemble for LoRA training, and an Inpaint tab (hidden in Gradio for now). (August 28th, 2023 UTC)
  • Add pose control module. (August 27th, 2023 UTC)
  • Add robust face LoRA training module, enhancing the performance of single-image training & style-LoRA blending. (August 27th, 2023 UTC)
  • HuggingFace Space is available now! You can experience FaceChain directly with 🤗 (August 25th, 2023 UTC)
  • Add awesome prompts! Refer to: awesome-prompts-facechain (August 18th, 2023 UTC)
  • Support a series of new style models in a plug-and-play fashion. (August 16th, 2023 UTC)
  • Support customizable prompts. (August 16th, 2023 UTC)
  • Colab notebook is available now! You can experience FaceChain directly with Open In Colab. (August 15th, 2023 UTC)

To-Do List

  • Develop train-free methods, making it possible to run on CPU.
  • Develop RLHF methods to further improve generation quality.
  • Provide training scripts for new style LoRAs.
  • Support more style LoRAs (such as those on Civitai).
  • Support more beauty-retouch effects.
  • Provide more funny apps.

Citation

Please cite FaceChain in your publications if it helps your research:

@article{liu2023facechain,
  title={FaceChain: A Playground for Identity-Preserving Portrait Generation},
  author={Liu, Yang and Yu, Cheng and Shang, Lei and Wu, Ziheng and
          Wang, Xingjun and Zhao, Yuze and Zhu, Lin and Cheng, Chen and
          Chen, Weitao and Xu, Chao and Xie, Haoyu and Yao, Yuan and
          Zhou, Wenmeng and Chen, Yingda and Xie, Xuansong and Sun, Baigui},
  journal={arXiv preprint arXiv:2308.14256},
  year={2023}
}

Installation

Compatibility Verification

We have verified e2e execution on the following environment:

  • python: py3.8, py3.10
  • pytorch: torch2.0.0, torch2.0.1
  • CUDA: 11.7
  • CUDNN: 8+
  • OS: Ubuntu 20.04, CentOS 7.9
  • GPU: Nvidia-A10 24G
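
Before moving on to installation, it helps to confirm that your environment matches one of the verified configurations above. The snippet below is a minimal sanity check (not part of FaceChain; it only uses the standard Python and PyTorch APIs):

# Minimal environment sanity check (not part of FaceChain itself)
import sys
import torch

print('python :', sys.version.split()[0])              # expected: 3.8 or 3.10
print('torch  :', torch.__version__)                    # expected: 2.0.0 or 2.0.1
print('CUDA available:', torch.cuda.is_available())     # must be True to run FaceChain on GPU
if torch.cuda.is_available():
    print('CUDA build:', torch.version.cuda)            # expected: 11.7
    print('GPU       :', torch.cuda.get_device_name(0))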

Installation Guide

The following installation methods are supported:

1. ModelScope notebook [recommended]

The ModelScope Notebook offers a free tier that allows ModelScope users to run the FaceChain application with minimal setup; refer to ModelScope Notebook.

# Step1: My Notebook -> PAI-DSW -> GPU environment
# Note: Please use: ubuntu20.04-py38-torch2.0.1-tf1.15.5-modelscope1.8.1

# Step2: Enter the Notebook cell and clone FaceChain from GitHub:
!GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1

# Step3: Change the working directory to facechain, and install the dependencies:
import os
os.chdir('/mnt/workspace/facechain')    # You may change to your own path
print(os.getcwd())

!pip3 install gradio==3.47.1
!pip3 install controlnet_aux==0.0.6
!pip3 install python-slugify

# Step4: Start the app service, click "public URL" or "local URL", upload your images to 
# train your own model and then generate your digital twin.
!python3 app.py

Alternatively, you may also purchase a PAI-DSW instance (using an A10 resource), with the option of the ModelScope image, and run FaceChain following similar steps.

2. Docker

If you are familiar with Docker, we recommend the following approach:

# Step1: Prepare a GPU environment locally or in the cloud; we recommend Alibaba Cloud ECS, refer to: https://www.aliyun.com/product/ecs

# Step2: Download the docker image (for installing docker engine, refer to https://docs.docker.com/engine/install/)
# For China Mainland users:
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1
# For users outside China Mainland:
docker pull registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1

# Step3: run the docker container
docker run -it --name facechain -p 7860:7860 --gpus all registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1 /bin/bash
# Note: you may need to install the nvidia-container-runtime, follow the instructions:
# 1. Install nvidia-container-runtime:https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
# 2. sudo systemctl restart docker

# Step4: Install the gradio in the docker container:
pip3 install gradio==3.47.1
pip3 install controlnet_aux==0.0.6
pip3 install python-slugify

# Step5 clone facechain from github
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
cd facechain
python3 app.py
# Note: FaceChain currently assumes a single GPU; if your environment has multiple GPUs, please use the following instead:
# CUDA_VISIBLE_DEVICES=0 python3 app.py

# Step6: Open the app: click the "public URL" printed by the server (in the form https://xxx.gradio.live)

3. stable-diffusion-webui

  1. Select the Extensions tab, then choose Install from URL (official plugin integration is in progress; please install from URL for now).

  2. Switch to Installed, check the FaceChain plugin, then click Apply and restart UI. Installing the dependencies and downloading the models may take a while. Make sure that the CUDA Toolkit is installed correctly, otherwise the mmcv package cannot be installed successfully.

  3. After the page refreshes, the appearance of the FaceChain tab indicates a successful installation.

Script Execution

FaceChain supports direct inference in a Python environment. When inferring for Infinite Style Portrait generation, please edit the parameters in run_inference.py:

# Use pose control, default False
use_pose_model = False
# The path of the input image containing ID information for portrait generation
input_img_path = 'poses/man/pose2.png'
# The path of the image for pose control, only effective when using pose control
pose_image = 'poses/man/pose1.png'
# The number of images to generate in inference
num_generate = 5
# The weight for the style model, see styles for detail
multiplier_style = 0.25
# Specify a folder to save the generated images, this parameter can be modified as needed
output_dir = './generated'
# The index of the chosen base model, see facechain/constants.py for detail
base_model_idx = 0
# The index of the style model, see styles for detail
style_idx = 0

Then execute:

python run_inference.py

You can find the generated portrait photos in output_dir.
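
If you want to inspect the results from a script, here is a small hedged helper (not part of FaceChain; it simply assumes the outputs are ordinary PNG/JPEG files in output_dir) that lists the generated images with Pillow:

# Hedged helper: list the generated portraits in output_dir (assumes standard image files)
from pathlib import Path
from PIL import Image

output_dir = Path('./generated')
images = sorted(p for p in output_dir.iterdir() if p.suffix.lower() in {'.png', '.jpg', '.jpeg'})
print(f'found {len(images)} generated images')
for p in images:
    with Image.open(p) as im:
        print(p.name, im.size)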

When inferring for Fixed Templates Portrait generation, please edit the parameters in run_inference_inpaint.py:

# Number of faces for the template image
num_faces = 1
# Index of face for inpainting, counting from left to right
selected_face = 1
# The strength for inpainting; you do not need to change this parameter
strength = 0.6
# The path of the template image
inpaint_img = 'poses/man/pose1.png'
# The path of the input image containing ID information for portrait generation
input_img_path = 'poses/man/pose2.png'
# The number of images to generate in inference
num_generate = 1
# Specify a folder to save the generated images, this parameter can be modified as needed
output_dir = './generated_inpaint'

Then execute:

python run_inference_inpaint.py

You can find the generated portrait photos in output_dir.

Algorithm Introduction

The capability of AI portrait generation comes from large generative models like Stable Diffusion and their fine-tuning techniques. Due to the strong generalization capability of large models, it is possible to perform downstream tasks by fine-tuning on specific types of data and tasks while preserving the model's overall text-following and image-generation ability. The technical foundation of both train-based and train-free AI portrait generation lies in applying different fine-tuning tasks to generative models. Currently, most existing AI portrait tools adopt a two-stage “train then generate” pipeline, where the fine-tuning task is “to generate portrait photos of a fixed character ID”, and the corresponding training data are multiple images of that fixed character ID. The effectiveness of such a train-based pipeline depends on the scale of the training data, thus requiring a certain amount of image data and training time, which also increases the cost for users.

Different from the train-based pipeline, the train-free pipeline adjusts the fine-tuning task to “generate portrait photos of a specified character ID”, meaning that a character ID image (face photo) is used as an additional input, and the output is a portrait photo preserving the input ID. Such a pipeline completely separates offline training from online inference, allowing users to generate portraits directly with the fine-tuned model from only one photo in just 10 seconds, avoiding the cost of extensive data and training time. The fine-tuning task of train-free AI portrait generation is based on the adapter module. Face photos are processed through an image encoder with fixed weights and a parameter-efficient feature projection layer to obtain aligned features, which are then fed into the U-Net of Stable Diffusion through an attention mechanism, similar to text conditions. At this point, face information is fed into the model as an independent condition branch alongside text information for inference, thereby enabling the generated images to maintain ID fidelity.
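
As a rough illustration of this conditioning scheme, the sketch below shows face tokens being attended to as an extra condition branch. It is only a conceptual sketch: the module names (FeatureProjector, FaceCrossAttention), token counts, and dimensions are assumptions for illustration, not FaceChain's actual implementation.

# Conceptual sketch of adapter-style face conditioning (illustrative only, NOT FaceChain's code)
import torch
import torch.nn as nn

class FeatureProjector(nn.Module):
    # Parameter-efficient projection that maps features from a frozen image encoder
    # into a set of "face tokens" aligned with the U-Net's context dimension.
    def __init__(self, face_dim: int, context_dim: int, num_tokens: int = 16):
        super().__init__()
        self.proj = nn.Linear(face_dim, num_tokens * context_dim)
        self.num_tokens, self.context_dim = num_tokens, context_dim

    def forward(self, face_feat: torch.Tensor) -> torch.Tensor:
        # face_feat: (B, face_dim) -> face tokens: (B, num_tokens, context_dim)
        return self.proj(face_feat).view(-1, self.num_tokens, self.context_dim)

class FaceCrossAttention(nn.Module):
    # Cross-attention that injects the face tokens into U-Net hidden states,
    # mirroring how text conditions are injected, but as an independent branch.
    def __init__(self, hidden_dim: int, context_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads,
                                          kdim=context_dim, vdim=context_dim,
                                          batch_first=True)

    def forward(self, hidden: torch.Tensor, face_tokens: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(hidden, face_tokens, face_tokens)
        return hidden + out   # residual update carrying the ID information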

The basic face-adapter algorithm is capable of train-free AI portraits, but still requires certain adjustments to further improve its effectiveness. Existing train-free portrait tools generally suffer from the following issues: poor image quality, inadequate text-following and style-retention abilities, poor controllability and richness of portrait faces, and poor compatibility with extensions like ControlNet and style LoRAs. FaceChain attributes these issues to the fact that the fine-tuning tasks of existing train-free AI portrait tools couple too much information beyond the character ID, and proposes FaceChain Face Adapter with deCoupled Training (FaceChain FACT) to solve them. By fine-tuning the Stable Diffusion model on millions of portrait images, FaceChain FACT can achieve high-quality portrait generation for specified character IDs. The entire framework of FaceChain FACT is shown in the figure below.

(Figure: the overall framework of FaceChain FACT)

The decoupled training of FaceChain FACT consists of two parts: decoupling face from image, and decoupling ID from face. Existing methods often treat denoising portrait images as the fine-tuning task, which makes it hard for the model to focus accurately on the face area, thereby affecting the text-to-image ability of the base Stable Diffusion model. FaceChain FACT draws on the sequential processing and regional control advantages of face-swapping algorithms and implements the decoupling of faces from images from both structural and training-strategy aspects. Structurally, unlike existing methods that use a parallel cross-attention mechanism to process face and text information, FaceChain FACT adopts a sequential processing approach, implemented as an independent adapter layer inserted into the original Stable Diffusion blocks. This way, face adaptation acts as an independent step, similar to face-swapping, during the denoising process, avoiding interference between face and text conditions. In terms of training strategy, besides the original MSE loss function, FaceChain FACT introduces the Face Adapting Incremental Regularization (FAIR) loss, which controls the feature increment of the face adaptation step in the adapter layer so that it focuses on the face region. During inference, users can flexibly adjust the generated effects by modifying the weight of the face adapter, balancing fidelity and generalization of the face while maintaining the text-to-image ability of Stable Diffusion. The FAIR loss function is formulated as follows:

(Figure: formulation of the FAIR loss)
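
To make the sequential, "face-swapping-like" adapter step and the inference-time adapter weight concrete, here is a hedged sketch (again illustrative only; the class name and interfaces are assumptions, and the FAIR loss itself is only referenced in a comment rather than reproduced):

# Conceptual sketch of the sequential, decoupled face-adaptation step (NOT FaceChain's code)
import torch.nn as nn

class SequentialFaceAdapter(nn.Module):
    # Applies the original (frozen) text cross-attention first, then a separate
    # face-adaptation step whose contribution is scaled by an inference-time weight.
    def __init__(self, text_attn: nn.Module, face_attn: nn.Module, face_scale: float = 1.0):
        super().__init__()
        self.text_attn = text_attn      # frozen block from the base Stable Diffusion model
        self.face_attn = face_attn      # trainable adapter attention over face tokens
        self.face_scale = face_scale    # lowering this trades face fidelity for generalization

    def forward(self, hidden, text_tokens, face_tokens):
        hidden = hidden + self.text_attn(hidden, text_tokens)   # original text conditioning
        increment = self.face_attn(hidden, face_tokens)         # face-adaptation increment
        # Training additionally regularizes this increment (the FAIR loss) so that it
        # concentrates on the face region; the exact formulation is given above.
        return hidden + self.face_scale * increment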

Furthermore, to address the poor controllability and richness of generated faces, FaceChain FACT proposes a training method for decoupling ID from face, so that the portrait process preserves only the character ID rather than the entire face. Firstly, to better extract ID information from the face while maintaining certain key facial details, and to better fit the structure of Stable Diffusion, FaceChain FACT employs a Transformer-based face feature extractor pre-trained on a large-scale face dataset. All tokens from the penultimate layer are then fed into a simple attention query model for feature projection, ensuring that the extracted ID features meet the aforementioned requirements. Additionally, during training, FaceChain FACT uses the Classifier Free Guidance (CFG) method to perform random shuffle and drop over different portrait images of the same ID, so that the input face image and the target image used for denoising may show different faces with the same ID, further preventing the model from overfitting to non-ID information of the face. As a result, FaceChain FACT is highly compatible with FaceChain's large collection of exquisite styles, as shown below.

(Figure: sample portraits generated by FaceChain FACT across FaceChain's styles)
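
The same-ID shuffle-and-drop sampling described above can be sketched as follows (hedged: the dataset layout, probabilities, and function name are assumptions for illustration, not FaceChain's actual training code):

# Conceptual sketch of same-ID shuffle/drop sampling in the spirit of CFG (NOT FaceChain's code)
import random
from typing import Dict, List, Optional, Tuple

def sample_training_pair(id_to_images: Dict[str, List[str]],
                         drop_prob: float = 0.1) -> Tuple[str, Optional[str]]:
    # id_to_images maps a character ID to the file paths of its portrait photos (assumed layout).
    # Returns (denoising_target, face_condition); face_condition is None when the face input
    # is dropped, providing the unconditional branch for classifier-free guidance.
    identity = random.choice(list(id_to_images))
    images = id_to_images[identity]
    target = random.choice(images)
    if random.random() < drop_prob:
        return target, None
    condition = random.choice(images)   # shuffled: may be a different photo of the same ID
    return target, condition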

Model List

The models used in FaceChain:

[1] Face detection model DamoFD: https://modelscope.cn/models/damo/cv_ddsar_face-detection_iclr23-damofd

[2] Human parsing model M2FP: https://modelscope.cn/models/damo/cv_resnet101_image-multiple-human-parsing

[3] Skin retouching model ABPN: https://www.modelscope.cn/models/damo/cv_unet_skin_retouching_torch

[4] Face fusion model: https://www.modelscope.cn/models/damo/cv_unet_face_fusion_torch

[5] FaceChain FACT model: https://www.modelscope.cn/models/yucheng1996/FaceChain-FACT

[6] Face attribute recognition model FairFace: https://modelscope.cn/models/damo/cv_resnet34_face-attribute-recognition_fairface
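
All of these models are hosted on ModelScope and are downloaded automatically on first use. If you prefer to pre-fetch them, the ModelScope SDK can be used directly; the snippet below shows the standard snapshot_download call with one of the model IDs listed above:

# Optional: pre-download a dependency model from ModelScope (otherwise fetched automatically)
from modelscope import snapshot_download

# e.g. the DamoFD face detection model listed above
model_dir = snapshot_download('damo/cv_ddsar_face-detection_iclr23-damofd')
print('model files cached at:', model_dir)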

More Information

​ ModelScope Library provides the foundation for building the model-ecosystem of ModelScope, including the interface and implementation to integrate various models into ModelScope.

License

This project is licensed under the Apache License (Version 2.0).

facechain's People

Contributors

bubbliiiing, bwdforce, chkhu, cleaner-cyber, eltociear, foggy-whale, haoyu-xie, hehaha68, hiswitch, hudcase, iiiiiiint, iotang, ly19965, metrosir, mowunian, potazinc, prhloveayg, rentingxutx, slpal, sunbaigui, tastelikefeet, trumpool, ultimatech-cn, wangxingjun778, wenmengzhou, wuziheng, wwdok, yingdachen, you-cun, zanghyu


facechain's Issues

/opt/conda/bin/python: can't open file 'facechain/train_text_to_image_lora.py': [Errno 2] No such file or directory

Deployed with a container; GPU A10, NVIDIA-SMI 525.105.17, Driver Version: 525.105.17, CUDA Version: 12.0.
After starting training from the web UI, the backend log outputs the following error. The web page shows training as completed, but the portrait generation step reports an Error.

/opt/conda/bin/python: can't open file 'facechain/train_text_to_image_lora.py': [Errno 2] No such file or directory
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in
sys.exit(main())
File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 979, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'facechain/train_text_to_image_lora.py', '--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film', '--output_dataset_name=/tmp/qw/training_data/personalizaition_lora', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000', '--learning_rate=1e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=/tmp/qw/personalizaition_lora', '--lora_r=32', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32']' returned non-zero exit status 2.

The generated images look very different from me?

Following the README, I deployed and tried it in a notebook. My impression: the facial features are, if not completely unrelated to mine, at least very far off; the result is hardly recognizable as me. The sample results don't seem to have this problem. Which parameters should I adjust to improve the results?

Problem when running after installing mmcv==1.7.0

2023-08-21 16:42:31,201 - modelscope - INFO - Model revision not specified, use the latest revision: v1.1
2023-08-21 16:42:31,396 - modelscope - INFO - initiate model from /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd
2023-08-21 16:42:31,396 - modelscope - INFO - initiate model from location /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd.
2023-08-21 16:42:31,397 - modelscope - INFO - initialize model from /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd
Traceback (most recent call last):
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/utils/registry.py", line 210, in build_from_cfg
return obj_cls._instantiate(**args)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/base/base_model.py", line 66, in _instantiate
return cls(**kwargs)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/cv/face_detection/scrfd/damofd_detect.py", line 31, in init
super().init(model_dir, **kwargs)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/cv/face_detection/scrfd/scrfd_detect.py", line 36, in init
from mmdet.models import build_detector
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/init.py", line 2, in
from .backbones import * # noqa: F401,F403
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/backbones/init.py", line 2, in
from .csp_darknet import CSPDarknet
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/backbones/csp_darknet.py", line 11, in
from ..utils import CSPLayer
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/utils/init.py", line 13, in
from .point_sample import (get_uncertain_point_coords_with_randomness,
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/utils/point_sample.py", line 3, in
from mmcv.ops import point_sample
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/ops/init.py", line 2, in
from .active_rotated_filter import active_rotated_filter
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in
ext_module = ext_loader.load_ext(
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'mmcv._ext'

Hi, I also tried installing mmcv 2.0.0, but with it training fails with "cannot import name 'Config' from mmcv".
So I switched back to 1.7.0, but then ran into the problem above. How can I resolve this?

About CUDA 11.7

When deploying FaceChain, the system requires CUDA 11.7, but on Ubuntu the display driver for my 4090 supports CUDA 12.2 and I cannot install 11.7. Is CUDA 12.2 also acceptable? Why do I keep getting errors during training?

Two more questions:
1. With conda-installed Python 3.10.6 and CUDA 12.2, mim install mmcv-full==1.7.0 cannot be installed at all.
2. With Python 3.8, mim install mmcv-full==1.7.0 installs successfully and the program starts, but training fails with an error.

Error: nms_impl: implementation for device cuda:0 not found.

When uploading pictures and starting training, there is an error on the server side.
`2023-08-19 16:09:33,371 - modelscope - INFO - load model done
cathed for image process of 000.jpg
Error: nms_impl: implementation for device cuda:0 not found.

[]
Error: result is empty.
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "D:\dev\facechain\app.py", line 174, in run
data_process_fn(instance_data_dir, True)
File "D:\dev\facechain\facechain\inference.py", line 24, in data_process_fn
out_json_name = data_process_fn(input_img_dir)
File "D:\dev\facechain\facechain\data_process\preprocessing.py", line 335, in call
exit()
File "C:\ProgramData\anaconda3\envs\fchain\lib_sitebuiltins.py", line 26, in call
raise SystemExit(code)
SystemExit: None`

执行"开始推理"时报错 OSError: [Errno 122] Disk quota exceeded

I got it running following the ModelScope notebook approach. After the model was reported as trained successfully, clicking "Start inference" raised OSError: [Errno 122] Disk quota exceeded.

Environment: ModelScope free-tier instance, PAI-DSW, GPU environment

8 cores, 32 GB RAM, 16 GB VRAM
Pre-installed ModelScope Library
Pre-installed image: ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1

The workspace disk usage is as follows:

root@dsw:/mnt/workspace# du -h -d 1
14G     ./.cache
73K     ./.ipynb_checkpoints
8.5K    ./.virtual_documents
574K    ./facechain
14G     .

Is the disk provided by the free ModelScope instance too small? The official facechain README requires Disk: About 50GB.

Is there any way to go through the full workflow successfully on the free ModelScope instance?

mmcv and modelscope version issue

It seems that many tasks in the modelscope library are still written against mmcv<2.0.0, which would require extensive changes (e.g. from mmcv.parallel import MMDataParallel). Will there be an update later?

Error when running on Colab

Running on Colab with an A100; everything is fine up to the last step. The web page opens, but after uploading photos and clicking Start Training it reports "CUDA is not available".
The log is as follows:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "/content/facechain/app.py", line 123, in run
raise gr.Error('CUDA is not available.')
gradio.exceptions.Error: 'CUDA is not available.'

Does it support running on a local machine?

I tried to train the LoRA on my machine, but it raises an error.

In [1]: from modelscope import snapshot_download
^[[A2023-08-14 11:23:11,600 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2023-08-14 11:23:11,602 - modelscope - INFO - TensorFlow version 2.13.0 Found.
2023-08-14 11:23:11,602 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-08-14 11:23:11,631 - modelscope - INFO - Loading done! Current index file version is 1.8.1, with md5 bbb8dd73324c667bf9ab6594815ac903 and a total number of 893 components indexed

In [2]: model_dir = snapshot_download('Cherrytest/rot_bgr', revision='v1.0.0')
2023-08-14 11:23:13,696 - modelscope - ERROR - Authentication token does not exist, failed to access model Cherrytest/rot_bgr which may not exist or may be                 private. Please login first.

mat1 and mat2 must have the same dtype

08/17/2023 14:39:07 - INFO - __main__ - ***** Running training *****
08/17/2023 14:39:07 - INFO - __main__ -   Num examples = 3
08/17/2023 14:39:07 - INFO - __main__ -   Num Epochs = 200
08/17/2023 14:39:07 - INFO - __main__ -   Instantaneous batch size per device = 1
08/17/2023 14:39:07 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
08/17/2023 14:39:07 - INFO - __main__ -   Gradient Accumulation steps = 1
08/17/2023 14:39:07 - INFO - __main__ -   Total optimization steps = 600
Steps:   0%|                                                                                    | 0/600 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ facechain/facechain/train_text_to_image_lora.py:1103 in <module>           │
│                                                                                                  │
│   1100                                                                                           │
│   1101                                                                                           │
│   1102 if __name__ == "__main__":                                                                │
│ ❱ 1103 │   main()                                                                                │
│   1104                                                                                           │
│                                                                                                  │
│ facechain/facechain/train_text_to_image_lora.py:924 in main                │
│                                                                                                  │
│    921 │   │   │   │   │   raise ValueError(f"Unknown prediction type {noise_scheduler.config.p  │
│    922 │   │   │   │                                                                             │
│    923 │   │   │   │   # Predict the noise residual and compute loss                             │
│ ❱  924 │   │   │   │   model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sampl  │
│    925 │   │   │   │   loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")   │
│    926 │   │   │   │                                                                             │
│    927 │   │   │   │   # Gather the losses across all processes for logging (if we use distribu  │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:805 in │
│ forward                                                                                          │
│                                                                                                  │
│   802 │   │   # there might be better ways to encapsulate this.                                  │
│   803 │   │   t_emb = t_emb.to(dtype=sample.dtype)                                               │
│   804 │   │                                                                                      │
│ ❱ 805 │   │   emb = self.time_embedding(t_emb, timestep_cond)                                    │
│   806 │   │   aug_emb = None                                                                     │
│   807 │   │                                                                                      │
│   808 │   │   if self.class_embedding is not None:                                               │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/diffusers/models/embeddings.py:192 in        │
│ forward                                                                                          │
│                                                                                                  │
│   189 │   def forward(self, sample, condition=None):                                             │
│   190 │   │   if condition is not None:                                                          │
│   191 │   │   │   sample = sample + self.cond_proj(condition)                                    │
│ ❱ 192 │   │   sample = self.linear_1(sample)                                                     │
│   193 │   │                                                                                      │
│   194 │   │   if self.act is not None:                                                           │
│   195 │   │   │   sample = self.act(sample)                                                      │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/linear.py:114 in forward    │
│                                                                                                  │
│   111 │   │   │   init.uniform_(self.bias, -bound, bound)                                        │
│   112 │                                                                                          │
│   113 │   def forward(self, input: Tensor) -> Tensor:                                            │
│ ❱ 114 │   │   return F.linear(input, self.weight, self.bias)                                     │
│   115 │                                                                                          │
│   116 │   def extra_repr(self) -> str:                                                           │
│   117 │   │   return 'in_features={}, out_features={}, bias={}'.format(                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: mat1 and mat2 must have the same dtype
Steps:   0%|                                                                                    | 0/600 [00:02<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home//.local/bin/accelerate:8 in <module>                                              │
│                                                                                                  │
│   5 from accelerate.commands.accelerate_cli import main                                          │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py:45 in  │
│ main                                                                                             │
│                                                                                                  │
│   42 │   │   exit(1)                                                                             │
│   43 │                                                                                           │
│   44 │   # Run                                                                                   │
│ ❱ 45 │   args.func(args)                                                                         │
│   46                                                                                             │
│   47                                                                                             │
│   48 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/launch.py:941 in         │
│ launch_command                                                                                   │
│                                                                                                  │
│   938 │   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA   │
│   939 │   │   sagemaker_launcher(defaults, args)                                                 │
│   940 │   else:                                                                                  │
│ ❱ 941 │   │   simple_launcher(args)                                                              │
│   942                                                                                            │
│   943                                                                                            │
│   944 def main():                                                                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/launch.py:603 in         │
│ simple_launcher                                                                                  │
│                                                                                                  │
│   600 │   process.wait()                                                                         │
│   601 │   if process.returncode != 0:                                                            │
│   602 │   │   if not args.quiet:                                                                 │
│ ❱ 603 │   │   │   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)    │
│   604 │   │   else:                                                                              │
│   605 │   │   │   sys.exit(1)                                                                    │
│   606                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['miniconda3/bin/python', 'facechain/train_text_to_image_lora.py',
'--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film',
'--dataset_name=./imgs', '--output_dataset_name=./processed', '--caption_column=text', '--resolution=512',
'--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000',
'--learning_rate=1e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=./output',
'--lora_r=32', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32']' returned non-zero exit
status 1.

Are there more outfits?

Currently there are only a few options like the (admittedly awesome) silver armor:

examples = {
    'prompt_male': [
        ['silver armor'],
        ['T-shirt']
    ],
    'prompt_female': [
        ['beautiful traditional hanfu, upper_body'],
        ['an elegant evening gown']
    ],
}

example_styles = [
    {'name': '默认风格(default style)'},
    {'name': '凤冠霞帔(Chinese traditional gorgeous suit)',
     'model_id': 'ly261666/civitai_xiapei_lora',
     'revision': 'v1.0.0',
     'bin_file': 'xiapei.safetensors',
     'multiplier_style': 0.35,
     'add_prompt_style': 'red, hanfu, tiara, crown, '},
]

Problems when running on k8s

Dockerfile

FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.0
RUN pip3 install gradio

SHELL ["/bin/bash", "--login", "-c"]
RUN GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
WORKDIR facechain
ENV NVIDIA_DISABLE_REQUIRE=true

ENTRYPOINT ["python3","app.py"]

Running on Alibaba Cloud ECS scheduled by k8s

2023-08-20 09:08:15.817344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
app.py:302: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
output_images = gr.Gallery(label='Output', show_label=False).style(columns=3, rows=2, height=600,

Error: result is empty.

[]
Error: result is empty.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "/content/facechain/app.py", line 149, in run
data_process_fn(instance_data_dir, True)
File "/content/facechain/facechain/inference.py", line 24, in data_process_fn
out_json_name = data_process_fn(input_img_dir)
File "/content/facechain/facechain/data_process/preprocessing.py", line 335, in call
exit()
File "/usr/lib/python3.10/_sitebuiltins.py", line 26, in call
raise SystemExit(code)
SystemExit: None

When running on Colab (t4 Runtime)

Error when training data

Running app.py on Windows 11 gives this error: 'PYTHONPATH' is not recognized as an internal or external command, operable program or batch file.

Error when training data

In a Windows environment, training fails with an error, so inference cannot be performed afterwards.

 File "D:\ProgramData\anaconda3\envs\facechain\lib\site-packages\datasets\packaged_modules\folder_based_builder\folder_based_builder.py", line 311, in _generate_examples
    raise ValueError(
ValueError: image at tmp.png doesn't have metadata in D:\AI\qw\training_data\personalizaition_lora_labeled\metadata.jsonl.

Looking at the backend output, there is an error message about the "rm" command; rm is a Linux command and does not exist on Windows.

2023-08-20 00:15:28.975118: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8700
000.jpg 0.9607361331582069
1girl, brown_eyes, brown_hair, earrings, jewelry, lips, long_hair, looking_at_viewer, open_mouth, simple_background, smile, solo, teeth, transparent_background
[['1girl', 'brown_eyes', 'brown_hair', 'earrings', 'jewelry', 'lips', 'long_hair', 'looking_at_viewer', 'open_mouth', 'simple_background', 'smile', 'solo', 'teeth', 'transparent_background']]
'rm' is not recognized as an internal or external command, operable program or batch file.
0.png a beautiful woman, brown_hair, earrings, jewelry, long_hair, looking_at_viewer, open_mouth, simple_background, smile, solo, transparent_background
08/20/2023 00:15:31 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

No such file or directory: '/tmp/qw/personalizaition_lora/pytorch_lora_weights.bin'

File "/home/yyy/facechain/facechain/inference.py", line 47, in main_diffusion_inference
pipe = merge_lora(pipe, lora_human_path, multiplier_human, from_safetensor=False)
File "/home/yyy/facechain/facechain/merge_lora.py", line 15, in merge_lora
checkpoint = torch.load(os.path.join(lora_path, 'pytorch_lora_weights.bin'),
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/qw/personalizaition_lora/pytorch_lora_weights.bin
Training produces a safetensors file, so why does the code expect a .bin file? And why can't the file be found? Any advice appreciated, thanks.

Expected all tensors to be on the same device

Error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)

When there are multiple GPUs, the specified GPU cannot be found. How can this be resolved?

CUDA is not available issue with colab

Everything works fine, but when I start training I get this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "/content/facechain/app.py", line 123, in run
raise gr.Error('CUDA is not available.')
gradio.exceptions.Error: 'CUDA is not available.'

mmcv-full for torch 2 cannot be installed; it keeps hanging at "Building wheel for mmcv-full (setup.py) ... /"

mim install mmcv-full==1.7.0
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Looking in links: https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.0/index.html
Collecting mmcv-full==1.7.0
Downloading http://mirrors.aliyun.com/pypi/packages/a1/81/89120850923f4c8b49efba81af30160e7b1b305fdfa9671a661705a8abbf/mmcv-full-1.7.0.tar.gz (593 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 593.6/593.6 kB 4.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: addict in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (2.4.0)
Requirement already satisfied: numpy in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (1.22.0)
Requirement already satisfied: packaging in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (23.1)
Requirement already satisfied: Pillow in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (10.0.0)
Requirement already satisfied: pyyaml in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (6.0.1)
Requirement already satisfied: yapf in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (0.40.1)
Requirement already satisfied: importlib-metadata>=6.6.0 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (6.8.0)
Requirement already satisfied: platformdirs>=3.5.1 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (3.10.0)
Requirement already satisfied: tomli>=2.0.1 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (2.0.1)
Requirement already satisfied: zipp>=0.5 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from importlib-metadata>=6.6.0->yapf->mmcv-full==1.7.0) (3.16.2)
Building wheels for collected packages: mmcv-full
Building wheel for mmcv-full (setup.py) ... /

Could not find a version that satisfies the requirement tf-estimator-nightly==2.8.0.dev2021122109

TensorFlow 2.8.0 seems to have a problem; can this package be downloaded with Python 3.8?
INFO: pip is looking at multiple versions of tensorflow to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following versions that require a different python version: 1.11.0 Requires-Python <3.13,>=3.9; 1.11.0rc1 Requires-Python <3.13,>=3.9; 1.11.0rc2 Requires-Python <3.13,>=3.9; 1.11.1 Requires-Python <3.13,>=3.9; 1.11.2 Requires-Python <3.13,>=3.9; 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 3.8.0rc1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement tf-estimator-nightly==2.8.0.dev2021122109 (from tensorflow) (from versions: none)
ERROR: No matching distribution found for tf-estimator-nightly==2.8.0.dev2021122109

Can't download 3.2 GB model

It's currently not possible to download the 3.20GB model.
The download fails at ~95%. This is reproducible on Colab and locally.

Downloading:  92% 2.95G/3.20G [01:54<00:05, 46.1MB/s]


Downloading:  93% 2.97G/3.20G [01:54<00:05, 48.3MB/s]


Downloading:  93% 2.98G/3.20G [01:55<00:06, 34.6MB/s]


Downloading:  94% 3.00G/3.20G [01:55<00:06, 32.4MB/s]


Downloading:  94% 3.01G/3.20G [01:57<00:10, 19.7MB/s]


Downloading:  95% 3.04G/3.20G [01:58<00:07, 21.9MB/s]


Downloading:  95% 3.05G/3.20G [01:59<00:08, 20.7MB/s]Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 710, in _error_catcher
    yield
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 814, in _raw_read
    data = self._fp_read(amt) if not fp_closed else b""
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 799, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/usr/lib/python3.10/http/client.py", line 466, in read
    s = self.fp.read(amt)
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 940, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 879, in read
    data = self._raw_read(amt)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 813, in _raw_read
    with self._error_catcher():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 727, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "/content/facechain/app.py", line 184, in run
    data_process_fn(instance_data_dir, True)
  File "/content/facechain/facechain/inference.py", line 23, in data_process_fn
    data_process_fn = Blipv2()
  File "/content/facechain/facechain/data_process/preprocessing.py", line 202, in __init__
    self.model = DeepDanbooru()
  File "/content/facechain/facechain/data_process/deepbooru.py", line 721, in __init__
    snapshot_path = snapshot_download(foundation_model_id, revision='v4.0')
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/snapshot_download.py", line 140, in snapshot_download
    parallel_download(
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/file_download.py", line 243, in parallel_download
    list(executor.map(download_part, tasks))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/file_download.py", line 203, in download_part
    for chunk in r.iter_content(chunk_size=API_FILE_DOWNLOAD_CHUNK_SIZE):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
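
This error means the ModelScope download was interrupted by a connection reset while fetching the tagging model. One possible workaround (a minimal sketch, not part of FaceChain) is to pre-download and cache the model with a simple retry loop before launching the app; `MODEL_ID` is a placeholder for the `foundation_model_id` used in `facechain/data_process/deepbooru.py`, and the `v4.0` revision is taken from the traceback above:

```python
# Minimal retry sketch for a flaky ModelScope download (assumes modelscope is installed).
import time
from modelscope.hub.snapshot_download import snapshot_download

MODEL_ID = "<foundation_model_id from facechain/data_process/deepbooru.py>"  # placeholder

for attempt in range(5):
    try:
        path = snapshot_download(MODEL_ID, revision='v4.0')
        print("model cached at:", path)
        break
    except Exception as e:  # e.g. ChunkedEncodingError / connection reset
        print(f"attempt {attempt + 1} failed: {e}; retrying in 10s")
        time.sleep(10)
```

Once the snapshot is cached locally, restarting the app lets it pick up the files without re-downloading.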

On Windows you must use pip to install mmcv-full

When I use

mim install mmcv-full==1.7.0

I always get the following error:

RuntimeError: nms_impl: implementation for device cuda:0 not found.

I thought it was a CUDA version problem and tried downgrading CUDA from 12.2 to 11.8.

Finally, when I ran

mim uninstall mmcv-full
pip install mmcv-full

the build took about twenty minutes. After restarting the app, the error was gone.
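
To verify that the reinstalled mmcv-full actually has its CUDA ops compiled, a quick sanity check like the following (a sketch, not part of FaceChain) should run without raising the `nms_impl` error:

```python
# Quick check that mmcv-full's CUDA ops (including nms) are available.
import torch
from mmcv.ops import nms

boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.]], device='cuda')
scores = torch.tensor([0.9, 0.8], device='cuda')
dets, keep = nms(boxes, scores, iou_threshold=0.5)
print(dets, keep)  # raises "nms_impl ... not found" if the CUDA extension is missing
```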

CUDA out of memory error on training

I run into the following error on Alibaba Cloud DSW with an NVIDIA V100 instance:

image: modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1

DSW NVIDIA V100

08/18/2023 19:46:51 - INFO - __main__ - ***** Running training *****
08/18/2023 19:46:51 - INFO - __main__ -   Num examples = 9
08/18/2023 19:46:51 - INFO - __main__ -   Num Epochs = 200
08/18/2023 19:46:51 - INFO - __main__ -   Instantaneous batch size per device = 1
08/18/2023 19:46:51 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
08/18/2023 19:46:51 - INFO - __main__ -   Gradient Accumulation steps = 1
08/18/2023 19:46:51 - INFO - __main__ -   Total optimization steps = 1800
Steps:   0%|                                           | 0/1800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "facechain/train_text_to_image_lora.py", line 1103, in <module>
    main()
  File "facechain/train_text_to_image_lora.py", line 924, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 956, in forward
    sample = upsample_block(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 2127, in forward
    hidden_states = attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 291, in forward
    hidden_states = block(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention.py", line 154, in forward
    attn_output = self.attn1(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 321, in forward
    return self.processor(
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 601, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 362, in get_attention_scores
    attention_scores = torch.baddbmm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 15.78 GiB total capacity; 8.13 GiB already allocated; 469.75 MiB free; 8.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
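
As the error message itself suggests, one mitigation worth trying (a sketch, not an official FaceChain setting) is to set `PYTORCH_CUDA_ALLOC_CONF` before the training process initializes CUDA, in addition to lowering the training resolution or batch size in `train_lora.sh`:

```python
# Must run before torch initializes CUDA, e.g. at the very top of the training script
# or exported in the shell that launches train_lora.sh.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
torch.cuda.empty_cache()  # optional: drop cached blocks left over in the same process
```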

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

I run the script PYTHONPATH=. sh train_lora.sh "ly261666/cv_portrait_model" "v2.0" "film/film" "./imgs" "./processed" "./output" and get the following error:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 399410) of binary: /home/disk01/wyw/.conda/envs/facechain/bin/python
Traceback (most recent call last):
  File "/home/disk01/wyw/.conda/envs/facechain/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/launch.py", line 970, in launch_command
    multi_gpu_launcher(args)
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

facechain/train_text_to_image_lora.py FAILED

After repeated tweaking I can now get FaceChain running on Windows 11, but it errors out at Portrait Generation -> Start Generation. I really cannot find the cause, so could you please help confirm whether this error points to the actual problem?

windows11
python3.8
CUDA 11.7
GPU GeForce RTX 4060

Differences from the environment in the README:
1. mmcv-full==1.7.0 fails with "nms_impl: implementation for device cuda:0 not found."; repeatedly uninstalling and reinstalling did not help, but switching to 1.7.1 worked.
The error at Portrait Generation -> Start Generation:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\AI\facechain\tmp/qw/personalizaition_lora\pytorch_lora_weights.bin'
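
A missing `pytorch_lora_weights.bin` usually means the training step never finished writing its output. A quick check (a sketch using the path from the error message above; adjust to your setup) before retrying generation:

```python
# Verify that LoRA training actually produced its weights file.
from pathlib import Path

weights = Path(r"D:\AI\facechain\tmp\qw\personalizaition_lora\pytorch_lora_weights.bin")
print("LoRA weights exist:", weights.exists())
# If this prints False, re-run the training step and inspect its log before generating.
```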

Unable to download the face fusion model cv_unet-image-face-fusion_damo

Hello, could you provide a download link?

The error is as follows:
