crm's Introduction

Convolutional Reconstruction Model

Official implementation for CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.

CRM is a feed-forward model that can generate a 3D textured mesh in 10 seconds.

Try CRM 🍻

Install

Step 1 - Base

Install the packages one by one. We use Python 3.9.

pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch-scatter==2.1.1 -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
pip install kaolin==0.14.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-1.13.1_cu117.html
pip install -r requirements.txt

In addition, xformers needs to be installed manually according to the official doc (this is not needed when using conda), e.g.

pip install ninja
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

Step 2 - Nvdiffrast

Install nvdiffrast according to the official doc, e.g.

pip install git+https://github.com/NVlabs/nvdiffrast

Inference

We suggest Gradio for visualized inference.

gradio app.py

For inference from the command line, simply run

CUDA_VISIBLE_DEVICES="0" python run.py --inputdir "examples/kunkun.webp"

It will output the preprocessed image, the generated 6-view images and CCMs, and a 3D model in obj format.
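To process several images, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the repository: it assumes that `--inputdir` takes one image path per call, as in the example above, and the helper names `build_commands` and `run_all` are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path

# File types treated as input images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def build_commands(image_dir, script="run.py"):
    """Return one `python run.py --inputdir <image>` command per image file."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [[sys.executable, script, "--inputdir", str(p)] for p in images]


def run_all(image_dir, gpu="0"):
    """Run the commands one after another on the selected GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for cmd in build_commands(image_dir):
        subprocess.run(cmd, check=True, env=env)

# Example call from the repository root: run_all("examples")
```

Running the images sequentially keeps only one model instance on the GPU at a time.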

Tips: (1) If the result is unsatisfactory, please check whether the input image has been correctly pre-processed onto a grey background; otherwise the results will be unpredictable. (2) Unlike the Huggingface demo, this official implementation uses UV texture instead of vertex color. It gives better texture than the online demo but takes longer to generate because of the UV texturing.

Todo List

  • Release inference code.
  • Release pretrained models.
  • Optimize inference code to fit in low-memory GPUs.
  • Upload training code.

Citation

@article{wang2024crm,
  title={CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model},
  author={Zhengyi Wang and Yikai Wang and Yifei Chen and Chendong Xiang and Shuo Chen and Dajiang Yu and Chongxuan Li and Hang Su and Jun Zhu},
  journal={arXiv preprint arXiv:2403.05034},
  year={2024}
}

crm's Issues

How do you render an image from flexicube geometry?

Great work! Could you please share more details, and ideally the code, on how you render an RGB image from the FlexiCubes geometry? I see the following code, but it only renders masks and depth. Thanks in advance!

    def render_mesh(self, mesh_v_nx3, mesh_f_fx3, camera_mv_bx4x4, resolution=256, hierarchical_mask=False):
        return_value = dict()
        if self.render_type == 'neural_render':
            tex_pos, mask, hard_mask, rast, v_pos_clip, mask_pyramid, depth = self.renderer.render_mesh(
                mesh_v_nx3.unsqueeze(dim=0),
                mesh_f_fx3.int(),
                camera_mv_bx4x4,
                mesh_v_nx3.unsqueeze(dim=0),
                resolution=resolution,
                device=self.device,
                hierarchical_mask=hierarchical_mask
            )

            return_value['tex_pos'] = tex_pos
            return_value['mask'] = mask
            return_value['hard_mask'] = hard_mask
            return_value['rast'] = rast
            return_value['v_pos_clip'] = v_pos_clip
            return_value['mask_pyramid'] = mask_pyramid
            return_value['depth'] = depth
        else:
            raise NotImplementedError

        return return_value

environment problem

Hello, when I set up the environment, I ran into a problem with this error message:
"ERROR: Could not build wheels for xformers, which is required to install pyproject.toml-based projects"
How can I fix it?

problems about the random seed in the code

Hello, thank you for your brilliant work on the fast feed-forward 3D generation model!
I have tried your HuggingFace demo and it works well. I noticed that users can set the random seed in the demo, but when I run your code there seem to be no command-line arguments for the random seed. How can I set the seed properly: np? torch? torch.cuda?

thank you :)
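The generators named in the question can be seeded together before inference starts. The following is a minimal sketch under assumptions: `seed_everything` is a hypothetical helper, not a function from this repository, and whether it makes CRM's output fully reproducible depends on which generators the inference code actually draws from.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)           # seeds the CPU generator
        torch.cuda.manual_seed_all(seed)  # seeds every GPU; ignored without CUDA
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed
```

Calling `seed_everything(1234)` once, before the models are built and sampled, would be the natural place for it in a script such as `run.py`.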

version error

Which PyTorch version exactly is required? With torch 2.0.1 I get an error saying that torch has no compiler attribute.

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.2.0+cu121 with CUDA 1201 (you have 2.0.1+cu117)
    Python  3.10.11 (you have 3.10.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Traceback (most recent call last):
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\utils\import_utils.py", line 684, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "E:\ProgramData\anaconda3\envs\CRM\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\unet_2d.py", line 24, in <module>
    from .unet_2d_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 23, in <module>
    from .attention import AdaGroupNorm
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\attention.py", line 22, in <module>
    from .attention_processor import Attention
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\attention_processor.py", line 31, in <module>
    import xformers
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\__init__.py", line 12, in <module>
    from .checkpoint import (  # noqa: E402, F401
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\checkpoint.py", line 437, in <module>
    class SelectiveCheckpointWrapper(ActivationWrapper):
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\checkpoint.py", line 449, in SelectiveCheckpointWrapper
    @torch.compiler.disable
AttributeError: module 'torch' has no attribute 'compiler'
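The warning at the top of this log says that xformers was built for PyTorch 2.2.0+cu121 while 2.0.1+cu117 is installed, and the traceback ends in xformers using `torch.compiler`, which that torch build lacks. A quick way to spot such mismatches is to list the installed versions side by side. This is a minimal sketch using only the standard library; the package list and the helper name `installed_versions` are assumptions.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "xformers", "diffusers")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:12s} {version or 'not installed'}")
```

If the reported xformers build does not match the installed torch, reinstalling one of the two so that they agree is the usual remedy, as the warning itself suggests.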

checkpoint loading size mismatch

Thanks for your awesome work and contribution!
I tried to run your code locally after downloading the model checkpoints from Huggingface, but I encountered a size mismatch error:

Traceback (most recent call last):
  File "/CRM/local_inference.py", line 152, in <module>    
    pipeline = TwoStagePipeline(  
  File "/CRM/pipelines.py", line 31, in __init__
    self.stage1_model.load_state_dict(torch.load(stage1_model_config.resume, map_location="cpu"), strict=False)
  File "/envs/crm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusionInterface:
        size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

Interestingly, when I swap the checkpoints for ccm_diffusion and pixel_diffusion, both loading and inference run without errors, but the results after swapping are definitely not correct.

I have not changed anything in the code.
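To see every mismatching parameter at once rather than only the first one reported, the shapes in a checkpoint can be compared with those of the model before calling `load_state_dict`. The following is a minimal sketch on plain dictionaries of shapes; `find_shape_mismatches` is a hypothetical helper, and the example values are taken from the error message above.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    return {
        name: (tuple(shape), tuple(model_shapes[name]))
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and tuple(shape) != tuple(model_shapes[name])
    }


# With PyTorch, the two inputs could be built like this (not executed here):
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}

ckpt = {"model.diffusion_model.input_blocks.0.0.weight": (320, 8, 3, 3)}
model = {"model.diffusion_model.input_blocks.0.0.weight": (320, 4, 3, 3)}
print(find_shape_mismatches(ckpt, model))
```

Since loading succeeds once the two checkpoints are swapped, one possible reading is that each file was paired with the other model's config; a comparison like this would help confirm or rule that out.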

About the lego-style in the demo

Thanks to the authors.
I would like to ask how the lego-style 3D in the demo is made. Could you share it?

Thank you.

CUDA out of memory

I am running CRM on an RTX 2080 Ti GPU with 10.76 GiB of memory.
However, I get a CUDA out-of-memory error.
Is there any way to reduce GPU memory usage while running CRM?

Error with Xformers newest version

I'm trying to run this with xformers 0.0.25 because I have to use the latest Torch 2.2.1, which Google Colab just updated to. xformers 0.0.24 only works with Torch 2.2.0, so installing 0.0.24 takes about 5 minutes plus a restart to downgrade to Torch 2.2.0. I got it working in my app at DiffusionDeluxe.com using the recommended 0.0.24 (although it keeps running out of RAM), but with 0.0.25 I'm getting this error:

Traceback (most recent call last):
  File "/content/sdd_colab.py", line 46547, in run_crm
    crm_model = CRM(specs).to(torch_device)
  File "/content/CRM/model/crm/model.py", line 46, in __init__
    self.unet2 = UNetPP(in_channels=self.dec.c_dim)
  File "/content/CRM/model/archs/unet.py", line 43, in __init__
    self.unet.enable_xformers_memory_efficient_attention()
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 295, in enable_xformers_memory_efficient_attention
    self.set_use_memory_efficient_attention_xformers(True, attention_op)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 259, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 252, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 253, in set_use_memory_efficient_attention_xformers
    raise ModuleNotFoundError(
ModuleNotFoundError: Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers

I'm hoping you can find a fix for the breaking change so that it is compatible with both versions; it is always nice to be on the newest versions. Thanks. I tried to trace the problem down myself but didn't understand it well enough.

Upscaled RGB and CCM, tile-based generation

Hi,

Suppose I take the 256x6 MV images and generate a higher-resolution MV sheet. Could CRM then generate a better 3D model with more detail?

My test ideas are:
- Regular upscale. (The CCMs probably won't be that good if I just change the resolution without upscaling. I still can't figure out whether the CCMs are used for texturing, for generating the 3D mesh, or for both.)
- Run a tile-based algorithm: first do a regular CRM generation of RGB and CCM at 256x6, then upscale them as follows. The algorithm splits the input image into multiple tiles, generates RGB and CCM for each tile, then blends them all together into one high-resolution set of MV RGB and CCM images.

The tile code is ready and only needs some modifications; it showed great results with depth-map blending.
I modified the code and the model config files and changed the size of the input tensors (image arrays), but the RGB and CCM generated with the regular workflow at high resolutions are just garbage, so I can't really tell.
What I need to know:

1. Will the decoder work with resolutions higher than 256x6, for example 512x3,072? Or was the model trained only at that resolution, so it won't work?
2. I read the paper multiple times but can't understand CCMs. Can we skip generating them and just use RGB? Are CCMs essential for mesh generation, or are they used only for texturing?
3. Suppose we have extremely detailed depth maps, like 4K ultra-sharp maps where even skin pores are visible. Can we in any way introduce those depth maps into the CRM workflow? (This one is very important.)

Do let me know, and many thanks in advance. Much love and respect for your work, cheers.
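The tile-splitting step described in this issue can be sketched as follows. This is an illustration only, not code from CRM or from the poster's tile implementation: `tile_boxes`, the tile size, and the overlap are all hypothetical, and whether the CRM models give usable output per tile is exactly the open question above.

```python
def tile_boxes(width, height, tile, overlap):
    """Return (left, top, right, bottom) boxes covering the image with overlapping tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    step = tile - overlap

    def starts(length):
        # A single tile is enough when the image is no larger than the tile.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last tile sits flush with the border
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Example: a 512x512 view split into 256-pixel tiles that overlap by 64 pixels.
for box in tile_boxes(512, 512, tile=256, overlap=64):
    print(box)
```

The overlap region is where neighbouring tiles would be blended, for example with a linear weight ramp, to hide seams.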

RuntimeError: Ninja is required to load C++ extensions

CRM.pth: 100%|███████████████████████████████████████████████████████████████████████| 476M/476M [13:35<00:00, 583kB/s]
Traceback (most recent call last):
  File "E:\project\CRM\app.py", line 129, in <module>
    model = CRM(specs).to(args.device)
  File "E:\project\CRM\model\crm\model.py", line 59, in __init__
    self.renderer = Renderer(tet_grid_size=self.tet_grid_size, camera_angle_num=self.camera_angle_num,
  File "E:\project\CRM\util\renderer.py", line 15, in __init__
    self.glctx = dr.RasterizeCudaContext()
  File "E:\project\CRM\python\lib\site-packages\nvdiffrast\torch\ops.py", line 177, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeCRStateWrapper(cuda_device_idx)
  File "E:\project\CRM\python\lib\site-packages\nvdiffrast\torch\ops.py", line 118, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1306, in load
    return _jit_compile(
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1793, in _write_ninja_file_and_build_library
    verify_ninja_availability()
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1842, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
E:\project\CRM>python\python.exe -m pip install Ninja
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: Ninja in e:\project\crm\python\lib\site-packages (1.11.1.1)
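In the log above, pip reports the Ninja package as installed while PyTorch still raises the error. As far as I know, PyTorch's check runs the `ninja` executable as a subprocess, so the executable has to be reachable through PATH; with an embedded Python such as `python\python.exe`, the package's scripts folder may not be on PATH. The following is a minimal sketch for checking this, using only the standard library; `ninja_status` is a hypothetical helper.

```python
import shutil
import subprocess


def ninja_status():
    """Return (path, version) for the `ninja` executable, or None where it cannot be found or run."""
    path = shutil.which("ninja")
    if path is None:
        return None, None
    try:
        out = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return path, None
    return path, out.stdout.strip()


path, version = ninja_status()
print("ninja executable:", path or "not found on PATH", "| version:", version)
```

If the executable is not found, adding the environment's scripts directory to PATH before launching `app.py` is one thing to try.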

Is there any plan to release training script?

Hi, thanks for your wonderful paper and the released inference code!!! I am wondering if there is any plan to release the training script. Your reply will be highly appreciated~

gradio app.py error

Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
--- using zero snr---
/home/harshad/Downloads/CRM/imagedream/ldm/interface.py:117: RuntimeWarning: divide by zero encountered in divide
"sqrt_recip_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod))
/home/harshad/Downloads/CRM/imagedream/ldm/interface.py:120: RuntimeWarning: divide by zero encountered in divide
"sqrt_recipm1_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod - 1))
Killed

After some time, while running the code, I encountered a 'divide by zero' error. It seems to occur during a division operation, which is likely causing the program to terminate unexpectedly. I suspect there might be an issue with the calculations or data processing logic in the code. The error message indicates that a division by zero was encountered, which is mathematically undefined. Could you please provide guidance on how to resolve this issue?
