
aradhye2002 / ecodepth


[CVPR'2024] Official implementation of the paper "ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation"

Home Page: https://ecodepth-iitd.github.io/

License: MIT License

Python 21.08% Shell 0.14% Jupyter Notebook 78.78%
cvpr2024 deep-learning depth-estimation metric-depth-estimation monocular-depth-estimation stable-diffusion zero-shot-transfer

ecodepth's Introduction

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

CVPR 2024
Project Page arXiv IEEE Xplore Paper

Suraj Patni*, Aradhye Agarwal*, Chetan Arora

driving.png

News

  • [April 2024] Inference scripts released for image and video depth estimation.
  • [March 2024] Pretrained checkpoints released for the NYUv2 and KITTI datasets.
  • [March 2024] Training and evaluation code released!
  • [Feb 2024] ECoDepth accepted at CVPR 2024.

Installation

git clone https://github.com/Aradhye2002/EcoDepth
cd EcoDepth
conda env create -f env.yml
conda activate ecodepth

Dataset Setup

You can take a look at the dataset preparation guide for NYUv2 and KITTI from here. After downloading the datasets, edit the data paths in the respective bash files so that they point to where you downloaded the data (see the sanity-check example after the directory layouts below). Alternatively, you can create symbolic links to the dataset folders like so:

cd depth
mkdir data
cd data
ln -s <path_to_kitti_dataset> kitti
ln -s <path_to_nyu_dataset> nyu

Note that the dataset structure inside the path you provide in the bash files should look like this:
NYUv2:

nyu
├── nyu_depth_v2
│   ├── official_splits
│   └── sync

KITTI:

kitti
├── KITTI
│   ├── 2011_09_26
│   ├── 2011_09_28
│   ├── 2011_09_29
│   ├── 2011_09_30
│   └── 2011_10_03
└── kitti_gt
    ├── 2011_09_26
    ├── 2011_09_28
    ├── 2011_09_29
    ├── 2011_09_30
    └── 2011_10_03
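
As a quick sanity check, the symbolic links should resolve to the directory layouts shown above; the bash scripts then pass this location to the Python entry points via the --data_path argument. A minimal check, assuming the links created above (adjust the paths if you edited the bash files instead):

# run from inside the depth directory
ls data/nyu/nyu_depth_v2/sync | head    # should list the NYUv2 scene folders
ls data/kitti/KITTI                     # should list the 2011_* drive-date folders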

Pretrained Models

Please download the pretrained weights from this link and save the .ckpt files inside the <repo root>/depth/checkpoints directory.

Also download the v1-5 checkpoint of Stable Diffusion and put it in the <repo root>/checkpoints directory. Create this directory if it does not already exist. Note that this checkpoints folder is different from the one above.
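
For example, starting from the repository root, the expected layout can be created like this (the ECoDepth checkpoint name is a placeholder for whichever .ckpt file you downloaded; the Stable Diffusion file is typically named v1-5-pruned-emaonly.ckpt):

# run from the repository root; adjust filenames to what you downloaded
mkdir -p depth/checkpoints checkpoints
mv <downloaded_ecodepth_weights>.ckpt depth/checkpoints/   # NYUv2 / KITTI weights
mv v1-5-pruned-emaonly.ckpt checkpoints/                   # Stable Diffusion v1-5 checkpoint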

Inference

To perform inference on any RGB image or video, use the infer_{outdoor,indoor}.sh files. Set the --img_path argument to the image you would like to get the depth for, and the --video_path argument to the video from which to produce depth. If you only wish to run inference on an image or only on a video, simply remove the other argument (see the example after the commands below). Then enter the depth directory by executing cd depth and run:

  1. Infer on outdoor scenes: bash infer_outdoor.sh

  2. Infer on indoor scenes: bash infer_indoor.sh
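
For example, a typical outdoor run could look like this, assuming the arguments are set inside the script as described above (the image path is illustrative):

cd depth
# edit infer_outdoor.sh so that --img_path points at your image,
# e.g. --img_path /path/to/my_image.png, and drop --video_path if you
# only need image inference (or vice versa for a video)
bash infer_outdoor.sh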

Evaluation

To evaluate the model performance on the NYUv2 and KITTI datasets, use the test_{kitti,nyu}.sh files. The trained models are publicly available; download them using the links above. Then navigate to the depth directory and follow the instructions outlined below (see the example after the commands):

  1. Evaluate on NYUv2 dataset:
    bash test_nyu.sh <path_to_saved_model_of_NYU>

  2. Evaluate on KITTI dataset:
    bash test_kitti.sh <path_to_saved_model_of_KITTI>
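
For example, with the pretrained weights saved under depth/checkpoints as described above (the checkpoint filenames are illustrative; use the names of the files you actually downloaded):

cd depth
bash test_nyu.sh checkpoints/nyu.ckpt      # NYUv2 evaluation
bash test_kitti.sh checkpoints/kitti.ckpt  # KITTI evaluation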

Training

We trained our models with a total batch size of 32 using 8x NVIDIA A100 GPUs. Inside train_{kitti,nyu}.sh, set the NPROC_PER_NODE variable and the --batch_size argument to values suitable for your system resources. For our method we set NPROC_PER_NODE=8 and --batch_size=4, resulting in a total batch size of 32 (see the sketch after the commands below). Afterwards, navigate to the depth directory by executing cd depth and follow the instructions:

  1. Train on NYUv2 dataset:
    bash train_nyu.sh

  2. Train on KITTI dataset:
    bash train_kitti.sh
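
The effective batch size is NPROC_PER_NODE x --batch_size (times --gradient_accumulation, if you use it). A sketch of the settings described above as they might appear in train_nyu.sh; apart from NPROC_PER_NODE, the variable names here are illustrative:

# 8 GPUs x 4 images per GPU = total batch size of 32
NPROC_PER_NODE=8   # number of GPUs on the node
BATCH_SIZE=4       # per-GPU batch size, passed to train.py as --batch_size $BATCH_SIZE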

Contact

If you have any questions about our code or paper, kindly raise an issue on this repository.

Acknowledgment

We thank Kartik Anand for assistance with the experiments. Our source code is inspired by VPD and PixelFormer. We thank their authors for publicly releasing their code.

BibTeX (Citation)

If you find our work useful in your research, please consider citing it as follows:

@InProceedings{Patni_2024_CVPR,
    author    = {Patni, Suraj and Agarwal, Aradhye and Arora, Chetan},
    title     = {ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {28285-28295}
}

ecodepth's People

Contributors

aradhye2002, eltociear, surajiitd


ecodepth's Issues

ImportError preventing training

During the execution of the train_nyu.sh script, an ImportError is encountered when trying to import VectorQuantizer2 from taming.modules.vqvae.quantize. This error prevents the training process from starting as the model initialization fails.

(ecodepth) cv19f24@node13:~/EcoDepth/depth$ bash train_nyu.sh
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
| distributed init (rank 0): env://, gpu 0

 Max_depth = 10.0 meters for nyudepthv2!

<wandb logs>

model will be saved after every 200 steps
val will be done after every 200 steps
This experiment name is :  04151402_nyu_BS-16_lr-one_cycle_training_nyu
log_dir in main log_dir/04151402_nyu_BS-16_lr-one_cycle_training_nyu
Traceback (most recent call last):
  File "/home/cv19f24/EcoDepth/depth/train.py", line 540, in <module>
    main()
  File "/home/cv19f24/EcoDepth/depth/train.py", line 462, in main
    model = EcoDepth(args=args)
            ^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/models/model.py", line 166, in __init__
    self.encoder = EcoDepthEncoder(out_dim=channels_in, dataset='nyu', args = args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/models/model.py", line 56, in __init__
    sd_model = instantiate_from_config(self.config.model)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/ldm/util.py", line 85, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/EcoDepth/depth/ldm/util.py", line 93, in get_obj_from_str
    return getattr(importlib.import_module(module, package=None), cls)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/cv19f24/EcoDepth/depth/ldm/models/diffusion/ddpm.py", line 25, in <module>
    from ldm.models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL
  File "/home/cv19f24/EcoDepth/depth/ldm/models/autoencoder.py", line 6, in <module>
    from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
ImportError: cannot import name 'VectorQuantizer2' from 'taming.modules.vqvae.quantize' (/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/taming/modules/vqvae/quantize.py)
<wandb logs>
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2380373) of binary: /home/cv19f24/.conda-2024.02/envs/ecodepth/bin/python
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 798, in <module>
    main()
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-15_14:02:25
  host      : node13
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2380373)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

I have followed the installation instructions to the letter, so I don't know what to do. Thank you.

Training speed

Hello,

Thank you so much for sharing this amazing work!

I am trying to train the model on the NYU dataset. The paper reports about 21 minutes per epoch on 8 A100 GPUs.
I am using a single A100 GPU with batch size 32, and in my case training seems to get stuck at the following step forever. Meanwhile, I can run the evaluation without issue. I don't know what the problem could be and I would appreciate any help or hints!

with torch.no_grad():
    # convert the input image to latent space and scale.
    latents = self.encoder_vq.encode(x).mode().detach() * self.config.model.params.scale_factor

P.S. The evaluation results match the paper well, except for sq_rel.

    d1         d2         d3    abs_rel     sq_rel       rmse   rmse_log      log10      silog 
0.9776     0.9973     0.9995     0.0599     0.0194     0.2187     0.0773     0.0259     5.7549 

Again, thanks for the great work!

cannot load model

Hi, I receive an error when I load the model manually in code, this way:

args = argparse.Namespace()

# Manually set the arguments
args.min_depth = 1e-3
args.max_depth = 128
args.flip_test = True
args.ckpt_dir = "./checkpoints/kitti.ckpt"
args.vit_model = "google/vit-base-patch16-224"
args.max_depth_eval = 128
args.no_of_classes = 200
args.deconv_kernels = [2, 2, 2]
args.num_filters = [32, 32, 32]
args.num_deconv = 3

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = EcoDepth(args=args)
model_weight = torch.load(args.ckpt_dir)['model']
model.load_state_dict(model_weight)
model.to(DEVICE)
model.eval()

when executing this code I receive

LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels

Loading openai/clip-vit-large-patch14 to CLIPTextModel.....

Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ...

- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Loaded openai/clip-vit-large-patch14 to CLIPTextModel.....

ddpm.py: Restored from ../checkpoints/v1-5-pruned-emaonly.ckpt with 0 missing and 2 unexpected keys
Unexpected Keys: ['model_ema.decay', 'model_ema.num_updates']

RuntimeError                              Traceback (most recent call last)
Cell In[27], line 19
     17 model = EcoDepth(args=args)
     18 model_weight = torch.load(args.ckpt_dir)['model']
---> 19 model.load_state_dict(model_weight)
     20 model.to(DEVICE)
     21 model.eval()

File /anaconda/envs/ecodepth/lib/python3.11/site-packages/torch/nn/modules/module.py:2041, in Module.load_state_dict(self, state_dict, strict)

RuntimeError: Error(s) in loading state_dict for EcoDepth:
	size mismatch for encoder.cide_module.embeddings: copying a param with shape torch.Size([100, 768]) from checkpoint, the shape in current model is torch.Size([200, 768]).
	size mismatch for encoder.cide_module.fc.2.weight: copying a param with shape torch.Size([100, 400]) from checkpoint, the shape in current model is torch.Size([200, 400]).
	size mismatch for encoder.cide_module.fc.2.bias: copying a param with shape torch.Size([100]) from checkpoint, the shape in current model is torch.Size([200]).

Zero-shot transfer

Excellent work and impressive results.

I have a question about the quantitative results for zero-shot transfer. To get the results on unseen datasets, did you use the trained NYUv2 model with best_rmse.ckpt and train it again for one more epoch using those unseen datasets?

Thank you

Predicted depth map only

Is there a way to export only the predicted depth map, instead of the image and its depth map side by side? And/or is there a way to save the grayscale depth map without a color palette applied?

Is there any argument I can add to the command?

Absolute depth (metric depth)

Thanks for the awesome contribution to monocular depth estimation. Does the model predict a relative depth map? Is it possible to get actual distances (in meters) from the predicted depth map? Thank you for your response.

train.py: error: unrecognized arguments:

bash train_kitti.sh
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/yx/miniconda3/envs/ecodepth/lib/python3.11/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
usage: train.py [-h] [--exp_name EXP_NAME] [--gpu_or_cpu GPU_OR_CPU] [--data_path DATA_PATH]
[--dataset {nyudepthv2,kitti,imagepath,vkitti,sunrgbd,ibims,diode_indoors_val,hypersim,vkitti2,diode_outdoor,DDAD,DIML}] [--batch_size BATCH_SIZE] [--workers WORKERS]
[--max_depth MAX_DEPTH] [--min_depth MIN_DEPTH] [--max_depth_eval MAX_DEPTH_EVAL] [--min_depth_eval MIN_DEPTH_EVAL] [--do_kb_crop DO_KB_CROP]
[--kitti_crop {garg_crop,eigen_crop}] [--pretrained PRETRAINED] [--drop_path_rate DROP_PATH_RATE] [--use_checkpoint USE_CHECKPOINT] [--num_deconv NUM_DECONV]
[--num_filters NUM_FILTERS [NUM_FILTERS ...]] [--deconv_kernels DECONV_KERNELS [DECONV_KERNELS ...]] [--flip_test] [--no_of_classes NO_OF_CLASSES] [--vit_model VIT_MODEL]
[--num_of_diffusion_step NUM_OF_DIFFUSION_STEP] [--eigen_crop_in_dataloader_itself_for_nyu EIGEN_CROP_IN_DATALOADER_ITSELF_FOR_NYU] [--use_right USE_RIGHT] [--cutflip CUTFLIP]
[--variance_focus VARIANCE_FOCUS] [--epochs EPOCHS] [--max_lr MAX_LR] [--min_lr MIN_LR] [--weight_decay WEIGHT_DECAY] [--layer_decay LAYER_DECAY] [--crop_h CROP_H]
[--crop_w CROP_W] [--log_dir LOG_DIR] [--val_freq VAL_FREQ] [--pro_bar PRO_BAR] [--model_save_freq MODEL_SAVE_FREQ] [--validate_on_kitti_also VALIDATE_ON_KITTI_ALSO]
[--print_freq PRINT_FREQ] [--save_last_model] [--resume_from RESUME_FROM] [--save_depths_gray] [--save_depths_color] [--learning_rate_schedule LEARNING_RATE_SCHEDULE]
[--gradient_accumulation GRADIENT_ACCUMULATION] [--log_in_wandb LOG_IN_WANDB] [--finetune_on_another_dataset FINETUNE_ON_ANOTHER_DATASET]
[--pretrained_ckpt_path PRETRAINED_CKPT_PATH]
train.py: error: unrecognized arguments: --reg_loss_abs_of_embed False --vit_trainable False --use_cross_attn_of_unet False --kitti_split_to_half --trainable_bins False --bins_from_coarsest_res False --per_pixel_bin_prediction False --use_text_adapter True --pixelshuffle_decoder True

ecodepth module not found

When running test.py or infer.py or any py I get this error:

Traceback (most recent call last):
File "D:\AITools\EcoDepth\depth\test.py", line 9, in
from models.model import EcoDepth
File "D:\AITools\EcoDepth\depth\models\model.py", line 11, in
from ecodepth.models import UNetWrapper, EmbeddingAdapter
ModuleNotFoundError: No module named 'ecodepth'

Any way to fix this?

Windows 10, rtx 3090, python 3.10.11

The test-simple python script

Hi, thank you for your amazing work!
I would really like to check your network's results on specific pictures of my own, but there is no simple test script available. Could you please provide one, like monodepth2's test_simple?
