syscv / idisc
iDisc: Internal Discretization for Monocular Depth Estimation [CVPR 2023]
Home Page: https://arxiv.org/abs/2304.06334
License: Other
When reading your code, I don't understand steps such as dividing by the depth scale, selecting pixels greater than the minimum depth, ...
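For context, both steps are common in depth pipelines: ground-truth depth is stored as 16-bit integers with a dataset-specific scale (other comments on this page use 1000 for NYU-v2 and 256 for KITTI), so dividing by that scale recovers meters, and pixels at or below a minimum depth are treated as invalid because missing measurements are stored as 0. A minimal sketch with hypothetical helper names, not the repository's code:

```python
import numpy as np

def decode_depth(raw: np.ndarray, depth_scale: float) -> np.ndarray:
    """Convert a 16-bit integer depth map to meters by dividing by its storage scale."""
    return raw.astype(np.float32) / depth_scale

def valid_mask(depth: np.ndarray, min_depth: float, max_depth: float) -> np.ndarray:
    """True where ground truth exists and lies inside the evaluation range.

    Missing measurements are stored as 0, so requiring depth > min_depth
    removes them from losses and metrics.
    """
    return (depth > min_depth) & (depth < max_depth)

# Toy KITTI-style map stored with scale 256 (0 = no measurement).
raw = np.array([[0, 2560], [5120, 25600]], dtype=np.uint16)
depth = decode_depth(raw, depth_scale=256.0)          # [[0, 10], [20, 100]] meters
mask = valid_mask(depth, min_depth=1e-3, max_depth=80.0)  # [[False, True], [True, False]]
```

The mask is then used to index both prediction and ground truth, so metrics are computed only where a measurement exists.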
Hi @lpiccinelli-eth, the paper mentions GT-based depth rescaling for the DIODE Indoor dataset.
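For reference, one widely used form of ground-truth-based rescaling is per-image median scaling: the prediction is multiplied by the ratio between the ground-truth median and the predicted median over valid pixels before metrics are computed. Whether the paper uses exactly this variant (rather than, for example, a least-squares scale) is an assumption; the function name is hypothetical.

```python
import numpy as np

def median_rescale(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray):
    """Scale `pred` so that its median over valid pixels matches the GT median."""
    scale = np.median(gt[mask]) / np.median(pred[mask])
    return pred * scale, scale

gt = np.array([2.0, 4.0, 6.0, 0.0])    # last pixel has no ground truth
pred = np.array([1.0, 2.0, 3.0, 5.0])
mask = gt > 0
scaled, s = median_rescale(pred, gt, mask)  # s = 4 / 2 = 2
```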
Thank you very much for your work. I have already configured the environment, but I don't know how to use your code to estimate the depth of a single image. How can I get the depth map?
Hello, thank you for writing an excellent paper.
Is it correct that zero-shot testing on DIODE was conducted only on the 325 images listed in diode_indoor_val.txt? If so, is it also correct that diode_indoor_train.txt was not used in the paper?
Hi, thank you for your wonderful work and the model zoo. I am running benchmark evaluations of SOTA MDE models for my dissertation.
I tried to reproduce your work in Google Colab (instead of conda), and when I run `!python ./scripts/test.py --model-file ./idisc_nyu_resnet101.pt --config-file ./configs/nyu/nyu_r101.json --base-path ../temp/datasets` I get an error:
Traceback (most recent call last):
File "/content/idisc/./scripts/test.py", line 14, in <module>
import idisc.dataloders as custom_dataset
ModuleNotFoundError: No module named 'idisc'
I don't know why this is happening. My suspicion is `!bash ./make.sh`: could this be the cause? It threw a lot of exceptions but ended with "Finished processing dependencies for MultiScaleDeformableAttention==1.0".
Your help is greatly appreciated.
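`ModuleNotFoundError: No module named 'idisc'` generally means the directory that contains the `idisc` package is not on Python's import path: running `python ./scripts/test.py` puts `scripts/` (not the repository root) at the front of `sys.path`. Judging from the traceback, the repository root is `/content/idisc`, so one workaround (an assumption, not an official instruction) is to prepend that root before importing, or equivalently to run the script with `PYTHONPATH=/content/idisc`. The helper name is hypothetical.

```python
import os
import sys

def add_repo_root(repo_root: str) -> None:
    """Prepend the repository root so that `import idisc` resolves to <repo_root>/idisc."""
    root = os.path.abspath(repo_root)
    if root not in sys.path:
        sys.path.insert(0, root)

# At the top of the script (or in a Colab cell) before importing the package:
add_repo_root("/content/idisc")
# import idisc.dataloders as custom_dataset  # should resolve if /content/idisc/idisc exists
```

This is independent of the `make.sh` warnings, which concern the compiled deformable-attention op rather than the Python package path.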
Hi, first of all, congratulations on this great work!
I'm evaluating recent depth estimation techniques and I'm wondering if you could help me validate the results.
I downloaded your SwinLarge predictions and wanted to compare them directly with the KITTI improved ground truth [1].
I followed your instructions by dividing by 256 (as with the GT data), and I interpolated the predictions just like your code does on the model output, using F.interpolate with mode=bicubic and align_corners=True.
I'm following Monodepth2 procedures to compare, therefore not using Garg's crop in here.
The results are the following:
| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|
| 0.086 | 0.539 | 4.228 | 0.153 | 0.913 | 0.979 | 0.991 |
I was expecting much lower errors. Could you validate these steps, please? Do the SwinLarge predictions give the correct output?
The code is quite simple; I'll share it below in case you want to check it.
```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

def compute_errors(gt, pred):
    thresh = np.maximum((gt / pred), (pred / gt))
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()
    rmse = (gt - pred) ** 2
    rmse = np.sqrt(rmse.mean())
    rmse_log = (np.log(gt) - np.log(pred)) ** 2
    rmse_log = np.sqrt(rmse_log.mean())
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3

MIN_DEPTH = 1e-3
MAX_DEPTH = 80

pred = cv2.imread(pred_path, -1)  # 16-bit PNG prediction
pred = pred / 256
gt = cv2.imread(gt_path, -1)  # 16-bit PNG ground truth (0 = no measurement)
gt_depth = gt / 256
gt_height, gt_width = gt_depth.shape[:2]
mask = np.logical_and(gt_depth > MIN_DEPTH, gt_depth < MAX_DEPTH)

# Upsample the prediction to the GT resolution and convert back to a 2-D numpy array.
pred_depth = F.interpolate(
    torch.from_numpy(pred).unsqueeze(0).unsqueeze(0),
    size=(gt_height, gt_width),
    mode="bicubic",
    align_corners=True,
).squeeze().numpy()
pred_depth[pred_depth < MIN_DEPTH] = MIN_DEPTH
pred_depth[pred_depth > MAX_DEPTH] = MAX_DEPTH

# Evaluate only on pixels with valid ground truth.
print(compute_errors(gt_depth[mask], pred_depth[mask]))
```
I was expecting lower values than those reported in the paper (e.g., Abs Rel probably below 0.05), but got considerably higher values (Abs Rel 0.086).
Thanks again for your work!
References
[1] Uhrig, Jonas, et al. "Sparsity Invariant CNNs." 2017 International Conference on 3D Vision (3DV). IEEE, 2017.
Hi, thanks for your great work!
I am unsure how panel (d), "Internal discretization", of Figure 1 in the paper is generated.
I infer that the ID is the index with the maximum value of (QiKi), based on the sentence "(QiKi) is the spatial location for which each specific IDR is responsible" at the end of the fourth page of the paper.
Could you provide me with the concrete computation process?
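The hypothesis in the question can be sketched as follows: given a similarity score between each IDR query and every pixel feature, each pixel is assigned the index of the IDR with the highest score, yielding a label map that could be colored like a discretization panel. This only illustrates the questioner's reading, with hypothetical shapes (`queries`: N x C, `keys`: C x H x W); it is not confirmed to be how Figure 1(d) was produced.

```python
import numpy as np

def discretization_map(queries: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Assign every pixel to the IDR whose query has the largest dot product with it.

    queries: (N, C) array, one row per IDR.
    keys:    (C, H, W) array of per-pixel features.
    Returns an (H, W) integer map of IDR indices.
    """
    c, h, w = keys.shape
    logits = queries @ keys.reshape(c, h * w)  # (N, H*W) similarity scores
    # If attention is normalized with a softmax over the IDR axis, argmax of the logits is unchanged.
    return logits.argmax(axis=0).reshape(h, w)

# Two IDRs in a 2-D feature space; the left column aligns with IDR 0, the right with IDR 1.
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
keys = np.zeros((2, 2, 2))
keys[0, :, 0] = 1.0
keys[1, :, 1] = 1.0
labels = discretization_map(queries, keys)
```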
Hello,
I was going through the predicted depth maps provided as part of the repo.
It looks like the resolution of those depth maps (for NYUv2) is 160x120, whereas the input is 640x480. Does that mean the network is trained to output depth only at 160x120? I did see the disclaimer in the README about resizing them to the actual resolution, but wouldn't simple resizing cause issues at depth boundaries?
Thanks!
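On the boundary concern: upsampling 4x from 160x120 to 640x480 with bilinear (or bicubic) interpolation blends the two sides of a depth discontinuity and creates intermediate values, while nearest-neighbor upsampling only copies predicted values. The numpy sketch below (hypothetical helper names, `align_corners=True`-style sampling) shows the difference on a single step edge; which interpolation the authors intend is not stated here.

```python
import numpy as np

def upsample_bilinear(depth: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize that maps corner pixels onto corner pixels (align_corners=True)."""
    in_h, in_w = depth.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = depth[np.ix_(y0, x0)] * (1 - wx) + depth[np.ix_(y0, x1)] * wx
    bottom = depth[np.ix_(y1, x0)] * (1 - wx) + depth[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def upsample_nearest(depth: np.ndarray, factor: int) -> np.ndarray:
    """Integer-factor nearest-neighbor resize: each value fills a factor x factor block."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)

# 120x160 map with a 2 m / 6 m step edge, upsampled 4x to 480x640.
low = np.full((120, 160), 2.0)
low[:, 80:] = 6.0
bilinear = upsample_bilinear(low, 480, 640)  # contains values between 2 and 6 at the edge
nearest = upsample_nearest(low, 4)           # contains only 2 and 6
```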
Hi,
I am performing zero-shot surface normal estimation on the KITTI dataset using your iDisc model. To understand the results better, I would like to know how iDisc computes surface normals and what ground-truth data it uses for surface normal estimation. Since I could not find much information in the paper, could you please tell me where to look for this information, or answer here?
Thanks in advance
Hello, thanks for the great work!
I am running your model on my custom dataset. However, the depth saved from the NYUv2 model seems wrong. I think this might be due to my misuse of your model's output. My script looks like this:
import os
import shutil
import torch
import numpy as np
import cv2
from tqdm import tqdm
from pathlib import Path
import sys, json
from PIL import Image
import torchvision.transforms.functional as TF
# I cloned your repo into a location from which it can be imported directly
sys.path.insert(0, str(Path(__file__).parent.resolve() / "idisc"))
from idisc.models.idisc import IDisc
from idisc.utils import (DICT_METRICS_DEPTH, DICT_METRICS_NORMALS,
RunningMetric, validate)
model = IDisc.build(json.load(open('idisc/configs/nyu/nyu_swinl.json')))
model.load_pretrained("idisc/nyu_swinlarge.pt")
model = model.to("cuda")
model.eval()
# read in image
image = np.asarray(Image.open(image_path))
image = TF.normalize(TF.to_tensor(image), **{"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]})
image = image.unsqueeze(0).to("cuda")
with torch.inference_mode():
    depth, *_ = model(image)
    TF.to_pil_image(depth[0].cpu()).save(save_path)
I am using the Swin-Large model. `image_path` is the path to this image [image], of size 224x224 (I uploaded the exact image in case you need it for debugging). DPT generates depth like this: [image]
However, the output of iDisc Swin-Large is this: [image]
I believe I made some mistakes somewhere. I wonder if you can help me debug this.
Thanks!
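One likely culprit in the script above is the saving step: `TF.to_pil_image` on a floating-point tensor multiplies by 255 and casts to 8 bits, so depth values above 1.0 (beyond 1 m) no longer fit. Separately, the normalization statistics (mean/std of 0.5) and the 224x224 input size may differ from what the NYU model saw in training; that should be checked against the repository's dataloader. A minimal sketch of lossless saving, assuming depth in meters and reusing the 1000 (millimeter) scale mentioned on this page for NYU-v2; helper names are hypothetical.

```python
import numpy as np

def depth_to_uint16(depth_m: np.ndarray, scale: float = 1000.0) -> np.ndarray:
    """Encode metric depth as uint16 (millimeters for scale=1000) without 8-bit truncation."""
    encoded = np.round(depth_m * scale)
    return np.clip(encoded, 0, np.iinfo(np.uint16).max).astype(np.uint16)

def depth_to_uint8_vis(depth_m: np.ndarray) -> np.ndarray:
    """Min-max normalize to 0..255 for viewing only (not reversible to meters)."""
    d_min, d_max = float(depth_m.min()), float(depth_m.max())
    norm = (depth_m - d_min) / max(d_max - d_min, 1e-8)
    return (norm * 255).round().astype(np.uint8)

depth = np.array([[0.5, 1.5], [3.25, 9.0]], dtype=np.float32)  # e.g. depth[0, 0].cpu().numpy()
raw = depth_to_uint16(depth)   # [[500, 1500], [3250, 9000]]
vis = depth_to_uint8_vis(depth)
# cv2.imwrite("depth.png", raw) writes a 16-bit PNG; cv2.imread(path, -1) / 1000 recovers meters.
```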
Hi,
Firstly, thanks a lot for your wonderful work. I'm facing a problem when training the model with 4 GPUs. The DataLoader works fine, but I get an error like this:
Any suggestions will be appreciated!
Loaded 22441 images. Totally 717 invalid pairs are filtered
Loaded 491 images. Totally 206 invalid pairs are filtered
-> Local random sampler
Start training:
Loaded 22441 images. Totally 717 invalid pairs are filtered
Loaded 491 images. Totally 206 invalid pairs are filtered
-> Local random sampler
Start training:
Loaded 22441 images. Totally 717 invalid pairs are filtered
Loaded 491 images. Totally 206 invalid pairs are filtered
-> Local random sampler
Start training:
Loaded 22441 images. Totally 717 invalid pairs are filtered
Loaded 491 images. Totally 206 invalid pairs are filtered
-> Local random sampler
Start training:
Traceback (most recent call last):
File "/project/6064028/tmp/code/idisc/scripts/train_DDP.py", line 288, in <module>
main_worker(config, args)
File "/project/6064028/tmp/code/idisc/scripts/train_DDP.py", line 168, in main_worker
with context as fp, model.no_sync() as no_sync:
File "/project/6064028/tmp/idisc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'IDisc' object has no attribute 'no_sync'
Traceback (most recent call last):
File "/project/6064028/tmp/code/idisc/scripts/train_DDP.py", line 288, in <module>
main_worker(config, args)
File "/project/6064028/tmp/code/idisc/scripts/train_DDP.py", line 168, in main_worker
with context as fp, model.no_sync() as no_sync:
File "/project/6064028/tmp/idisc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'IDisc' object has no attribute 'no_sync'
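`no_sync()` is a method of `torch.nn.parallel.DistributedDataParallel`, not of a plain `nn.Module`, so this error suggests the model reached line 168 without being wrapped in DDP (for example, if the wrapping step was skipped under the launch configuration used). Besides checking the wrapping, a defensive sketch (hypothetical helper, not the repository's code) falls back to a no-op context when the method is absent:

```python
import contextlib

def maybe_no_sync(model):
    """Return model.no_sync() for DDP-wrapped models, otherwise a do-nothing context manager."""
    no_sync = getattr(model, "no_sync", None)
    return no_sync() if callable(no_sync) else contextlib.nullcontext()

# Usage inside a gradient-accumulation step (sketch):
# with maybe_no_sync(model):
#     loss.backward()
```

Note that silently skipping `no_sync` on an unwrapped model also means gradients are never synchronized across GPUs, so fixing the wrapping itself remains the real solution.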
The command I ran is:
python ./scripts/test.py --model-file ../model/nyunormals_swinlarge.pt --config-file ../configs/nyunorm/nyunorm_swinl.json --base-path $WorkSpace/idisc-main/tmp/
The result I got was:
Test/AngularLoss: -0.14168441648872662
Error in best. a1 (0.2765)
Error in best. a2 (0.379)
Error in best. a3 (0.4847)
Error in best. a4 (0.649)
Error in best. a5 (0.7218)
Error in best. rmse_angular (33.0139)
Error in best. mean (22.66)
Error in best. median (12.8913)
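For interpreting such numbers: surface-normal benchmarks usually report the per-pixel angle between predicted and ground-truth unit normals (mean, median and RMSE in degrees) plus the fraction of pixels under fixed angular thresholds. The thresholds below (5, 7.5, 11.25, 22.5 and 30 degrees) are a common choice and are assumed, not confirmed, to correspond to a1 to a5 in the log; the function is a hypothetical sketch.

```python
import numpy as np

def angular_errors(pred: np.ndarray, gt: np.ndarray,
                   thresholds=(5.0, 7.5, 11.25, 22.5, 30.0)) -> dict:
    """Angular error statistics (degrees) between (N, 3) predicted and GT normals."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)  # clip guards arccos against rounding
    err = np.degrees(np.arccos(cos))
    stats = {
        "mean": err.mean(),
        "median": np.median(err),
        "rmse_angular": np.sqrt(np.mean(err ** 2)),
    }
    for i, t in enumerate(thresholds, start=1):
        stats[f"a{i}"] = np.mean(err < t)  # fraction of pixels within t degrees
    return stats

gt = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
pred = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # second normal is 45 degrees off
stats = angular_errors(pred, gt)
```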
Hi,
Thank you for sharing the code and weights. I am trying to load the surface normal estimator using the config and weights linked in https://github.com/SysCV/idisc?tab=readme-ov-file#normals:
import json
from idisc.models.idisc import IDisc
NORMALS_CONFIG_FILE = "models/nyunorm_swinl.json"
NORMALS_MODEL = "models/nyunormals_swinlarge.pt"
with open(NORMALS_CONFIG_FILE, "r") as f:
    config = json.load(f)
model = IDisc.build(config=config)
model.load_pretrained(NORMALS_MODEL)
-> Encoder is pretrained from: https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window7_224_22k.pth Loading pretrained info: _IncompatibleKeys(missing_keys=['norm0.weight', 'norm0.bias', 'norm1.weight', 'norm1.bias', 'norm2.weight', 'norm2.bias', 'norm3.weight', 'norm3.bias'], unexpected_keys=['norm.weight', 'norm.bias', 'head.weight', 'head.bias', 'layers.0.blocks.1.attn_mask', 'layers.1.blocks.1.attn_mask', 'layers.2.blocks.1.attn_mask', 'layers.2.blocks.3.attn_mask', 'layers.2.blocks.5.attn_mask', 'layers.2.blocks.7.attn_mask', 'layers.2.blocks.9.attn_mask', 'layers.2.blocks.11.attn_mask', 'layers.2.blocks.13.attn_mask', 'layers.2.blocks.15.attn_mask', 'layers.2.blocks.17.attn_mask'])
It also happens with Swin-Large for NYUv2 depth estimation.
Am I missing something?
Thank you.
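The `_IncompatibleKeys` message follows the line "Encoder is pretrained from: ...swin_large_patch4_window7_224_22k.pth", so it appears to concern initializing the backbone from the ImageNet-22k Swin weights, where the classification head is unexpected and the extra `norm0` to `norm3` layers are missing. Whether the subsequent `load_pretrained` call then covers those layers can be checked by comparing key sets directly. A hedged sketch with a hypothetical helper on plain dicts (for a real model, pass `model.state_dict()` and the loaded checkpoint's state dict):

```python
def compare_keys(model_state: dict, checkpoint_state: dict) -> tuple:
    """Return (missing, unexpected): keys the model needs but the checkpoint lacks, and vice versa."""
    model_keys = set(model_state)
    ckpt_keys = set(checkpoint_state)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected

# Toy example mirroring the log: the backbone file has no norm0.* but has a classification head.
model_state = {"norm0.weight": 0, "norm0.bias": 0, "layers.0.w": 0}
backbone_state = {"layers.0.w": 0, "head.weight": 0, "head.bias": 0}
missing, unexpected = compare_keys(model_state, backbone_state)
```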
Hi, I have read your paper. Thanks for your work. Could you release your code? I want to test it on my own pictures. Thanks!
Hello, I have a few questions regarding the outdoor datasets DDAD and Argoverse that you suggested for testing my methodology for outdoor zero-shot generalization.
The paper mentions cropping images to 1920 x 870, but I would like to crop them to 1920 x 864 instead to avoid a size mismatch. That seems fine to me, but do you, as the authors, see any potential issue?
After fixing the size to 1920 x 870 through cropping, do we need to apply eigen_crop or garg_crop separately? If so, should eigen_crop be used as mentioned in argo_swin.json?
Thank you for your excellent work.
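For reference on these questions: 864 is a multiple of 32 (27 x 32) while 870 is not, which is consistent with a size mismatch in a network that downsamples by 32 (an assumption about the architecture). The Eigen and Garg crops are usually implemented as evaluation masks defined by fractions of the image size; the fractions below are those in common KITTI evaluation scripts (e.g. BTS and AdaBins). Since they were defined for KITTI's camera geometry, applying them to DDAD or Argoverse is a judgment call. Helper names are hypothetical.

```python
import numpy as np

# Fractions (top, bottom, left, right) of the image size used by common KITTI evaluation scripts.
CROPS = {
    "garg": (0.40810811, 0.99189189, 0.03594771, 0.96405229),
    "eigen": (0.3324324, 0.91351351, 0.0359477, 0.96405229),
}

def crop_mask(height: int, width: int, kind: str) -> np.ndarray:
    """Boolean evaluation mask that is True inside the chosen crop."""
    top, bottom, left, right = CROPS[kind]
    mask = np.zeros((height, width), dtype=bool)
    mask[int(top * height):int(bottom * height), int(left * width):int(right * width)] = True
    return mask

def crop_height_to_multiple(image: np.ndarray, multiple: int = 32) -> np.ndarray:
    """Drop rows from the top so that the height is divisible by `multiple` (e.g. 870 -> 864)."""
    extra = image.shape[0] % multiple
    return image[extra:]
```

Removing rows from the top is one choice (for driving scenes it mostly removes sky); a bottom or centered crop would work the same way.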
Thanks for your great work! I have some naive questions about the geometric data augmentation. First, to the best of my knowledge, geometric data augmentation, especially "random_scale", was not used in previous works. However, I didn't see ablation experiments on data augmentation in your paper. Could you tell me whether you followed previous works in performing data augmentation, or provide some ablation experiments on it? Second, I noticed that the camera intrinsics are concatenated with the images and changed with the image scale, but I'm not sure how you use the camera intrinsics during training or testing.
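On the second point: under a pinhole camera model, resizing an image by a factor s scales fx, fy, cx and cy by the same factor (ignoring half-pixel conventions), which is presumably why the intrinsics are updated together with the image. How iDisc consumes them is not stated here; the sketch only shows the geometric update, with a hypothetical helper and illustrative intrinsics.

```python
import numpy as np

def scale_intrinsics(K: np.ndarray, scale: float) -> np.ndarray:
    """Return the 3x3 intrinsic matrix after resizing the image by `scale` (pinhole model)."""
    K_scaled = K.astype(np.float64).copy()
    K_scaled[:2, :] *= scale  # fx, skew, cx in row 0; fy, cy in row 1; row 2 stays [0, 0, 1]
    return K_scaled

# Illustrative intrinsics for a 640x480 image (values chosen for the example).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K_half = scale_intrinsics(K, 0.5)  # fx = fy = 250, cx = 160, cy = 120
```

A 3-D point projected with `K_half` lands at exactly half the pixel coordinates it has under `K`, matching the resized image.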
Hi,
I was trying to run a visual comparison between the predictions you saved for different networks (ResNet101, EfficientNet-B5, Swin-T, Swin-B, and Swin-L) on the NYU-v2 dataset. I loaded the saved .png files as 16-bit unsigned integers, converted them to float32, and divided by 1000 as suggested, but afterwards the maximum differs between networks. For example, for image 'bathroom/sync_depth_00045.png', the maxima after dividing by 1000 are:
ResNet101: 2.224
EB-5: 3.009
Swin-T: 2.991
Swin-B: 2.654
Swin-L: 2.974
Could you please advise whether I should clip the values to [0, 1] or use a different scale factor? Scaling each output by its own maximum would make the outputs look visually different.
Thanks!
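For a side-by-side comparison, one common approach (assumed here, not prescribed by the repository) is to keep the metric values rather than clip to [0, 1], and to map all predictions of the same image to colors with one shared range, such as the ground-truth range or a fixed range, so that differing maxima show up as genuine differences. A small numpy sketch with hypothetical names and toy values:

```python
import numpy as np

def load_nyu_prediction(raw_uint16: np.ndarray) -> np.ndarray:
    """uint16 PNG values -> meters, using the 1000 scale mentioned above."""
    return raw_uint16.astype(np.float32) / 1000.0

def normalize_shared(preds: dict, vmin: float, vmax: float) -> dict:
    """Map each depth map (meters) to [0, 1] with one common range, clipping values outside it."""
    return {name: np.clip((d - vmin) / (vmax - vmin), 0.0, 1.0) for name, d in preds.items()}

preds = {
    "ResNet101": load_nyu_prediction(np.array([[1000, 2224]], dtype=np.uint16)),
    "Swin-L": load_nyu_prediction(np.array([[1000, 2974]], dtype=np.uint16)),
}
vmin = 0.0
vmax = max(float(d.max()) for d in preds.values())  # or a fixed range / the GT range
shared = normalize_shared(preds, vmin, vmax)
```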
Hello,
Building on the original code, I added the ability to output a color depth map.
However, on the KITTI dataset, using the same picture (from the first row of Figure 13 in the original paper), I cannot get results as good as those shown in Figure 13, whether I use ResNet101, EfficientNet-B5, or Swin-T through Swin-L.
The following three pictures show, in order, my result with ResNet101, my result with Swin-L, and the result from the paper.
Apart from that, the main part of my test code is as follows:
img = Image.open(
".\\2011_09_26\\2011_09_26_drive_0002_sync\\image_02\\data\\0000000021.png")
transform = trasforms.Compose([trasforms.ToTensor(), trasforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])])
img = transform(img).unsqueeze(0)
model.eval()
with torch.no_grad():
    preds, losses, _ = model(img.to(device), None, None)
preds = preds.cpu().numpy()[0, 0, :, :]
img = visulization.colorize(preds)
img = Image.fromarray(img, mode='RGB')
min_val = np.min(preds)
max_val = np.max(preds)
scaled_array = (preds - min_val) / (max_val - min_val)
preds = scaled_array * 255
preds = Image.fromarray(preds.astype("uint8"), mode='L')
preds.save("gray.png")
img.save("RGB.png")
print("Done")
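One input-side difference worth ruling out (an assumption, not a confirmed cause): many KITTI depth models are trained and evaluated on a 352 x 1216 bottom-center crop (often called the KB crop) of the raw frames of roughly 375 x 1242, while the script above feeds the full frame. If the repository's KITTI dataloader applies such a crop, reproducing it before inference may bring the output closer to the paper's figures. A numpy sketch with a hypothetical helper name:

```python
import numpy as np

def kb_crop(image: np.ndarray, crop_h: int = 352, crop_w: int = 1216) -> np.ndarray:
    """Bottom-center crop used by several KITTI pipelines (works for HxW or HxWxC arrays)."""
    h, w = image.shape[:2]
    top = h - crop_h
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]

frame = np.zeros((375, 1242, 3), dtype=np.uint8)  # e.g. np.asarray(Image.open(path))
cropped = kb_crop(frame)
```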
Dear Luigi Piccinelli,
I hope this message finds you well. I wanted to express my sincere appreciation for your exceptional article. Inspired by your work, I attempted to train your model on the KITTI Eigen split.
However, during my training process, I encountered several abnormal phenomena that I would like to bring to your attention:
Here is a screenshot depicting the issue:
To accommodate the equipment I am using (a single machine with four RTX 3090s and no SLURM), I modified the distributed training setup from SLURM to standard DDP (DistributedDataParallel).
Additionally, I made some modifications in the dataloader directory to align with the directory structure of my existing KITTI dataset. I believe these changes should not be the cause of the undesirable results, as the code correctly outputs messages such as "Loaded 23158 images. Totally 0 invalid pairs are filtered" and "Loaded 652 images. Totally 45 invalid pairs are filtered."
Furthermore, in order to track the training process using TensorBoard, I incorporated some code in the training section to generate and save log information.
Apart from these adjustments, I have not made any additional modifications to the code. Specifically, the config file remains the same as the one you provided.
I would greatly appreciate your valuable insights and guidance regarding these issues. If there are any specific details or additional information I can provide to assist in troubleshooting, please let me know. Thank you once again for your remarkable contribution to the field.
Best regards
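Since the setup replaces SLURM with plain DDP, one thing to double-check is how rank information reaches each process. When launching with `torchrun --nproc_per_node=4 ...`, torchrun exports `RANK`, `LOCAL_RANK` and `WORLD_SIZE` for every worker, whereas a script written for SLURM typically reads `SLURM_PROCID`-style variables. A hedged sketch of a launcher-agnostic reader (hypothetical helper; the `SLURM_*` names are standard SLURM variables, but whether the original script reads them is an assumption):

```python
import os

def dist_info(env=None) -> dict:
    """Read rank / local rank / world size from torchrun variables, falling back to SLURM ones."""
    env = os.environ if env is None else env
    if "RANK" in env and "WORLD_SIZE" in env:  # set by torchrun
        return {"rank": int(env["RANK"]),
                "local_rank": int(env.get("LOCAL_RANK", 0)),
                "world_size": int(env["WORLD_SIZE"])}
    if "SLURM_PROCID" in env:  # set by srun
        return {"rank": int(env["SLURM_PROCID"]),
                "local_rank": int(env.get("SLURM_LOCALID", 0)),
                "world_size": int(env.get("SLURM_NTASKS", 1))}
    return {"rank": 0, "local_rank": 0, "world_size": 1}  # single-process fallback

# Typical use (sketch): torch.cuda.set_device(info["local_rank"]) and then
# torch.distributed.init_process_group("nccl", rank=info["rank"], world_size=info["world_size"]).
```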
Hi, thanks for your wonderful work! I have downloaded nyu.zip from google drive following your instructions, but I cannot find the annotation for normals. The annotation in train.txt and test.txt points to '518.8579' but I cannot find the file in the .zip. Could you please let me know how to find these normal annotations? Many thanks in advance.
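A possible explanation, offered as an assumption to verify against the dataloader: 518.8579 matches the horizontal focal length (fx, in pixels) of the NYU-v2 Kinect camera, so this value in train.txt/test.txt is more likely a per-image focal-length column than a path to a normals file. A sketch of parsing such a line, assuming whitespace-separated fields `<rgb> <depth> <focal>` (hypothetical layout and example paths):

```python
def parse_split_line(line: str) -> dict:
    """Split '<rgb> <depth> <focal>' into fields; the third field, if present, is read as focal length."""
    fields = line.split()
    entry = {"rgb": fields[0], "depth": fields[1] if len(fields) > 1 else None, "focal": None}
    if len(fields) > 2:
        entry["focal"] = float(fields[2])
    return entry

entry = parse_split_line("bathroom/rgb_00045.jpg bathroom/sync_depth_00045.png 518.8579")
```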