Comments (7)
Hi, not sure what you want exactly. If you want the trajectory from the pose vector, you can see how it's done in test_pose : https://github.com/ClementPinard/SfmLearner-Pytorch/blob/master/test_pose.py#L78
Basically, everything is given with respect to the middle frame, so you need to put the reference back on the first frame.
Once that's done, if you want the trajectory for a longer sequence than just 5 frames, you will need to compose the 4x4 matrices so that the very first frame is the reference (the identity matrix) and all the other matrices are given with respect to it.
Your translation vectors will then be the first 3 rows of the last column.
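For illustration, a minimal NumPy sketch of that composition (the function name is made up for this example): chain the [3, 4] relative matrices through homogeneous [4, 4] ones so the first frame is the identity, then read each translation off the last column.

import numpy as np

def matrices_to_trajectory(rel_mats):
    # rel_mats: list of [3, 4] matrices, rel_mats[i] being the pose of
    # frame i+1 expressed in frame i
    current = np.eye(4)                              # first frame = identity
    trajectory = [current[:3, 3].copy()]
    for m in rel_mats:
        step = np.eye(4)
        step[:3] = m                                 # promote [3, 4] to homogeneous [4, 4]
        current = current @ step                     # compose with the chain so far
        trajectory.append(current[:3, 3].copy())     # first 3 rows of the last column
    return np.stack(trajectory)                      # [N+1, 3] positions in the first frame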
Hello,
Apologies for not stating my question clearly.
My goal is to include in the training loss the difference between the predicted and ground-truth translation vectors, to see if we can deal with the depth ambiguity with that approach.
I have modified the SequenceFolder class in sequence_folders.py to also return the ground-truth pose for each sample.
In the train function in train.py, I have added the following code to compute the loss on the translation vector with respect to the ground truth:
b = tgt_img.shape[0]
# insert a zero pose vector for the target frame so the sequence order matches the ground truth
reordered_output_poses = torch.cat([pose[:, :poses.shape[1]//2],
                                    torch.zeros(b, 1, 6).to(pose),
                                    pose[:, poses.shape[1]//2:]], dim=1)
# pose_vec2mat only takes [B, 6] tensors, so we simulate a batch dimension of B * seq_length
unravelled_poses = reordered_output_poses.reshape(-1, 6)
unravelled_matrices = pose_vec2mat(unravelled_poses, rotation_mode=args.rotation_mode)
inv_transform_matrices = unravelled_matrices.reshape(b, -1, 3, 4)
rot_matrices = inv_transform_matrices[..., :3].transpose(-2, -1)
tr_vectors = -rot_matrices @ inv_transform_matrices[..., -1:]
loss_4 = torch.sum(gt_transf_matrix[:, :, :, 3] - tr_vectors[:, :, :, 0])
loss = w1*loss_1 + w2*loss_2 + w3*loss_3 + w4*loss_4
Unfortunately, I am not sure whether the predicted translation vectors are computed with respect to the same frame as the ground-truth translation vectors.
Here is also the code from sequence_folders.py:
def __init__(self, root, seed=None, train=True, sequence_length=3, transform=None, target_transform=None):
    np.random.seed(seed)
    random.seed(seed)
    self.root = Path(root)
    scene_list_path = self.root/'train.txt' if train else self.root/'val.txt'
    self.scenes = [self.root/folder[:-1] for folder in open(scene_list_path)]
    self.transform = transform
    self.crawl_folders(sequence_length)

def crawl_folders(self, sequence_length):
    sequence_set = []
    demi_length = (sequence_length-1)//2
    shifts = list(range(-demi_length, demi_length + 1))
    shifts.pop(demi_length)
    for scene in self.scenes.copy():  # iterate over a copy so removing a scene is safe
        try:
            poses = np.genfromtxt(scene/'poses.txt').reshape((-1, 3, 4))
            poses_4D = np.zeros((poses.shape[0], 4, 4)).astype(np.float32)
            poses_4D[:, :3] = poses
            poses_4D[:, 3, 3] = 1
        except OSError:
            print("poses.txt was not found in ", scene, "\n skip this sequence")
            self.scenes.remove(scene)
            continue
        intrinsics = np.genfromtxt(scene/'cam.txt').astype(np.float32).reshape((3, 3))
        imgs = sorted(scene.files('*.jpg'))
        assert(len(imgs) == poses.shape[0])
        if len(imgs) < sequence_length:
            continue
        for i in range(demi_length, len(imgs)-demi_length):
            sample = {'intrinsics': intrinsics, 'tgt': imgs[i], 'ref_imgs': [], 'poses': []}
            first_pose = poses_4D[i - demi_length]
            # express every pose of the snippet relative to its first frame
            sample['poses'] = (np.linalg.inv(first_pose) @ poses_4D[i - demi_length: i + demi_length + 1])[:, :3]
            for j in shifts:
                sample['ref_imgs'].append(imgs[i+j])
            sample['poses'] = np.stack(sample['poses'])
            sequence_set.append(sample)
    random.shuffle(sequence_set)
    self.samples = sequence_set

def __getitem__(self, index):
    sample = self.samples[index]
    tgt_img = load_as_float(sample['tgt'])
    poses = sample['poses']
    ref_imgs = [load_as_float(ref_img) for ref_img in sample['ref_imgs']]
    if self.transform is not None:
        imgs, intrinsics = self.transform([tgt_img] + ref_imgs, np.copy(sample['intrinsics']))
        tgt_img = imgs[0]
        ref_imgs = imgs[1:]
    else:
        intrinsics = np.copy(sample['intrinsics'])
    return tgt_img, ref_imgs, intrinsics, np.linalg.inv(intrinsics), poses

def __len__(self):
    return len(self.samples)
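For completeness, a quick usage sketch of the modified dataset (the data path here is hypothetical):

train_set = SequenceFolder('/path/to/formatted/data', seed=0, train=True, sequence_length=3)
tgt_img, ref_imgs, intrinsics, intrinsics_inv, poses = train_set[0]
# ref_imgs is a list of sequence_length - 1 images, and poses has shape
# [sequence_length, 3, 4], all expressed relative to the snippet's first frame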
Ok, thanks for clarifying
By looking at the code, I see that you are computing the ground-truth poses with respect to the first frame, and then computing the predicted poses with respect to the target frame.
So I think your problem is here: you might want to multiply your inverse matrices by the inverse of the first one, so that the first matrix is the identity and the others are actual poses.
On a more general note, you might want to do the opposite of what you are doing. Instead of computing poses relative to the first frame of the sequence with 4x4 matrices, maybe you can compute the equivalent 6D vectors and have them with respect to the target frame (usually the middle one) instead of the first one, so that they already match the order output by the pose network.
I actually did some of this work with my own DepthNet network, where I tested pose supervision:
https://github.com/ClementPinard/unsupervised-depthnet/blob/master/train_img_pairs.py#L355
If you want to solve the scale problem on KITTI, you might want to have a look at PackNet-SfM from Toyota, where they supervise a velocity loss (and thus the depth scale as well): https://github.com/TRI-ML/packnet-sfm/blob/master/packnet_sfm/losses/velocity_loss.py
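To make that second option concrete, here is a minimal NumPy sketch (the helper name is made up; poses_4D is a snippet of [N, 4, 4] absolute poses, as built in crawl_folders above): re-reference the snippet to the middle (target) frame instead of the first one.

import numpy as np

def gt_poses_wrt_target(poses_4D, demi_length):
    # poses_4D: [N, 4, 4] absolute poses of one snippet, target frame in the middle
    target_pose = poses_4D[demi_length]
    # same trick as in crawl_folders, but the target frame becomes the identity
    return (np.linalg.inv(target_pose) @ poses_4D)[:, :3]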
Thank you very much Clement for your useful feedback and for pointing me to the Toyota paper, I was not aware of it!
I assume that when you say the first frame, you are referring to the first frame of the sequence (3 frames by default) and not the first frame of the scene, if I understand the code correctly?
Checking out your code, compensate_pose expresses one transformation matrix with respect to another. Can I therefore use it in train.py as below (which is the version of the code I already posted, modified according to your comments)?
reordered_output_poses = torch.cat([pose[:, :poses.shape[1]//2],
                                    torch.zeros(b, 1, 6).to(pose),
                                    pose[:, poses.shape[1]//2:]], dim=1)
# pose_vec2mat only takes [B, 6] tensors, so we simulate a batch dimension of B * seq_length
unravelled_poses = reordered_output_poses.reshape(-1, 6)
unravelled_matrices = pose_vec2mat(unravelled_poses, rotation_mode=args.rotation_mode)
inv_transform_matrices = unravelled_matrices.reshape(b, -1, 3, 4)
rot_matrices = inv_transform_matrices[..., :3].transpose(-2, -1)
tr_vectors = -rot_matrices @ inv_transform_matrices[..., -1:]
new_gt_transf_matrix = compensate_pose(inv(gt_transf_matrix), inv(tgt_img))  # Here is the only modification
loss_4 = torch.sum(new_gt_transf_matrix[:, :, :, 3] - tr_vectors[:, :, :, 0])
loss = w1*loss_1 + w2*loss_2 + w3*loss_3 + w4*loss_4
I am really sorry for the many basic questions; I am very new to the field.
Yes, I think that could work that way.
Now, the realm of transformation matrices is a dark place where you spend hours trying to figure out in what order you should multiply the matrices and whether you need to invert them or not, so I'd advise you to design some basic tests to make sure that it's working properly.
What I did in my case was to reduce the dataset to only one sequence. The model will overfit like crazy, but it will show whether the pose supervision loss and the photometric loss are consistent. If you can't get both to be low at the same time, there's probably a mistake somewhere.
good luck !
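One such basic test could be a round-trip check on the matrix inversion used in the training code above (a sketch only; it assumes pose_vec2mat from inverse_warp.py, which maps [B, 6] pose vectors to [B, 3, 4] matrices): rebuild the forward transform with R^T and -R^T t, and verify that its product with the original transform is the identity.

import torch
from inverse_warp import pose_vec2mat

def to_homogeneous(mat):
    # promote [B, 3, 4] transforms to [B, 4, 4]
    bottom = torch.tensor([0., 0., 0., 1.]).view(1, 1, 4).expand(mat.shape[0], 1, 4)
    return torch.cat([mat, bottom], dim=1)

vecs = torch.randn(4, 6) * 0.1                      # random small pose vectors
inv_mat = pose_vec2mat(vecs, rotation_mode='euler') # [4, 3, 4] inverse transforms
rot = inv_mat[..., :3].transpose(-2, -1)            # R^T
tr = -rot @ inv_mat[..., -1:]                       # -R^T t
fwd = to_homogeneous(torch.cat([rot, tr], dim=2))
product = fwd @ to_homogeneous(inv_mat)             # should be the identity
assert torch.allclose(product, torch.eye(4).expand(4, 4, 4), atol=1e-5)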
Hello Clement,
Apologies for reopening the issue after closing it in the first place.
Initially I tried the way that I mentioned, but I figured out that it is way more complicated; I tested by training on one sequence, but I did not see the desired results.
So I tried to implement the approach that you mentioned, multiplying the inverse matrices by the inverse of the first one of the sequence.
Unfortunately, when I trained it on only one sequence, the photometric loss decreased, but the ego-motion error remained roughly the same across all the epochs (200 in total).
Here is the code that I implemented inside the train function in train.py:
for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv, gt_poses) in enumerate(train_loader):
    log_losses = i > 0 and n_iter % args.print_freq == 0
    log_output = args.training_output_freq > 0 and n_iter % args.training_output_freq == 0

    # measure data loading time
    data_time.update(time.time() - end)
    tgt_img = tgt_img.to(device)
    ref_imgs = [img.to(device) for img in ref_imgs]
    intrinsics = intrinsics.to(device)

    # compute output
    disparities = disp_net(tgt_img)
    depth = [1/disp for disp in disparities]
    explainability_mask, pose = pose_exp_net(tgt_img, ref_imgs)

    # ===================== Code added for using ego motion as part of the loss =====================
    loss_4 = torch.tensor(0.).to(device)
    if args.tr_tv:
        b = tgt_img.shape[0]
        reordered_output_poses = torch.cat([pose[:, :gt_poses.shape[1]//2],
                                            torch.zeros(b, 1, 6).to(pose),
                                            pose[:, gt_poses.shape[1]//2:]], dim=1)
        # pose_vec2mat only takes [B, 6] tensors, so we simulate a batch dimension of B * seq_length
        unravelled_poses = reordered_output_poses.reshape(-1, 6)
        unravelled_matrices = pose_vec2mat(unravelled_poses, rotation_mode=args.rotation_mode)
        inv_transform_matrices = unravelled_matrices.reshape(b, -1, 3, 4)
        # 2nd approach: combine each matrix with the first one of its sequence
        # (note that * is an element-wise product of the two [3, 4] matrices)
        for j in range(inv_transform_matrices.shape[0]):
            for k in range(inv_transform_matrices.shape[1]):
                inv_transform_matrices[j, k, :, :] = inv_transform_matrices[j, k, :, :] * inv_transform_matrices[j, 0, :, :]
        # end of 2nd approach
        rot_matrices = inv_transform_matrices[..., :3].transpose(-2, -1)
        # here are the predicted translation vectors
        tr_vectors = -rot_matrices @ inv_transform_matrices[..., -1:]
        loss_4 = torch.sum(torch.abs(gt_poses[:, :, :, -1].to(device) - tr_vectors[:, :, :, 0]))
        loss_4 = loss_4.to(device)
        loss_4 = torch.tensor(loss_4, dtype=torch.float64)
    # ===================== End of code added for using ego motion as part of the loss ==============

    loss_1, warped, diff = photometric_reconstruction_loss(tgt_img, ref_imgs, intrinsics,
                                                           depth, explainability_mask, pose,
                                                           args.rotation_mode, args.padding_mode)
    if w2 > 0:
        loss_2 = explainability_loss(explainability_mask)
    else:
        loss_2 = 0
    loss_3 = smooth_loss(depth)
    loss = w1*loss_1 + w2*loss_2 + w3*loss_3 + 0.6*loss_4
I am new to the field, so I cannot be sure about my implementation. I would really appreciate it if you could help me figure out what the problem is.
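A side note on the 2nd-approach loop above: in PyTorch, * is an element-wise product, and two [3, 4] matrices cannot be chained with @ directly anyway. Here is a sketch of what the re-referencing Clement described might look like with homogeneous [4, 4] matrix products and an actual inverse of the first matrix (illustrative only, not a verified fix):

import torch

def rebase_to_first(inv_transform_matrices):
    # inv_transform_matrices: [B, seq, 3, 4]; express every transform relative
    # to the first one of its sequence, so that the first one becomes the identity
    b, s = inv_transform_matrices.shape[:2]
    bottom = torch.tensor([0., 0., 0., 1.], device=inv_transform_matrices.device)
    bottom = bottom.view(1, 1, 1, 4).expand(b, s, 1, 4)
    full = torch.cat([inv_transform_matrices, bottom], dim=2)  # [B, seq, 4, 4]
    first_inv = torch.inverse(full[:, :1])                     # inverse of the first matrix
    # computed out of place (no in-place writes), so autograd stays happy
    return (first_inv @ full)[:, :, :3]                        # back to [B, seq, 3, 4]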