
flownetpytorch's Introduction

FlowNetPytorch

Pytorch implementation of FlowNet by Dosovitskiy et al.

This repository is a PyTorch implementation of FlowNet by Alexey Dosovitskiy et al. See the original Torch implementation here

This code is mainly inspired by the official ImageNet example. It has not been tested on multiple GPUs, but it should work just as in the original code.

The code provides a training example, using the Flying Chairs dataset, with data augmentation. An implementation for the Scene Flow datasets may be added in the future.

Two neural network models are currently provided, along with their batch-norm variations (experimental):

  • FlowNetS
  • FlowNetSBN
  • FlowNetC
  • FlowNetCBN

Pretrained Models

Thanks to Kaixhin you can download a pretrained version of FlowNetS (converted from Caffe, not trained in PyTorch) here. This folder also contains networks trained from scratch.

Note on networks loading

Feed the downloaded network file directly to the script; you don't need to uncompress it, even if your desktop environment suggests you should.

Note on networks from caffe

These networks expect BGR input (as opposed to the RGB order used in PyTorch). In practice, however, the BGR order is not very important.
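If you do want to match the original Caffe input exactly, reversing the channel order of an RGB image is a one-liner (rgb_image here is just a placeholder name for an (H, W, 3) numpy array):

bgr_image = rgb_image[:, :, ::-1]  # reverse the last (channel) axis: RGB -> BGR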

Prerequisite

These modules can be installed with pip:

pytorch >= 1.2
tensorboard-pytorch
tensorboardX >= 1.4
spatial-correlation-sampler>=0.2.1
imageio
argparse
path.py

or

pip install -r requirements.txt

Training on Flying Chair Dataset

First, you need to download the Flying Chairs dataset. It is ~64 GB, and we recommend you put it on an SSD drive.

Default hyperparameters provided in main.py are the same as in the Caffe training scripts.

  • Example usage for FlowNetS :
python main.py /path/to/flying_chairs/ -b8 -j8 -a flownets

We recommend setting j (the number of data-loading threads) to a high value if you use data augmentation, so that data loading does not slow down training.

For further help you can type

python main.py -h

Visualizing training

Tensorboard-pytorch is used for logging. To visualize results, simply type

tensorboard --logdir=/path/to/checkpoints

Training results

Models can be downloaded here in the pytorch folder.

Models were trained with default options unless specified. Color warping was not used.

Arch        | learning rate | batch size | epoch size | filename                     | validation EPE
FlowNetS    | 1e-4          | 8          | 2700       | flownets_EPE1.951.pth.tar    | 1.951
FlowNetS BN | 1e-3          | 32         | 695        | flownets_bn_EPE2.459.pth.tar | 2.459
FlowNetC    | 1e-4          | 8          | 2700       | flownetc_EPE1.766.pth.tar    | 1.766

Note: FlowNetS BN took longer to train and got worse results. It is strongly advised not to use it for the Flying Chairs dataset.

Validation samples

Predictions are made by FlowNetS.

The exact code for the optical flow -> color map conversion can be found here

Input | Prediction | Ground truth

Running inference on a set of image pairs

If you need to run the network on your images, you can download a pretrained network here and launch the inference script on your folder of image pairs.

Your folder needs to have all the image pairs in the same location, with the name pattern:

{image_name}1.{ext}
{image_name}2.{ext}
python3 run_inference.py /path/to/images/folder /path/to/pretrained

As with the main.py script, a help menu is available for additional options.

Note on transform functions

In order to have coherent transformations between inputs and target, we must define new transformations that take both input and target, since a new random parameter is drawn each time a random transformation is called.
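As a minimal sketch (not the repository's exact flow_transforms API), a co-transform can draw its random parameters once and apply them consistently to both the image pair and the flow map:

import random
import numpy as np

class RandomHorizontalFlipBoth(object):
    # Flip both input images and the flow map with the same random decision.
    def __call__(self, inputs, target):
        if random.random() < 0.5:
            inputs = [np.copy(np.fliplr(img)) for img in inputs]
            target = np.copy(np.fliplr(target))
            target[:, :, 0] *= -1  # a horizontal flip reverses the sign of the x flow component
        return inputs, target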

Flow Transformations

To allow data augmentation, we have considered rotations and translations for the inputs and their effect on the target flow map. Here is a set of things to take care of in order to achieve proper data augmentation:

The Flow Map is directly linked to img1

If you apply a transformation to img1, you have to apply the very same transformation to the flow map, to get coherent origin points for the flow.

Translation between img1 and img2

Given a translation (tx,ty) applied on img2, we will have

flow[:,:,0] += tx
flow[:,:,1] += ty
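A rough sketch of this bookkeeping, assuming an (H, W, 2) numpy flow map; shift_image is a hypothetical helper standing in for the actual image shifting/cropping:

def translate_second_image(img1, img2, flow, tx, ty):
    img2 = shift_image(img2, tx, ty)  # hypothetical helper: shift img2 by (tx, ty) pixels
    flow = flow.copy()
    flow[:, :, 0] += tx  # every point of img1 now lands tx pixels further in x
    flow[:, :, 1] += ty  # and ty pixels further in y
    return img1, img2, flow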

Scale

A scale applied on both img1 and img2 with a zoom parameter alpha multiplies the flow by the same amount:

flow *= alpha

Rotation applied on both images

A rotation applied on both images by an angle theta also rotates flow vectors (flow[i,j]) by the same angle

for all i, j: flow[i, j] = rotate(flow[i, j], theta)

rotate: (x, y, theta) -> (x*cos(theta) - y*sin(theta), x*sin(theta) + y*cos(theta))

Rotation applied on img2

Let us consider a rotation by the angle theta from the image center.

We must transform each flow vector based on the coordinates where it lands. For each coordinate (i, j), we have:

flow[i, j, 0] += (cos(theta) - 1) * (j  - w/2 + flow[i, j, 0]) +    sin(theta)    * (i - h/2 + flow[i, j, 1])
flow[i, j, 1] +=   -sin(theta)    * (j  - w/2 + flow[i, j, 0]) + (cos(theta) - 1) * (i - h/2 + flow[i, j, 1])
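These two update equations translate directly into numpy; a sketch, assuming flow has shape (h, w, 2) and theta is in radians:

import numpy as np

def update_flow_for_img2_rotation(flow, theta, h, w):
    i, j = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    x = j - w / 2 + flow[:, :, 0]  # landing x coordinate, relative to the image center
    y = i - h / 2 + flow[:, :, 1]  # landing y coordinate, relative to the image center
    new_flow = flow.copy()
    new_flow[:, :, 0] += (np.cos(theta) - 1) * x + np.sin(theta) * y
    new_flow[:, :, 1] += -np.sin(theta) * x + (np.cos(theta) - 1) * y
    return new_flow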

flownetpytorch's People

Contributors

alexlyzhov, bit1002lst, bkvie, clementpinard, gemenerik, ifurane, mickaelseznec, vra, yasserben


flownetpytorch's Issues

AttributeError: 'numpy.ndarray' object has no attribute 'detach'

Hi,

I'm trying to run inference, however I get the following. Can anyone help me with this?

Cheers

python run_inference.py images/input flownets_EPE1.951.pth.tar --output images/output --img-exts jpg --upsampling nearest
/media/maria/0C7ED7537ED733E4/Downloads-/inspiration_code/FlowNetPytorch-master/models/util.py:11: ImportWarning: failed to load custom correlation modulewhich is needed for FlowNetC
"which is needed for FlowNetC", ImportWarning)
=> fetching img pairs in 'images/input'
=> will save everything to images/input/flow
1 samples found
=> using pre-trained model 'flownets'
0%| | 0/1 [00:00<?, ?it/s]/home/maria/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py:1890: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
Exception KeyError: KeyError(<weakref at 0x7f097f5b3f18; to 'tqdm' at 0x7f097f5a9590>,) in <bound method tqdm.del of 0%| | 0/1 [00:05<?, ?it/s]> ignored
Traceback (most recent call last):
File "run_inference.py", line 93, in
main()
File "/home/maria/anaconda2/lib/python2.7/site-packages/torch/autograd/grad_mode.py", line 46, in decorate_no_grad
return func(*args, **kwargs)
File "run_inference.py", line 87, in main
rgb_flow = flow2rgb(args.div_flow * flow_output.numpy(), max_value=args.max_flow)
File "/media/maria/0C7ED7537ED733E4/Downloads-/inspiration_code/FlowNetPytorch-master/main.py", line 340, in flow2rgb
flow_map_np = flow_map.detach().cpu().numpy()
AttributeError: 'numpy.ndarray' object has no attribute 'detach'

Loss in Pytorch and in Caffe

@ClementPinard Did you cross-check losses during training in Caffe and PyTorch? Are they similar?
In my case (DispNetCorr1) in PyTorch I have losses that are 100x larger than in Caffe. I think I use the same averaging over batches and pixels as in Caffe, but they are still different. The weird thing is that in the case of DispNetCorr1 the disparities are not normalized, i.e. they are in the range [0 ... 250], so I expect to get high losses at the beginning of training (on the order of 10-100), but they are still small in the Caffe log.

Rotating flow twice

In flow_transforms.py, I notice the flow being rotated twice (lines 196 and 199-200). Since the difference in rotation between the two images is already accounted for, rotating the flow by theta just once should be sufficient; assuming both images are rotated by the same amount, the net flow should not change, right?

Resume training/Fine tune

  1. Using --pretrained to continue training on a previously trained dataset seems to overwrite results?
  2. Is it meant to be used with --start-epoch to manually specify the epoch to restart at?
  3. Is it possible to fine-tune weights across datasets? I.e. pretrain on Flying Chairs and then use these weights to fine-tune on KITTI (similar to ImageNet initialization of deep networks)?

Data Augmentation

Hey!

I checked your data augmentation. In RandomRotate there is

#flow vectors must be rotated too! careful about Y flow which is upside down
        target_=np.array(target, copy=True)
        target[:,:,0] = np.cos(angle1_rad)*target_[:,:,0] + np.sin(angle1_rad)*target_[:,:,1]
        target[:,:,1] = -np.sin(angle1_rad)*target_[:,:,0] + np.cos(angle1_rad)*target_[:,:,1]

but in RandomCropRotate it is

#flow vectors must be rotated too!
        target_=np.array(target, copy=True)
        target[:,:,0] = np.cos(angle1_rad)*target_[:,:,0] - np.sin(angle1_rad)*target_[:,:,1]
        target[:,:,1] = np.sin(angle1_rad)*target_[:,:,0] + np.cos(angle1_rad)*target_[:,:,1]

I guess the first one is correct?
Furthermore, if positive y flow is pointing downwards, wouldn't you have to change the translation as well?
In RandomTranslate you do:

target[:,:,1]+= th

I guess if th is positive (you translate upwards), your y flow would decrease?

What are your test scores on sintel (final, clean) or kitti?
Many thanks

training epoch size

Hi,

Thank you so much for your work.

One thing I am a bit confused about regarding training: what is the "epoch size"? Is it the number of iterations in each epoch?

Thank you so much. :)

split2list in util.py

There are two issues with split2list:
(1) split2list does not support integer values as the split value. This causes only the default_split to be used.
(2) Often we want the training dataset to be bigger. For this, the condition of comparison with the random value should be changed to <.

import numpy as np

def split2list(images, split, default_split=0.9):
    # added this to handle integer values of split specified as a percentage (see KITTI dataset)
    if isinstance(split, int):
        split = split / 100.0

    if isinstance(split, str):
        with open(split) as f:
            split_values = [x.strip() == '1' for x in f.readlines()]
        assert(len(images) == len(split_values))
    elif isinstance(split, float):
        # changed to <
        split_values = np.random.uniform(0, 1, len(images)) < split
    else:
        # changed to <
        split_values = np.random.uniform(0, 1, len(images)) < default_split
    train_images = [sample for sample, split in zip(images, split_values) if split]
    test_images = [sample for sample, split in zip(images, split_values) if not split]
    return train_images, test_images

Question: multi-scale loss weighting

Hello,

In the Nvidia implementation of FlowNet2, they compute default weights for the multi-scale loss in the following manner:

self.loss_weights = torch.FloatTensor([(l_weight / 2 ** scale) for scale in range(self.numScales)])

This outputs tensor([0.3200, 0.1600, 0.0800, 0.0400, 0.0200]). This means that the highest-resolution flow is weighted most highly. In your implementation, the default loss weights are

weights = [0.005, 0.01, 0.02, 0.08, 0.32]  # as in original article

This will weight the lowest resolution most highly.

In the original paper by Fischer, Dosovitskiy, et al., they note that "as a training loss we use the endpoint error... It is the Euclidean distance between the predicted flow vector and the ground truth, averaged over all pixels." (Paper here.) It makes no mention of multi-scale loss functions, or how they are weighted.

TL;DR: How should you weight the multi-scale loss, and from what research paper is this loss from?
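For reference, a minimal sketch of how such per-scale weights are usually combined into a single training loss (illustrative only, not the repository's multiscaleloss.py; it also omits the rescaling of flow magnitudes when downsampling the target):

import torch
import torch.nn.functional as F

def multiscale_epe(predictions, target_flow, weights=(0.005, 0.01, 0.02, 0.08, 0.32)):
    # predictions: list of flow tensors at different scales, each of shape (B, 2, h_s, w_s)
    # target_flow: ground-truth flow of shape (B, 2, H, W)
    loss = 0
    for flow, w in zip(predictions, weights):
        # resize the ground truth to the resolution of this prediction
        scaled_target = F.interpolate(target_flow, size=flow.shape[-2:], mode='area')
        epe = torch.norm(scaled_target - flow, p=2, dim=1).mean()
        loss = loss + w * epe
    return loss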

Arbitrary image pair in FlowNet

Based on flownet paper:

We do not have any fully connected layers, which allows the networks to take images of arbitrary size as input.

But it seems that if I input an image pair of a different size (not 384*512), the network gives an error.

Loss function (summing up instead of averaging)

Hi Clément,

I just noticed that you are working on refactoring the source code!
Thanks a lot!

Btw, I am currently using a modified version of your previous implementation, and I found that a simple modification on the loss design produced better results.

In your loss implementation (multiscaleloss.py),
it first calculates the L2 distance (EPE) between the ground truth and the output, and then averages it over the image.
However, when just summing up the EPE instead, the network performs roughly similarly to the original implementation.
(Actually I got near 2.2 EPE in the FlyingChairs from this modification + some additional minor things.)

I haven't thoroughly checked the original FlowNetS implementation in Caffe, but looking at the scale of their loss values, summing the L2 loss over the image seems to be the approach the original implementation took.
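For concreteness, the two variants being compared could be written like this for a flow tensor of shape (B, 2, H, W) (a sketch, not the exact multiscaleloss.py code):

epe_map = torch.norm(target - output, p=2, dim=1)  # per-pixel endpoint error, shape (B, H, W)
loss_mean = epe_map.mean()                         # averaged over batch and pixels (current implementation)
loss_sum = epe_map.sum() / epe_map.size(0)         # summed over pixels, averaged over the batch (suggested)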

Could you check whether this is the case?

Thanks,
Jun

Error when testing pre-trained FlowNetS network

Hi,
I am trying to run the flownets model and predict the optical flow given a pair of images from the KITTI dataset. I downloaded the pretrained model, but when I run the prediction step it gives me the following error:

Traceback (most recent call last): File "flownet_test.py", line 74, in <module> output = model(Variable(torch.from_numpy(x_test))) File "torch/nn/modules/module.py", line 224, in __call__ result = self.forward(*input, **kwargs) File "FlowNetPyTorch.py", line 95, in forward concat5 = torch.cat((out_conv5, out_deconv5, flow6_up), 1) File "python3.5/site-packages/torch/autograd/variable.py", line 897, in cat return Concat.apply(dim, *iterable) File "/python3.5/site-packages/torch/autograd/_functions/tensor.py", line 317, in forward return torch.cat(inputs, dim) RuntimeError: inconsistent tensor sizes at /pytorch/torch/lib/TH/generic/THTensorMath.c:2709

When I print the size of out_conv5, out_deconv5 and flow6_up it gives me:

torch.Size([1, 512, 3, 10]) torch.Size([1, 512, 4, 10]) torch.Size([1, 2, 4, 10])

The input tensor has the following dimension:
(1, 6, 94, 300)

Any idea about how to fix this?
Thanks

The enumeration on train_loader never stops

Hi, I'm not clear why, in a single epoch, the train_loader just doesn't stop to perform the validation process.
I set the batch size to 8 and the epoch size to 80, so I assume each epoch will have 10 iterations.
But actually it just never stops (see below).
Is there any problem with for i, (input, target) in enumerate(train_loader): since we use random sampling?
Thank you for any kind help.

=> will save everything to mpi_sintel_both/Mon-Nov-13-12:29/flownets_bn,adam,90epochs,epochSize80,b8,lr0.0001
=> fetching img pairs in '/media/BEF0E2A7F0E2655D/MPI_Sintel/training/'
2080 samples found, 1664 train samples and 416 test samples
=> creating model 'flownets_bn'
=> setting adam solver
Epoch: [0][0/10] Time 31.513 (31.513) Data 7.474 (7.474) Loss 0.4387 (0.4387) EPE 13.288 (13.288)
Epoch: [0][10/10] Time 19.741 (21.515) Data 0.000 (1.319) Loss 0.4057 (0.4030) EPE 13.086 (12.616)
Epoch: [0][20/10] Time 19.748 (20.917) Data 0.000 (0.793) Loss 0.5682 (0.6196) EPE 14.902 (18.513)
Epoch: [0][30/10] Time 19.811 (20.659) Data 0.000 (0.616) Loss 0.3680 (0.6530) EPE 10.569 (19.120)
Epoch: [0][40/10] Time 20.125 (20.616) Data 0.000 (0.548) Loss 0.3345 (0.5855) EPE 10.309 (17.319)

Epoch: [0][50/10] Time 20.161 (20.466) Data 0.001 (0.440) Loss 0.3218 (0.5430) EPE 10.909 (16.359)

Accuracy on flying chairs

Hi there. Thanks for the great code. I was trying to play with it on the Flying Chairs dataset, but the EPE at the end is around 5.0, which is far from the 2.7 stated in the original paper. Did you try to train the code from scratch, and what performance did you get? Maybe I did something wrong. Thanks a lot.

Sintel

Did you benchmark on Sintel? The official repository puts the files into subfolders alley, bamboo, etc.
Either I am not using it correctly, or the mpisintel.py loader doesn't take that into account. Are you extracting all images, or should I rewrite the mpisintel.py file to open multiple folders?

torch.load('flownets_pytorch.pth') doesn't work

I downloaded the pretrained flownet model for pytorch. However, when I execute torch.load('flownets_pytorch.pth'), I get InvalidHeaderError.
I was wondering if I am loading it incorrectly.

About warping.

Hi @ClementPinard, I am wondering about this code: class RandomColorWarp(object). Is it the same operation as the Caffe warping code? If so, the (inputs, target) are frame_1 and predict_flow respectively, right? Thanks a lot!

about the training

Hi~
I have been training my model with the same config as yours, but the decay_loss decreases too fast, and after about 6000 iterations the loss becomes NaN.
So I want to know your training configuration: is it the default configuration in your main.py?

Pretrained model

It seems that the pretrained model flownets_EPE1.951.pth.tar cannot be uncompressed correctly.

Output of run_inference.py

I use flownets_EPE1.951.pth.tar to test the example image pair below (t0, t1) via run_inference.py:

(input images t0 and t1)

And get a result like this:

(output image _flow)

The image size of the result is 128 x 96, just 1/16 of the original image, and it does not look like the ground truth in the README. Is there something wrong?

Training KITTI with sparse option

Hi,

I am developing my custom model based on your code.
When training the KITTI dataset with your code, the sparse option influences the multiscale loss term (multiscaleEPE).
If the sparse option is active, output and target_var have different channels. Is it OK to forcibly set the dimension of a channel?

Thanks :)

How to evaluate the flownets

Sorry to bother you again.
I just want to use the trained model to predict the optical flow. Could you share the command just for prediction?
Thanks very much.

weight decay and constant biases

Hello! It seems that the authors use a weight decay of 4e-4. It is applied only to weights and not to biases, and only to the contracting part. Have you tried it? (See the sketch after the prototxt below.)

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "input"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
    }
    engine: CUDNN
  }
}
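A minimal PyTorch sketch of the same idea using optimizer parameter groups; it applies decay to all weights rather than only to the contracting part, and the model/optimizer names are illustrative:

param_groups = [
    # no weight decay on biases
    {'params': [p for n, p in model.named_parameters() if n.endswith('bias')], 'weight_decay': 0},
    # weight decay of 4e-4 on all other parameters (weights)
    {'params': [p for n, p in model.named_parameters() if not n.endswith('bias')], 'weight_decay': 4e-4},
]
optimizer = torch.optim.Adam(param_groups, lr=1e-4)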

KITTI flow reading is not correct

Thank you for this simple and readable code. I am also glad that this works with python 3.5 pytorch 0.3. I look forward to you adding other networks such as FlowNetC and FlowNet2.0. Also other metrics such as percentage outliers would be a great addition.

KITTI flow GT is sparse, but this is not considered in the flow reading or in training. I suggest the following changes:

In KITTI.py, the ground truth flow reading is not correct. By looking at the KITTI flow reading script and the readme.txt there, this is what I wrote.

import cv2
import numpy as np

def load_flow_from_png(png_path):
    # read using cv2 and convert from BGR to RGB
    # scipy cannot handle 16-bit images, hence cv2 is used.
    flo_img = cv2.imread(png_path, -1)
    flo_img = flo_img[:, :, ::-1].astype(float)

    # see the readme file in the KITTI devkit and the flow reader functions
    mask = np.minimum(flo_img[:, :, 2], 1)
    not_valid = (mask == 0)
    valid = (mask != 0)
    flo_img = flo_img[:, :, 0:2]
    flo_img = flo_img - 32768
    flo_img = flo_img / float(64.0)

    # value 0 is used to indicate invalid flow.
    # flow that is actually valid and zero is set to a very small value
    eps = 1e-10
    flo_img[np.abs(flo_img) < eps] = eps

    # invalid flow is indicated by 0
    flo_img[not_valid, :] = float(0.)
    return flo_img

Apart from the above function, the sparse flag has to be passed into several functions. I added a flag called sparse_gt

if args.sparse_gt is None:
    args.sparse_gt = ('KITTI' in args.dataset)

and this flag is passed to all the relevant functions, such as multiscaleEPE, one_scale, realEPE, EPE, etc.

With these changes, I am getting more meaningful EPE values.

Kindly fix this issue.

models fail to decompress

Hello, I downloaded flownets_bn_EPE2.459.pth.tar and flownets_EPE1.951.pth.tar,
and when I decompress them, the files appear broken. Can you share the two files again?
Thanks very much.

Improve multi scale loss

The current multi-scale loss never computes the loss at the native resolution. This is because the highest resolution of FlowNetS (flow2) is smaller than the target or input resolution.

The problem is more severe in the case of a sparse target (e.g. KITTI), since we don't really use an accurate resampling of the sparse target, but use max pooling instead. I believe this may be because PyTorch may not have nearest-neighbor resizing with support for flexible output sizes. Even if we had nearest-neighbor resizing, that would be inaccurate too.

A quick fix for this would be to do a bilinear upsampling at the output of FlowNetS. Then the error at the first resolution would be computed without any resizing of the target.

The following derivative of FlowNetS, which I call FlowNetSUp, does the same thing. There is really no need to define a new class; this could be incorporated into FlowNetS itself.

If the above line of reasoning is correct, then this change should provide improved training and hence better accuracy.

file FlowNetSUp.py

import torch
import torch.nn as nn
from torch.nn.init import kaiming_normal
import math
from .FlowNetS import FlowNetS

__all__ = [
    'FlowNetSUp', 'flownets_up', 'flownets_up_bn'
]

class FlowNetSUp(FlowNetS):

    def __init__(self, batchNorm=True):
        super(FlowNetSUp, self).__init__(batchNorm=batchNorm)

    def forward(self, x):
        b, c, h, w = x.size()
        output = super(FlowNetSUp, self).forward(x)
        output2 = output[0] if type(output) in [tuple, list] else output
        output0 = nn.functional.upsample(output2, size=(h, w), mode='bilinear')
        output_up = [output0] + output[:-1] if (type(output) in [tuple, list]) else output0
        return output_up

def flownets_up(path=None):
    model = FlowNetSUp(batchNorm=False)
    if path is not None:
        data = torch.load(path)
        if 'state_dict' in data.keys():
            model.load_state_dict(data['state_dict'])
        else:
            model.load_state_dict(data)
    return model

def flownets_up_bn(path=None):
    model = FlowNetSUp(batchNorm=True)
    if path is not None:
        data = torch.load(path)
        if 'state_dict' in data.keys():
            model.load_state_dict(data['state_dict'])
        else:
            model.load_state_dict(data)
    return model

Error in norm dimension

Hi,
This could be me, but when you calculate the EPE, isn't the dimension you want to compute the norm over 1, instead of 2, since your inputs are of size (batch size, 2, x_size, y_size)?

def EPE(input_flow, target_flow):
    return torch.norm(target_flow - input_flow, 2, 2)
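A sketch of what the suggested change would look like, computing the norm over the channel dimension (dim=1) of (batch, 2, H, W) tensors and then averaging over batch and pixels:

def EPE(input_flow, target_flow):
    # per-pixel endpoint error over the 2 flow channels, then averaged
    return torch.norm(target_flow - input_flow, p=2, dim=1).mean()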

Models cannot be decompressed

I downloaded the model in the pytorch path, but I cannot open the tar file.
#tar -xvf ***.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

ImportError: /home/dell/.conda/envs/py3/lib/python3.6/site-packages/spatial_correlation_sampler_backend.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationESs

The compilation succeeds, but running it then fails with the error below.

Traceback (most recent call last): File "/home/dell/device/FlowNetPytorch-master/models/util.py", line 3, in <module> from spatial_correlation_sampler import spatial_correlation_sample File "/home/dell/.conda/envs/py3/lib/python3.6/site-packages/spatial_correlation_sampler/__init__.py", line 1, in <module> from .spatial_correlation_sampler import SpatialCorrelationSampler, SpatialCorrelationSamplerFunction, spatial_correlation_sample File "/home/dell/.conda/envs/py3/lib/python3.6/site-packages/spatial_correlation_sampler/spatial_correlation_sampler.py", line 6, in <module> import spatial_correlation_sampler_backend as correlation ImportError: /home/dell/.conda/envs/py3/lib/python3.6/site-packages/spatial_correlation_sampler_backend.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationESs

about the kaiming initialization

Hello. In Caffe, the initialization method for conv weights is msra, which uses the fan_in mode; however, in your code you use the fan_out mode.

for m in self.modules():
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2/n))
        if m.bias is not None:
            m.bias.data.zero_()

as in the Caffe proto for the msra filler:

message FillerParameter {
  // The filler type.
  optional string type = 1 [default = 'constant'];
  optional float value = 2 [default = 0]; // the value in constant filler
  optional float min = 3 [default = 0]; // the min value in uniform filler
  optional float max = 4 [default = 1]; // the max value in uniform filler
  optional float mean = 5 [default = 0]; // the mean value in Gaussian filler
  optional float std = 6 [default = 1]; // the std value in Gaussian filler
  // The expected number of non-zero output weights for a given input in
  // Gaussian filler -- the default -1 means don't perform sparsification.
  optional int32 sparse = 7 [default = -1];
  // Normalize the filler variance by fan_in, fan_out, or their average.
  // Applies to 'xavier' and 'msra' fillers.
  enum VarianceNorm {
    FAN_IN = 0;
    FAN_OUT = 1;
    AVERAGE = 2;
  }
  optional VarianceNorm variance_norm = 8 [default = FAN_IN];

  repeated float diag_val = 9;
}

The default mode is FAN_IN; maybe you should change m.out_channels to m.in_channels in the code, or use nn.init.kaiming_normal.
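A sketch of the suggested fan_in alternative using the built-in initializer, applied inside the same module loop (assumption: this would replace the manual normal_ call shown above):

for m in self.modules():
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        # He/MSRA initialization with fan_in, matching the Caffe default
        nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
        if m.bias is not None:
            nn.init.constant_(m.bias, 0)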

Reading/parsing of image pairs and flow fields

Hi,
Thanks for sharing the code. I was not able to locate where you actually read the image pairs and the .flo file for a given image pair. Can you please point me to the part of the code where the images are read and the .flo file is parsed?

Thanks,
Avisek

Make it flexible to add more losses and choose losses via the command line

I like the way models and datasets are arranged - it makes this a great framework to extend for further work.

However, losses are not arranged that way. I wish you would create a 'losses' folder and allow a list of losses (which would be added together) to be loaded by specifying them from the command line.

As you might have read, unsupervised optical flow is reaching accuracy close to supervised optical flow estimation: "UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss"
https://arxiv.org/abs/1711.07837
This is amazing work that deserves a lot of attention.

If we make the losses in this code base flexible, maybe it will be easy to add other learning tasks such as UnFlow into this code base.

flowToRgb

How should I understand this code? And is it different from the code for Sintel used by FlowNet?

scale on data augmentation

Hi, thank you for your outstanding work! I have some questions about data augmentation.
I find that you do not add scale or chromatic augmentation; does that sacrifice performance on Chairs or Sintel?
Also, the hyper-parameters used in your project's data augmentation differ from the original paper; for example, the original FlowNet paper translates within the range [-20%, 20%] of the image width for x and y. Is this important for the training?

pretrained flownetc

Are you going to release a pretrained FlowNetC model trained on FlyingChairs? Thank you.
