wkentaro / pytorch-fcn Goto Github PK


PyTorch Implementation of Fully Convolutional Networks. (Training code to reproduce the original result is available.)

License: MIT License

Python 100.00%

pytorch computer-vision deep-learning semantic-segmentation convolutional-networks fcn fcn8s

pytorch-fcn's Introduction

pytorch-fcn

PyTorch implementation of Fully Convolutional Networks.

Requirements

pytorch >= 0.2.0
torchvision >= 0.1.8
fcn >= 6.1.5
Pillow
scipy
tqdm

Installation

git clone https://github.com/wkentaro/pytorch-fcn.git
cd pytorch-fcn
pip install .

# or

pip install torchfcn

Training

See VOC example.

Accuracy

At 10fdec9.

Model	Implementation	epoch	iteration	Mean IU	Pretrained Model
FCN32s	Original	-	-	63.63	Download
FCN32s	Ours	11	96000	62.84
FCN16s	Original	-	-	65.01	Download
FCN16s	Ours	11	96000	64.91
FCN8s	Original	-	-	65.51	Download
FCN8s	Ours	7	60000	65.49
FCN8sAtOnce	Original	-	-	65.40
FCN8sAtOnce	Ours	11	96000	64.74

Visualization of validation result of FCN8s.

Cite This Project

If you use this project in your research or wish to refer to the baseline results published in the README, please use the following BibTeX entry.

@misc{pytorch-fcn2017,
  author =       {Kentaro Wada},
  title =        {{pytorch-fcn: PyTorch Implementation of Fully Convolutional Networks}},
  howpublished = {\url{https://github.com/wkentaro/pytorch-fcn}},
  year =         {2017}
}


pytorch-fcn's Issues

About the Preprocessing of input images.

This project uses a different preprocessing method from the PyTorch pretrained models. The pretrained models use RGB values normalized to [0, 1], while pytorch-fcn uses BGR values that are not scaled to [0, 1], only mean-centered. So how does it work?

See this on pytorch discuss:
All pretrained torchvision models have the same preprocessing, which is to normalize using the following mean/std values: https://github.com/pytorch/examples/blob/master/imagenet/main.py#L92-L93 (input is RGB format)
[ref: https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683/2]
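
The two conventions in this question can be put side by side. The following is a minimal numpy sketch, not the repository's code: the BGR mean values are an assumption (the means commonly used with Caffe VGG/FCN models), the function names are hypothetical, and the RGB mean/std are the torchvision ImageNet values referenced in the quote above. Weights converted from a Caffe model would expect the first form; torchvision's own weights expect the second.

```python
import numpy as np

# Assumption: Caffe-style per-channel means, BGR order, 0-255 scale.
MEAN_BGR = np.array([104.00698793, 116.66876762, 122.67891434])
# torchvision ImageNet statistics, RGB order, [0, 1] scale.
MEAN_RGB = np.array([0.485, 0.456, 0.406])
STD_RGB = np.array([0.229, 0.224, 0.225])


def preprocess_caffe_style(img_rgb_uint8):
    """HxWx3 RGB uint8 -> 3xHxW BGR float, mean-subtracted, still 0-255 scale."""
    img = img_rgb_uint8[:, :, ::-1].astype(np.float64)  # RGB -> BGR
    img -= MEAN_BGR
    return img.transpose(2, 0, 1)  # HWC -> CHW


def preprocess_torchvision_style(img_rgb_uint8):
    """HxWx3 RGB uint8 -> 3xHxW RGB float, scaled to [0, 1] then standardized."""
    img = img_rgb_uint8.astype(np.float64) / 255.0
    img = (img - MEAN_RGB) / STD_RGB
    return img.transpose(2, 0, 1)  # HWC -> CHW


img = np.full((4, 4, 3), 128, dtype=np.uint8)  # dummy gray image
caffe_input = preprocess_caffe_style(img)
tv_input = preprocess_torchvision_style(img)
print(caffe_input.shape, tv_input.shape)
print(caffe_input[:, 0, 0], tv_input[:, 0, 0])
```

The two outputs differ in channel order and by roughly two orders of magnitude in scale, so a network should be fed the convention its weights were trained with.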

"import fcn" error

Hi, sorry to disturb you. I got an error when running voc/train_fcn32s.py: I can't import fcn. Do I need an additional package? Thanks.

Nan error while training

Hello, I trained the network after changing the learning rate to lr=1e-4, which is the same as in the FCN paper.
Why does it raise a NaN error in the loss? (It is fine if I use the lr set by your original script.)

Support FCN32s copied from Caffe

How to run the code

Hi, can you provide simple instructions about how to train, do inference and get dataset?

ImportError: No module named fcn

Hi, when I run the file "model_caffe_to_pytorch.py", it turns out that there is "no module named fcn". Thank you very much.
By the way, "export PYTHONAPTH=$(pwd)/python:$PYTHONPATH" should be "export PYTHONPATH=$(pwd)/python:$PYTHONPATH", I think.

Questions about the parameters in the upscore layer

Hi @wkentaro, your code is running well on my own dataset, but there are some parts that I don't quite understand.
In fcn32s.py line 144 you set
h = h[:, :, 19:19 + x.size()[2], 19:19 + x.size()[3]].contiguous().
Why do you use 19 here? Can I just replace it with 20 or 18?
Thanks very much.
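
One way to see where an offset like 19 can come from is to track how an output pixel index maps back to an input pixel index through every layer. The sketch below is an illustration under assumptions, not the repository's code: the layer list follows the usual FCN-32s layout (VGG16 with padding 100 on the first convolution, a 7x7 fc6 without padding, and a 64x64 stride-32 transposed convolution), and the helper names are hypothetical. Each convolution or pooling layer with kernel k, stride s and padding p maps output index x to input index s*x + (k-1)/2 - p; a transposed convolution applies the inverse map.

```python
from fractions import Fraction


def crop_offset(layers):
    """Compose per-layer coordinate maps; return the crop offset in pixels.

    The composed map has the form x_in = a * x_out + b. For a network whose
    output is back at input resolution (a == 1), the crop offset is -b.
    """
    a, b = Fraction(1), Fraction(0)
    for kind, k, s, p in layers:
        c = Fraction(k - 1, 2) - p
        if kind == "deconv":  # inverse of the conv map
            b -= a * c / s
            a /= s
        else:  # "conv" or "pool"
            b += a * c
            a *= s
    assert a == 1, "output is not at input resolution"
    return -b


def vgg_block(n_convs, first_pad=1):
    """n_convs 3x3 convolutions followed by one 2x2 stride-2 pooling."""
    convs = [("conv", 3, 1, first_pad)] + [("conv", 3, 1, 1)] * (n_convs - 1)
    return convs + [("pool", 2, 2, 0)]


# Assumed FCN-32s layout.
fcn32s = (
    vgg_block(2, first_pad=100) + vgg_block(2) + vgg_block(3)
    + vgg_block(3) + vgg_block(3)
    + [("conv", 7, 1, 0),       # fc6
       ("conv", 1, 1, 0),       # fc7
       ("conv", 1, 1, 0),       # score_fr
       ("deconv", 64, 32, 0)]   # upscore
)
print(crop_offset(fcn32s))  # 19
```

Under these assumptions the offset depends only on the kernel sizes, strides and paddings, not on the image size, so using 18 or 20 would shift the prediction by one pixel relative to the input.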

Can not work with pycaffe?

Hello! I am using your pytorch-fcn, which needs the dependency fcn. However, when I try to convert the Caffe model to a PyTorch one and import both fcn and caffe, the kernel dies. Importing just one of them, caffe or fcn, works. But importing fcn after caffe, or caffe after fcn, does not work.

AttributeError: 'float' object has no attribute 'total_seconds'

Hi, wkentaro. Sorry to bother you!
I want to train fcn32s like this:

./train_fcn32s.py -g 0

Now I get an error, and I don't know how to fix it.

Traceback (most recent call last):
File "./train_fcn32s.py", line 164, in <module>
main()
File "./train_fcn32s.py", line 160, in main
trainer.train()
File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 222, in train
self.train_epoch()
File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 208, in train_epoch
elapsed_time = elapsed_time.total_seconds()
AttributeError: 'float' object has no attribute 'total_seconds'

I want to know some details about this error.
Thanks and Best Regards!

FCN8s

Hi,

May I ask when you will add FCN8s or FCN16s?

Thanks!

'Dropout' object has no attribute 'weight'

Hi,
Thanks so much for your help! I'm new to pytorch.
There is a new error again!
Traceback (most recent call last):
File "examples/voc/train_fcn32s.py", line 100, in <module>
main()
File "examples/voc/train_fcn32s.py", line 56, in main
model.copy_params_from_vgg16(vgg16, init_upscore=False)
File "/home/zheshiyige/Desktop/fully convolutional network/pytorch-fcn-master/torchfcn/models/fcn32s.py", line 117, in copy_params_from_vgg16
l2.weight.data = l1.weight.data.view(l2.weight.size())
File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 238, in __getattr__
type(self).__name__, name))
AttributeError: 'Dropout' object has no attribute 'weight'

Thanks and Best Regards,

A little bug in trainer.py

Hello, wkentaro. Do you find that the training loss will suddenly become larger when a new epoch starts? I've noticed that and think it is because of the incorrect use of the training and evaluating mode of the model.
In train_epoch(), line 171, when you finish validation, I think you should change the model back into training mode.
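
The pattern described here can be sketched as follows. This is a minimal illustration, not the trainer's actual code: the `validate` helper and the toy model are hypothetical. It relies only on `nn.Module.eval()`, `nn.Module.train(mode)` and the `training` attribute.

```python
import torch.nn as nn


def validate(model, batches):
    """Run a validation pass, then restore the mode the model was in."""
    was_training = model.training
    model.eval()  # e.g. disables dropout during evaluation
    try:
        for x in batches:
            model(x)
    finally:
        model.train(was_training)  # back to training mode if it was on


# Hypothetical toy model containing a dropout layer.
model = nn.Sequential(nn.Linear(4, 2), nn.Dropout(p=0.5))
model.train()
validate(model, batches=[])
print(model.training)  # True
```

Restoring the mode inside `finally` keeps the model in training mode even if validation raises.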

About the offset of crop

Hi, I just want to ask how you decide the crop offset in different models. If I want to train this model on different datasets with different image size, how to calculate the precise crop offset?

Discussion about the loss function

Hi, I did an experiment about the loss function:

log_p = log_p.transpose(1, 2).transpose(2, 3).contiguous().view(-1, c)

I find that the code works both with and without view(-1, c). Why? In my opinion, the version without view(-1, c) is the correct one. Or are the two actually the same in memory? What do you think?
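
To make concrete what the quoted line does, here is a small numpy sketch (an analogy, with array sizes chosen arbitrarily). It moves the class axis last and flattens everything else, so each row of the result holds the class scores of one pixel. If a later step flattens the tensor again (for example boolean masking followed by another view), the first flattening could be redundant; that is an assumption about the surrounding code, which is not shown in this issue.

```python
import numpy as np

n, c, h, w = 2, 3, 4, 5
log_p = np.arange(n * c * h * w, dtype=np.float64).reshape(n, c, h, w)

# Same axis order as .transpose(1, 2).transpose(2, 3): (n, c, h, w) -> (n, h, w, c)
nhwc = log_p.transpose(0, 2, 3, 1)
flat = nhwc.reshape(-1, c)  # one row per pixel, one column per class

i, y, x = 1, 2, 3
row = flat[(i * h + y) * w + x]
print(flat.shape)
print(np.array_equal(row, log_p[i, :, y, x]))
```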

slower benchmarks

I tested this out on an AWS p2 instance and I got a significantly slower benchmark. Can you confirm the hardware that you ran your benchmark on?

==> Benchmark: gpu=0, times=1000, dynamic_input=False
==> Testing FCN32s with PyTorch
Elapsed time: 245.73 [s / 1000 evals]
Hz: 4.07 [hz]

A bug in loss computation

Hi author

First, I would like to thank you for sharing. During training, I noticed that the loss (around 30000) is obviously larger than in the FCN experiment on Caffe (0.4 ~ 3.0). After reading your code, I found the cause. In trainer.py:

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

Note that mask here is a torch.ByteTensor, which means each value is stored in one byte (8 bits, range 0~255). So when you sum it up, it easily overflows (because the result is also stored in a torch.ByteTensor).

For example

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 Variable containing:
 0
[torch.ByteTensor of size 1]
'''

The correct way should be

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.data.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 65536
'''

So to compute the loss properly, you need to change loss /= mask.sum().data[0] to loss /= mask.data.sum().

I have created a pull request to fix this.
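
The wrap-around reported above can be reproduced outside PyTorch. The sketch below uses numpy to emulate an 8-bit accumulator; it is an analogy for the behavior described in this issue, not a statement about any particular PyTorch version.

```python
import numpy as np

mask = np.ones((1, 256, 256), dtype=np.uint8)

# Accumulating in 8 bits wraps modulo 256: 65536 % 256 == 0.
wrapped = int(mask.sum(dtype=np.uint8))
# Accumulating in a wider integer type gives the true count.
exact = int(mask.sum(dtype=np.int64))

print(wrapped, exact)  # 0 65536
```

Casting the mask to a wider integer or float type before summing is one way to avoid depending on the accumulator type.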

Unknown skimage warning

Hi,
Thanks a lot for your help! Now the code runs successfully on my computer. But there is a warning, and I'm not sure whether it is an important issue.

/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:310: RuntimeWarning: invalid value encountered in true_divide
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/util/dtype.py:122: UserWarning: Possible precision loss when converting from float64 to uint8
.format(dtypeobj_in, dtypeobj_out))
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Train epoch=0: 7%| | 611/8498 [02:25<31:02, 4.24it/s]/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:312: RuntimeWarning: invalid value encountered in true_divide

Thanks and Best Regards,

Gradients for Loss function

Hi,

I have a question for loss function,

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

Depending on whether 'size_average' is True or False, the scale of the gradients of the loss function is different.

This may be a problem because different scales of gradients may impact on how we set learning rate.

Is this the reason you use very low learning rate, 1e-10?

Thanks!!
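
The scale difference can be checked with a toy example. In this sketch (a hypothetical one-parameter model with squared-error terms, plain SGD without momentum or weight decay), the gradient of the summed loss is N times the gradient of the averaged loss, so dividing the learning rate by N gives the same update step.

```python
def grad_sum(w, targets):
    """Gradient of sum_i (w - t_i)^2 with respect to w."""
    return sum(2.0 * (w - t) for t in targets)


def grad_mean(w, targets):
    """Gradient of the same loss averaged over the N terms."""
    return grad_sum(w, targets) / len(targets)


targets = [0.0, 1.0, 2.0, 3.0]  # stands in for N valid pixels
w = 5.0
lr_mean = 0.1
lr_sum = lr_mean / len(targets)  # learning rate rescaled by 1/N

step_mean = lr_mean * grad_mean(w, targets)
step_sum = lr_sum * grad_sum(w, targets)
print(step_mean, step_sum)
```

With hundreds of thousands of valid pixels per image, a summed loss would call for a correspondingly tiny learning rate under this reasoning.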

load_state_dict error

Hi,
Recently, I ran examples/voc/train_fcn32s.py and encountered another error:
Traceback (most recent call last):
File "train_fcn32s.py", line 101, in <module>
main()
File "train_fcn32s.py", line 55, in main
vgg16.load_state_dict(torch.load(pth_file))
File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 331, in load_state_dict
.format(name))
KeyError: 'unexpected key "classifier.1.weight" in state_dict'

I have downloaded the /home/zheshiyige/data/models/torch/vgg16-00b39a1b.pth

Could you tell me how to fix this error?

Thanks and Best Regards,

Support weight/bias-wise learning rate setting

As in https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/voc-fcn32s/net.py#L8

BC Support

Hi WKentaro,

Thank you for the implementation! I am new to PyTorch and I will try this out for an upcoming project.

Does your implementation include Bottleneck (BC) blocks? It seems to improve training time significantly.

Which paper version does this program follow?

Journal version or conference version?

an issue about train_data

Hi, sorry for bothering you.
I cannot fetch the benchmark data (http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz -O benchmark.tar).
Could you please upload the files to your project?
Thanks!

get error when executing the script "learning_curve.py"?

Hello. I found that when the training time exceeds 24 hours, the logger writes "1 day" to the time field of "log.csv". However, this causes an error when running the script "learning_curve.py". How can I fix this error? Thanks!
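
A possible workaround is to parse the time field explicitly. The sketch below assumes the field holds the string form of a Python `datetime.timedelta` (for example "1 day, 2:03:04"); that format and the helper name are assumptions, not taken from the repository.

```python
import datetime
import re

# Matches str(timedelta): optional "N day(s), " followed by H:MM:SS[.ffffff]
_ELAPSED = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?"
    r"(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+(?:\.\d+)?)$"
)


def elapsed_to_seconds(text):
    """Convert a str(timedelta)-style string to a float number of seconds."""
    match = _ELAPSED.match(text.strip())
    if match is None:
        raise ValueError("unrecognized elapsed time: %r" % text)
    days = int(match.group("days") or 0)
    hours = int(match.group("hours"))
    minutes = int(match.group("minutes"))
    seconds = float(match.group("seconds"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


long_run = str(datetime.timedelta(days=1, hours=2, minutes=3, seconds=4))
print(long_run, elapsed_to_seconds(long_run))
```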

resume error

Hi, when I use resume to load parameters, it crashes saying ''unexpected key "module.features.0.weight" in state_dict''. I don't know what is wrong. Thanks a lot!
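
Keys that start with "module." typically come from a model that was wrapped in `nn.DataParallel` when it was saved. If that is the case here (an assumption, since the issue does not say), one option is to strip the prefix before calling `load_state_dict`. The helper below is a hypothetical sketch that works on any mapping of names to values.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from keys."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# Dummy values stand in for tensors.
saved = OrderedDict([("module.features.0.weight", 1), ("module.features.0.bias", 2)])
print(list(strip_module_prefix(saved)))
```

The alternative is to wrap the model the same way before loading, so the key names match without renaming.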

Use seg11val.txt of original paper

Training

code-block:: bash

./train_fcn32s.py config/001.yaml

When running this step, I see the following error:
Traceback (most recent call last):
File "./train_fcn32s.py", line 128, in <module>
main()
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "./train_fcn32s.py", line 62, in main
cfg, out = load_config_file(config_file)
File "./train_fcn32s.py", line 32, in load_config_file
name += '_VCS-%s' % git_hash()
File "./train_fcn32s.py", line 21, in git_hash
hash = subprocess.check_output(shlex.split(cmd)).strip()
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 212, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 390, in __init__
errread, errwrite)
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 1024, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

Wishing for your reply!
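
An `OSError: [Errno 2]` raised from `subprocess` usually means the executable itself could not be found, which here would be the command that `git_hash` runs. A defensive version could look like the sketch below. The git command shown is an assumption (the original command is not visible in the traceback), and returning `None` on failure is a design choice of this sketch.

```python
import shlex
import subprocess


def git_hash(cmd="git rev-parse --short HEAD"):
    """Return the short commit hash, or None if it cannot be obtained."""
    try:
        out = subprocess.check_output(shlex.split(cmd), stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # OSError: executable not found; CalledProcessError: not a git repo, etc.
        return None
    return out.decode().strip()


print(git_hash())
```

A caller could then skip the `_VCS-...` suffix when the result is `None` instead of crashing.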

Can only deal with batch size as 1?

Improve weight initialization

As in https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/voc-fcn32s/solve.py#L25

Inconsistent Tensor size while loading data using data loader.

I am training an OCR model using an RNN. The input data are word images of varying dimensions (since each word can have a different length), and the size of the class labels for each input is also inconsistent, since each word can have a different number of characters.

   tensor_word_dataset = WordImagesDataset(images, truths, transform = ToTensor())
   dataset_loader = torch.utils.data.DataLoader(tensor_word_dataset,
                                            batch_size=16, shuffle=True,)

This gives me the error:
RuntimeError: inconsistent tensor size at /py/conda-bld/pytorch_1493673470840/work/torch/lib/TH/generic/THTensorCopy.c:46

The sizes of the first few labels and images, respectively, are:

 torch.Size([2]) torch.Size([32, 41])
 torch.Size([7]) torch.Size([32, 95])
 torch.Size([2]) torch.Size([32, 38])
 torch.Size([2]) torch.Size([32, 53])
 torch.Size([2]) torch.Size([32, 49])
 torch.Size([6]) torch.Size([32, 55])

Any suggestions on how I should fix this? I want to shuffle the data and send it in batches instead of supplying the samples one at a time.
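
The default batching stacks samples into one array, which fails when their shapes differ. `torch.utils.data.DataLoader` accepts a `collate_fn` argument that replaces this step. The sketch below shows one possible collate function in numpy; it pads images to the widest one in the batch and keeps the labels as a list. The function name, the return layout and the padding value are assumptions, and with tensor samples the same idea applies after converting between tensors and arrays.

```python
import numpy as np


def pad_collate(batch, pad_value=0.0):
    """batch: list of (image, label) pairs; image is HxW, label is 1-D.

    Returns (padded images of shape BxHxWmax, original widths, list of labels).
    """
    images, labels = zip(*batch)
    height = images[0].shape[0]
    widths = np.array([img.shape[1] for img in images])
    padded = np.full((len(images), height, widths.max()), pad_value, dtype=np.float32)
    for i, img in enumerate(images):
        padded[i, :, : img.shape[1]] = img  # left-aligned, padded on the right
    return padded, widths, list(labels)


# Dummy samples with the widths and label lengths reported above.
samples = [
    (np.ones((32, 41), dtype=np.float32), np.zeros(2, dtype=np.int64)),
    (np.ones((32, 95), dtype=np.float32), np.zeros(7, dtype=np.int64)),
    (np.ones((32, 38), dtype=np.float32), np.zeros(2, dtype=np.int64)),
]
padded, widths, labels = pad_collate(samples)
print(padded.shape, widths, [len(l) for l in labels])
```

Keeping the original widths lets later code ignore the padded columns.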

ModuleNotFoundError: No module named 'v1'

$ ./train_fcn32s.py
Traceback (most recent call last):
	File "./train_fcn32s.py", line 9, in <module>
		import torchfcn
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/__init__.py", line 1, in <module>
		from . import datasets  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/__init__.py", line 1, in <module>
		from .apc import APC2016V1  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/apc/__init__.py", line 1, in <module>
		from v1 import APC2016V1  # NOQA
ModuleNotFoundError: No module named 'v1'
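
In Python 3, `from v1 import ...` inside a package is treated as an absolute import, so a sibling module `v1` is not found. The explicit relative form `from .v1 import ...` works in both Python 2.7 and Python 3. The sketch below builds a throwaway package to show the relative form importing correctly; the package name `demo_apc` is hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Build a tiny package on disk: demo_apc/__init__.py and demo_apc/v1.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_apc")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "v1.py"), "w") as f:
    f.write("class APC2016V1(object):\n    pass\n")
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    # Explicit relative import: resolves against the package itself.
    f.write("from .v1 import APC2016V1  # NOQA\n")

sys.path.insert(0, root)
demo_apc = importlib.import_module("demo_apc")
print(demo_apc.APC2016V1.__name__)  # APC2016V1
```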

resuming from checkpoint error

When resuming from a saved checkpoint using the flag --resume and the path to .pth.tar file saved under the 'log' folder by the program, there seems to be an error during the optim step.

Traceback (most recent call last):
File "train_fcn32s.py", line 202, in <module>
main()
File "train_fcn32s.py", line 198, in main
trainer.train()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 286, in train
self.train_epoch()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 245, in train_epoch
self.optim.step()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torch/optim/sgd.py", line 87, in step
param_state = self.state[p]
KeyError: Parameter containing:
(0 ,0 ,.,.) =
-0.1587 0.0404 -0.2275
-0.1916 -0.1479 0.0287
-0.0271 -0.3107 -0.1193

speedtest

Hi,

Thanks for sharing the code. What kind of GPU were you using for the speed test? When I ran the speed test on a GTX 1080 (8 GB memory), the elapsed time for Chainer was 404.87 [s / 1000 evals] and for PyTorch it was 178.93 [s / 1000 evals]. Also, whenever I try to run it with the VOC dataset, I get an out-of-memory error, so I wonder if you can share your experiment environment, such as the GPU type, how long training runs, and how much memory fcn32s_pytorch takes.

Thank you!

The learning rate is too small

I found that when you trained the FCN8s model, the learning rate was very small (1e-14). I remember that the learning rate is set to 1e-4 in the original FCN paper, so I am a little confused.
Can you explain this? Thank you in advance.

Is there implementation of FCN16s and FCN8s

Hi,
I can only find fcn32s implementation in VOC example, is there fcn16s and fcn8s implementation?

Thanks and Best Regards,

how to provide the three arguments in train_fcn32s.py

Hi,
So sorry to bother you again! Could you tell me what's the meaning of the three arguments, out, resume and no-deconv? How to provide these three arguments to make it run properly?

Thanks and Best Regards,

Could you plz add more running instructions on /example/voc/train_fcn32s.py

As the title, thanks. It seems that

./train_fcn32s.py --out logs/fcs32s_sbd

does not work out with the current version.

Best,

Why do we need padding=100 for a filter of size 3?

As the title, in torchfcn/models/fcn32s.py we have the setting for the first conv1 layer:

nn.Conv2d(3, 64, 3, padding=100),

Why do we need a padding of side length 100 instead of 1 according to the filter size 3?

Thanks
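
A size calculation helps show what the large padding buys. The sketch below assumes the usual FCN-32s layout: five 2x2 stride-2 poolings with ceil rounding, a 7x7 fc6 convolution without padding, and a 64x64 stride-32 transposed convolution for upsampling. Only the padding of the first convolution is taken from the line quoted above; everything else, including the function name, is an assumption.

```python
import math


def fcn32s_output_size(size, first_pad):
    """Spatial size of the upsampled score map, or None if fc6 cannot be applied."""
    s = size + 2 * first_pad - 2  # conv1_1: 3x3 kernel, given padding
    for _ in range(5):            # other 3x3 convs use padding 1 and keep the size
        s = math.ceil(s / 2)      # 2x2 pooling, stride 2, ceil rounding
    s -= 6                        # fc6: 7x7 kernel, no padding
    if s < 1:
        return None               # feature map smaller than the fc6 kernel
    return (s - 1) * 32 + 64      # transposed conv: kernel 64, stride 32


for size in (100, 224, 500):
    print(size, fcn32s_output_size(size, first_pad=1), fcn32s_output_size(size, first_pad=100))
```

With padding 1, a 224-pixel input yields only a 64-pixel output and a 100-pixel input is too small for fc6, whereas padding 100 always leaves an output larger than the input, which can then be cropped back to the input size.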

AttributeError: 'module' object has no attribute 'utils'

Hi,
So sorry to bother you again!
Now, there is a new error
Traceback (most recent call last):
File "voc/train_fcn32s.py", line 100, in
main()
File "voc/train_fcn32s.py", line 56, in main
torchfcn.utils.copy_params_vgg16_to_fcn32s(vgg16, model, init_upscore=False)
AttributeError: 'module' object has no attribute 'utils'

   But I found that I have installed torchfcn (1.4.1) and can import torchfcn.

Thanks and Best Regards,

How to install the module named 'fcn'?

Now after I ran python setup.py install, I got the following error:

import torchfcn
Traceback (most recent call last):

  File "<ipython-input-1-c08454750c97>", line 1, in <module>
    import torchfcn

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/__init__.py", line 3, in <module>
    from trainer import Trainer  # NOQA

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/trainer.py", line 6, in <module>
    import fcn

ImportError: No module named fcn

Mini-batch and Multiple-gpu

Hi,

I want to use batch size more than 1.

So, I changed batch_size to 4.
e.g.)
train_loader = torch.utils.data.DataLoader(
    torchfcn.datasets.SBDClassSeg(root, split='train', transform=True),
    batch_size=4, shuffle=True, **kwargs)

But it raises an error, so I cannot use it.

Could you please let me know how to modify it?

Also, is it possible to use multiple GPUs by modifying the code?

Thank you!

Where can I get the pretrained vgg16 model

The error tells me that I don't have the vgg16-from-caffe pretrained model.

bg455@51d9f43122cd:~/projects/pytorch-fcn/examples/voc$ ./train_fcn32s.py ./config/001.yaml
Traceback (most recent call last):
File "./train_fcn32s.py", line 128, in <module>
main()
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "./train_fcn32s.py", line 93, in main
vgg16 = torchfcn.models.VGG16(pretrained=True)
File "/usr/local/lib/python2.7/dist-packages/torchfcn-1.5.0-py2.7.egg/torchfcn/models/vgg.py", line 10, in VGG16
state_dict = torch.load(model_file)
File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 246, in load
f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: '/home/wkentaro/data/models/torch/vgg16-from-caffe.pth'

About VGG16 pretrained model

Hi @wkentaro, I see that you use the VGG16 pretrained model from Caffe. I wonder why you do not use the pretrained one provided by torchvision. Is the pretrained one from Caffe better?

Thanks.

Pretrained network for fcn8s model

Hi, I tried to train the fcn8s model using pretrained weights from the vgg16 model, but there was no visible learning and the model output was all zero for the first ~10k iterations. However, the training loss converges when I use the fcn16s model to initialize the weights. I could not really understand the reason behind this. Can you help with an explanation?

Inconsistent Tensor Size

When I implemented the VOC2012 dataset based on your code, an "inconsistent tensor size" error occurred. The dataset code is the same as yours, but I added some test code as below:

if __name__ == '__main__':
    dst = VOCDataSet("./data", is_transform=True)
    trainloader = data.DataLoader(dst, batch_size=4)
    for i, data in enumerate(trainloader):
        imgs, labels = data
        print(imgs[0].type())

Did you encounter this problem?

strange error unknown

Hi,
When I run the examples/voc/train_fcn32s.py. There is a strange error
File "train_fcn32s.py", line 46, in main
model = torchfcn.models.FCN32s(n_class=21, deconv=deconv)
TypeError: __init__() got an unexpected keyword argument 'deconv'
could you tell me what's wrong with this error?
Thanks and Best Regards,

.../pytorch-fcn/examples/voc/torchfcn/utils.py:24: RuntimeWarning: invalid value encountered in divide
  acc_cls = np.diag(hist) / hist.sum(axis=1)                                    
.../pytorch-fcn/examples/voc/torchfcn/utils.py:26: RuntimeWarning: invalid value encountered in divide
  iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))

and the training loss becomes NaN. Is this expected?

Also I am not sure if it's related but AFTER the first round all the validation error becomes NaN.
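
The two divide warnings can be reproduced with a confusion matrix in which one class never appears: its row and column sums are zero, so the division is 0/0 and yields NaN for that class. The sketch below uses made-up counts. Averaging with `np.nanmean` skips such classes. This only explains the warnings from the metric code; a NaN training loss is a separate matter that this sketch does not address.

```python
import numpy as np

# Confusion matrix (rows: true class, columns: predicted class).
# Class 2 never occurs in the labels or the predictions.
hist = np.array([[5, 1, 0],
                 [2, 4, 0],
                 [0, 0, 0]], dtype=np.float64)

with np.errstate(divide="ignore", invalid="ignore"):
    acc_cls = np.diag(hist) / hist.sum(axis=1)
    iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))

print(iu)              # last entry is nan (0 / 0)
print(np.nanmean(iu))  # mean over the classes that are present
```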
