
pytorch-fcn's Introduction

pytorch-fcn


PyTorch implementation of Fully Convolutional Networks.

Requirements

Installation

git clone https://github.com/wkentaro/pytorch-fcn.git
cd pytorch-fcn
pip install .

# or

pip install torchfcn

Training

See VOC example.

Accuracy

At 10fdec9.

Model        Implementation  Epoch  Iteration  Mean IU  Pretrained Model
FCN32s       Original        -      -          63.63    Download
FCN32s       Ours            11     96000      62.84
FCN16s       Original        -      -          65.01    Download
FCN16s       Ours            11     96000      64.91
FCN8s        Original        -      -          65.51    Download
FCN8s        Ours            7      60000      65.49
FCN8sAtOnce  Original        -      -          65.40
FCN8sAtOnce  Ours            11     96000      64.74

Visualization of validation result of FCN8s.

Cite This Project

If you use this project in your research or wish to refer to the baseline results published in the README, please use the following BibTeX entry.

@misc{pytorch-fcn2017,
  author =       {Kentaro Wada},
  title =        {{pytorch-fcn: PyTorch Implementation of Fully Convolutional Networks}},
  howpublished = {\url{https://github.com/wkentaro/pytorch-fcn}},
  year =         {2017}
}

pytorch-fcn's People

Contributors

lyken17, mbarnes1, wkentaro


pytorch-fcn's Issues

About the Preprocessing of input images.

This project preprocesses input images differently from the PyTorch pretrained models. The pretrained models expect RGB values normalized to [0, 1], whereas pytorch-fcn feeds BGR values that are only mean-centered, not scaled to [0, 1]. How does this work?

See this on pytorch discuss:
All pretrained torchvision models have the same preprocessing, which is to normalize using the following mean/std values: https://github.com/pytorch/examples/blob/master/imagenet/main.py#L92-L93 (input is RGB format)
[ref: https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683/2]
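
To make the difference concrete, here is a minimal NumPy sketch of the two conventions. The torchvision mean/std values are the standard ImageNet statistics. The BGR mean values are an assumption based on common Caffe VGG16 setups and are not stated in this issue, and both function names are hypothetical.

```python
import numpy as np

# Standard ImageNet statistics used by torchvision models (RGB, inputs in [0, 1]).
TORCHVISION_MEAN = np.array([0.485, 0.456, 0.406])
TORCHVISION_STD = np.array([0.229, 0.224, 0.225])

# Assumed Caffe-style per-channel mean (BGR, inputs in [0, 255]).
CAFFE_MEAN_BGR = np.array([104.00698793, 116.66876762, 122.67891434])


def preprocess_torchvision(img_rgb_uint8):
    """RGB uint8 HxWx3 -> float CxHxW, scaled to [0, 1] and then normalized."""
    img = img_rgb_uint8.astype(np.float64) / 255.0
    img = (img - TORCHVISION_MEAN) / TORCHVISION_STD
    return img.transpose(2, 0, 1)


def preprocess_caffe(img_rgb_uint8):
    """RGB uint8 HxWx3 -> float CxHxW in BGR order, mean-subtracted only."""
    img = img_rgb_uint8[:, :, ::-1].astype(np.float64)  # RGB -> BGR
    img -= CAFFE_MEAN_BGR
    return img.transpose(2, 0, 1)
```

Weights converted from Caffe would expect the second form and torchvision weights the first, so each set of weights should be paired with the preprocessing it was trained with.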

"import fcn" error

Hi, sorry to disturb you. I get an error when I run voc/train_fcn322.py: I can't import fcn. Do I need an additional package? Thanks.

Nan error while training

Hello, I get a NaN loss when I train the network with lr=1e-4, which I believe is the value used in the FCN paper.
Why does the loss become NaN? (Training works fine with the lr set in your original script.)

How to run the code

Hi, can you provide simple instructions on how to train, run inference, and get the dataset?

ImportError: No module named fcn

Hi, when I run the file "model_caffe_to_pytorch.py", it turns out that there is "no module named fcn". Thank you very much.
By the way, "export PYTHONAPTH=$(pwd)/python:$PYTHONPATH" should be "export PYTHONPATH=$(pwd)/python:$PYTHONPATH", I think.

Can not work with pycaffe?

Hello! I am using your pytorch-fcn, which depends on fcn. However, when I try to convert the Caffe model to a PyTorch one, importing both fcn and caffe kills the kernel. Importing just one of them, caffe or fcn, works. Importing fcn after caffe, or caffe after fcn, does not work.

AttributeError: 'float' object has no attribute 'total_seconds'

Hi, wkentaro. Sorry to bother you!
I want to train FCN32s like this:

./train_fcn32s.py -g 0

Now I get an error and I don't know how to fix it.

Traceback (most recent call last):
File "./train_fcn32s.py", line 164, in <module>
main()
File "./train_fcn32s.py", line 160, in main
trainer.train()
File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 222, in train
self.train_epoch()
File "/home/wukuan/anaconda3/envs/env27/lib/python2.7/site-packages/torchfcn/trainer.py", line 208, in train_epoch
elapsed_time = elapsed_time.total_seconds()
AttributeError: 'float' object has no attribute 'total_seconds'

I want to know some details about this error.
Thanks and Best Regards!

FCN8s

Hi,

May I ask when you will add FCN8s or FCN16s?

Thanks!

'Dropout' object has no attribute 'weight'

Hi,
Thanks so much for your help! I'm new to pytorch.
There is a new error again!
Traceback (most recent call last):
File "examples/voc/train_fcn32s.py", line 100, in <module>
main()
File "examples/voc/train_fcn32s.py", line 56, in main
model.copy_params_from_vgg16(vgg16, init_upscore=False)
File "/home/zheshiyige/Desktop/fully convolutional network/pytorch-fcn-master/torchfcn/models/fcn32s.py", line 117, in copy_params_from_vgg16
l2.weight.data = l1.weight.data.view(l2.weight.size())
File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 238, in __getattr__
type(self).__name__, name))
AttributeError: 'Dropout' object has no attribute 'weight'

Thanks and Best Regards,

A little bug in trainer.py

Hello, wkentaro. Do you find that the training loss will suddenly become larger when a new epoch starts? I've noticed that and think it is because of the incorrect use of the training and evaluating mode of the model.
In train_epoch(), line 171, when you finish validation, I think you should change the model back into training mode.
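
One way to guard against this kind of mode mix-up is to restore the previous mode automatically after validation. The sketch below is a hypothetical helper, not code from trainer.py. It relies only on the `training` attribute and the `train(mode)` / `eval()` methods that `torch.nn.Module` provides.

```python
from contextlib import contextmanager


@contextmanager
def evaluating(model):
    """Switch `model` to eval mode, then restore whatever mode it had before."""
    was_training = model.training
    model.eval()
    try:
        yield model
    finally:
        model.train(was_training)


# Inside a hypothetical training loop:
#     with evaluating(model):
#         validate(model)
#     # model is back in its previous mode here, so Dropout is active again
```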

About the offset of crop

Hi, I just want to ask how you decide the crop offset in different models. If I want to train this model on different datasets with different image size, how to calculate the precise crop offset?

Discussion about the loss function

Hi, I did an experiment about the loss function:

log_p = log_p.transpose(1, 2).transpose(2, 3).contiguous().view(-1, c)

I find that the code works both with and without view(-1, c). Why? In my opinion, the version without view(-1, c) is the correct one. Or are the two actually the same in memory? What do you think?
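
As a reference point for this discussion, the following NumPy sketch shows only what the transpose-then-flatten step does to the layout. It does not settle which variant a given PyTorch version accepts. The shapes are arbitrary example values.

```python
import numpy as np

n, c, h, w = 2, 3, 4, 5
log_p = np.arange(n * c * h * w, dtype=np.float64).reshape(n, c, h, w)

# (N, C, H, W) -> (N, H, W, C): the same axis moves as transpose(1, 2).transpose(2, 3).
nhwc = log_p.transpose(0, 2, 3, 1)

# Collapse every pixel into one row of C class scores.
flat = nhwc.reshape(-1, c)

print(flat.shape)  # (40, 3)

# Row k holds the class scores of pixel (i, y, x) with k = (i * h + y) * w + x.
i, y, x = 1, 2, 3
assert np.array_equal(flat[(i * h + y) * w + x], log_p[i, :, y, x])
```

After flattening, each row is one pixel and each column is one class, which is the 2-D (samples, classes) layout a per-sample loss expects.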

slower benchmarks

I tested this out on an AWS p2 instance and I got a significantly slower benchmark. Can you confirm the hardware that you ran your benchmark on?

==> Benchmark: gpu=0, times=1000, dynamic_input=False
==> Testing FCN32s with PyTorch
Elapsed time: 245.73 [s / 1000 evals]
Hz: 4.07 [hz]

A bug in loss computation

Hi author

First, thank you for sharing. During training, I noticed that the loss (around 30000) is much larger than in the FCN experiments on Caffe (0.4 ~ 3.0). After reading your code, I found the cause. In trainer.py:

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

Note that mask here is a torch.ByteTensor, so each value is stored in one byte (8 bits, range 0~255). Summing it easily overflows, because the result is also stored in a torch.ByteTensor.

For example

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 Variable containing:
 0
[torch.ByteTensor of size 1]
'''

The correct way should be

test1 = Variable(torch.ones(1, 256,256).type('torch.ByteTensor'))
print(test1, test1.data.sum())

'''
Variable containing:
( 0 ,.,.) = 
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
     ...       ⋱       ...    
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
   1   1   1  ...    1   1   1
[torch.ByteTensor of size 1x256x256]
 65536
'''

So to compute the loss properly, you need to change loss /= mask.sum().data[0] to loss /= mask.data.sum().

I have created a pull request to fix this.
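
The same wrap-around can be reproduced outside PyTorch. In this minimal NumPy sketch, forcing a uint8 accumulator stands in for the ByteTensor behaviour described above. NumPy would otherwise widen the accumulator by default.

```python
import numpy as np

mask = np.ones((1, 256, 256), dtype=np.uint8)

# Accumulating in uint8 wraps modulo 256: 65536 % 256 == 0.
wrapped = mask.sum(dtype=np.uint8)

# Accumulating in a wider integer type gives the true pixel count.
count = mask.sum(dtype=np.int64)

print(int(wrapped), int(count))  # 0 65536
```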

Unknown skimage warning

Hi,
Thanks a lot for your help! The code now runs successfully on my computer, but there is a warning and I'm not sure whether it matters.

/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:310: RuntimeWarning: invalid value encountered in true_divide
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/util/dtype.py:122: UserWarning: Possible precision loss when converting from float64 to uint8
.format(dtypeobj_in, dtypeobj_out))
/home/zheshiyige/.local/lib/python2.7/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
Train epoch=0: 7%| | 611/8498 [02:25<31:02, 4.24it/s]/home/zheshiyige/anaconda2/lib/python2.7/site-packages/fcn-5.8.1-py2.7.egg/fcn/utils.py:312: RuntimeWarning: invalid value encountered in true_divide

Thanks and Best Regards,

Gradients for Loss function

Hi,

I have a question for loss function,

    loss = F.nll_loss(log_p, target, weight=weight, size_average=False)
    if size_average:
        loss /= mask.sum().data[0]

The scale of the gradients differs depending on whether 'size_average' is True or False.

This may be a problem, because the gradient scale affects how the learning rate should be set.

Is this why you use a very low learning rate, 1e-10?

Thanks!!
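
The scale difference raised here can be checked numerically. The sketch below uses random logits and targets (made-up data, not the VOC setup) and the closed-form gradient of softmax cross-entropy. It shows that the summed loss has a gradient larger than the averaged loss by a factor equal to the number of pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_class = 1000, 21
logits = rng.normal(size=(n_pixels, n_class))
target = rng.integers(0, n_class, size=n_pixels)

# Softmax probabilities per pixel (shifted for numerical stability).
shifted = logits - logits.max(axis=1, keepdims=True)
prob = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Gradient of the summed cross-entropy w.r.t. the logits: softmax - one_hot.
grad_sum = prob.copy()
grad_sum[np.arange(n_pixels), target] -= 1.0

# Averaging over pixels divides the loss, and hence the gradient, by n_pixels.
grad_mean = grad_sum / n_pixels

ratio = np.abs(grad_sum).sum() / np.abs(grad_mean).sum()
print(round(ratio))  # 1000
```

Under plain SGD, a learning rate tuned for the averaged loss would therefore need to shrink by roughly the pixel count to take the same step with the summed loss. Whether that is the reasoning behind 1e-10 in this repository is not confirmed by anything in this thread.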

load_state_dict error

Hi,
Recently, I ran examples/voc/train_fcn32s.py and encountered another error:
Traceback (most recent call last):
File "train_fcn32s.py", line 101, in <module>
main()
File "train_fcn32s.py", line 55, in main
vgg16.load_state_dict(torch.load(pth_file))
File "/home/zheshiyige/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 331, in load_state_dict
.format(name))
KeyError: 'unexpected key "classifier.1.weight" in state_dict'

I have downloaded /home/zheshiyige/data/models/torch/vgg16-00b39a1b.pth

Could you tell me how to fix this error?

Thanks and Best Regards,
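
An "unexpected key" error means the checkpoint contains parameter names the model does not define. A quick way to see the mismatch is to diff the two key sets. The helper and the key names below are hypothetical. In real use the inputs would be `model.state_dict().keys()` and `torch.load(pth_file).keys()`.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, both sorted."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, for illustration only.
missing, unexpected = diff_state_dict_keys(
    ["classifier.0.weight", "classifier.3.weight"],
    ["classifier.0.weight", "classifier.1.weight"],
)
print(missing)     # ['classifier.3.weight']
print(unexpected)  # ['classifier.1.weight']
```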

BC Support

Hi WKentaro,

Thank you for the implementation! I am new to PyTorch and I will try this out for an upcoming project.

Does your implementation include Bottleneck (BC) blocks? It seems to improve training time significantly.

  • FC

get error when executing the script "learning_curve.py"?

Hello. I found that when training runs longer than 24 hours, the logger writes "1 day" to the time field of "log.csv". This causes an error in the script "learning_curve.py". How can I fix this? Thanks!
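
A possible workaround is to parse the time field with a parser that understands the day prefix. The sketch below assumes the field is the string form of a `datetime.timedelta` (for example "1 day, 2:03:04.5"), which matches the "1 day" symptom but is not confirmed here. `parse_elapsed` is a hypothetical helper, not part of learning_curve.py.

```python
import re
from datetime import timedelta

_PATTERN = re.compile(
    r"^(?:(?P<days>-?\d+) days?, )?"
    r"(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+(?:\.\d+)?)$"
)


def parse_elapsed(text):
    """Parse str(timedelta) output such as '1 day, 2:03:04.5' into seconds."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError("unrecognized elapsed time: %r" % text)
    delta = timedelta(
        days=int(match.group("days") or 0),
        hours=int(match.group("hours")),
        minutes=int(match.group("minutes")),
        seconds=float(match.group("seconds")),
    )
    return delta.total_seconds()


print(parse_elapsed("1 day, 0:00:05"))  # 86405.0
print(parse_elapsed("2:03:04.5"))       # 7384.5
```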

resume error

Hi, when I use resume to load parameters, it crashes with ''unexpected key "module.features.0.weight" in state_dict''. I don't know what is wrong. Thanks a lot!
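
A "module." prefix on parameter names is what `torch.nn.DataParallel` adds when it wraps a model. Assuming the checkpoint was saved from a wrapped model and is being loaded into an unwrapped one, a minimal sketch of a fix is to strip the prefix before calling `load_state_dict`. The helper name is hypothetical, and plain integers stand in for tensors.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from keys."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Placeholder values stand in for tensors.
saved = OrderedDict([("module.features.0.weight", 1), ("module.features.0.bias", 2)])
print(list(strip_module_prefix(saved)))
# ['features.0.weight', 'features.0.bias']
```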

**Training**

code-block:: bash

./train_fcn32s.py config/001.yaml

Running this step produces the following error:
Traceback (most recent call last):
File "./train_fcn32s.py", line 128, in <module>
main()
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/xyz/anaconda2/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "./train_fcn32s.py", line 62, in main
cfg, out = load_config_file(config_file)
File "./train_fcn32s.py", line 32, in load_config_file
name += '_VCS-%s' % git_hash()
File "./train_fcn32s.py", line 21, in git_hash
hash = subprocess.check_output(shlex.split(cmd)).strip()
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 212, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 390, in __init__
errread, errwrite)
File "/home/xyz/anaconda2/lib/python2.7/subprocess.py", line 1024, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory

Wishing for your reply!
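
An `OSError: [Errno 2]` raised from `subprocess` usually means the executable itself could not be found, which here would presumably be `git`. The sketch below is a defensive variant of such a helper. The git command is an assumption, since the script's actual command is not shown in the traceback.

```python
import shlex
import subprocess


def git_hash(default="unknown"):
    """Return the short hash of HEAD, or `default` if git cannot be used."""
    cmd = "git rev-parse --short HEAD"  # assumed command, not the script's own
    try:
        out = subprocess.check_output(shlex.split(cmd), stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # OSError: no git executable on PATH.
        # CalledProcessError: git ran but failed (e.g. not inside a repository).
        return default
    return out.decode("utf-8").strip()
```

With a fallback like this, a missing git installation changes only the output directory name instead of aborting training.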

Inconsistent Tensor size while loading data using data loader.

I am training an OCR model using an RNN. The input data are word images of varying dimensions (since each word can have a different length), and the size of the class labels for each input is also not consistent, since each word can have a different number of characters.

   tensor_word_dataset = WordImagesDataset(images, truths, transform = ToTensor())
   dataset_loader = torch.utils.data.DataLoader(tensor_word_dataset,
                                            batch_size=16, shuffle=True,) 

This gives me the error:
RuntimeError: inconsistent tensor size at /py/conda-bld/pytorch_1493673470840/work/torch/lib/TH/generic/THTensorCopy.c:46

The sizes of the first six labels and images, respectively, are:

 torch.Size([2]) torch.Size([32, 41])
 torch.Size([7]) torch.Size([32, 95])
 torch.Size([2]) torch.Size([32, 38])
 torch.Size([2]) torch.Size([32, 53])
 torch.Size([2]) torch.Size([32, 49])
 torch.Size([6]) torch.Size([32, 55])

Any suggestions on how to fix this? I want to shuffle the data and send it in batches instead of supplying samples one at a time.
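
Default batching stacks samples, which requires identical shapes. `DataLoader` accepts a `collate_fn` argument that replaces this step, and one common approach is to pad each batch to its largest sample. The sketch below shows only the padding logic, in NumPy. The function name and padding values are hypothetical, and a real `collate_fn` would take a list of (image, label) samples and return tensors.

```python
import numpy as np


def pad_batch(images, labels, pad_value=0, label_pad=-1):
    """Pad variable-width HxW images and variable-length labels to common sizes."""
    max_w = max(img.shape[1] for img in images)
    max_len = max(len(lab) for lab in labels)
    height = images[0].shape[0]

    batch_images = np.full((len(images), height, max_w), pad_value, dtype=images[0].dtype)
    batch_labels = np.full((len(labels), max_len), label_pad, dtype=np.int64)
    for i, (img, lab) in enumerate(zip(images, labels)):
        batch_images[i, :, : img.shape[1]] = img
        batch_labels[i, : len(lab)] = lab

    # Keep the true sizes so the padding can be ignored downstream.
    widths = np.array([img.shape[1] for img in images])
    lengths = np.array([len(lab) for lab in labels])
    return batch_images, batch_labels, widths, lengths
```

For the first two samples listed above (images of 32x41 and 32x95, labels of length 2 and 7), this yields an image batch of shape (2, 32, 95) and a label batch of shape (2, 7).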

ModuleNotFoundError: No module named 'v1'

$ ./train_fcn32s.py
Traceback (most recent call last):
	File "./train_fcn32s.py", line 9, in <module>
		import torchfcn
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/__init__.py", line 1, in <module>
		from . import datasets  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/__init__.py", line 1, in <module>
		from .apc import APC2016V1  # NOQA
	File "/home/acgtyrant/Projects/pytorch-fcn/.env/lib/python3.6/site-packages/torchfcn-1.3-py3.6.egg/torchfcn/datasets/apc/__init__.py", line 1, in <module>
		from v1 import APC2016V1  # NOQA
ModuleNotFoundError: No module named 'v1'

resuming from checkpoint error

When resuming from a saved checkpoint using the flag --resume and the path to .pth.tar file saved under the 'log' folder by the program, there seems to be an error during the optim step.

Traceback (most recent call last):
File "train_fcn32s.py", line 202, in <module>
main()
File "train_fcn32s.py", line 198, in main
trainer.train()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 286, in train
self.train_epoch()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torchfcn/trainer.py", line 245, in train_epoch
self.optim.step()
File "/home/arunirc/pytorch-fcn/lib/python2.7/site-packages/torch/optim/sgd.py", line 87, in step
param_state = self.state[p]
KeyError: Parameter containing:
(0 ,0 ,.,.) =
-0.1587 0.0404 -0.2275
-0.1916 -0.1479 0.0287
-0.0271 -0.3107 -0.1193

speedtest

Hi,

Thanks for sharing the code. What kind of GPU did you use for the speed test? When I ran the speed test on a GTX 1080 (8 GB memory), the elapsed time was 404.87 [s / 1000 evals] for Chainer and 178.93 [s / 1000 evals] for PyTorch. Also, whenever I try to run it with the VOC dataset, I get an out-of-memory error. Could you share your experiment environment, such as the GPU type, how long training runs, and how much memory fcn32s_pytorch takes?

Thank you!

The learning rate is too small

I found that you trained the FCN8s model with a very small learning rate (1e-14). As I remember, the learning rate is set to 1e-4 in the original FCN paper, so I am a little confused.
Can you explain? Thank you in advance.

Why do we need padding=100 for a filter of size 3?

As the title, in torchfcn/models/fcn32s.py we have the setting for the first conv1 layer:

nn.Conv2d(3, 64, 3, padding=100),

Why do we need a padding of 100 instead of 1, which is what a filter of size 3 would suggest?

Thanks
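
Tracing the spatial size through the network gives some intuition for the large padding. The sketch below assumes a VGG16-style FCN32s: 3x3 convolutions with padding 1 (except the first), five 2x2 max pools with ceil_mode=True, an unpadded 7x7 fc6, and a 64x64 stride-32 deconvolution. These layer settings are assumptions for illustration and are not quoted from fcn32s.py.

```python
import math


def fcn32s_output_size(size, first_padding):
    """Spatial size after the conv/pool stack, 7x7 fc6 and 32x upsampling."""
    size = size + 2 * first_padding - 2      # conv1_1: 3x3 kernel
    for _ in range(5):                       # pool1 .. pool5
        size = math.ceil(size / 2)
    size = size - 6                          # fc6: 7x7 kernel, no padding
    if size < 1:
        return None                          # feature map vanished
    return (size - 1) * 32 + 64              # upscore: kernel 64, stride 32


for side in (100, 500):
    print(side, fcn32s_output_size(side, 1), fcn32s_output_size(side, 100))
# 100 None 160
# 500 352 544
```

Under these assumptions, padding 1 leaves no feature map at all for a 100-pixel input and an output smaller than the input for a 500-pixel one. Padding 100 always produces an output larger than the input, which can then be cropped back to the input size.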

AttributeError: 'module' object has no attribute 'utils'

Hi,
So sorry to bother you again!
Now, there is a new error
Traceback (most recent call last):
File "voc/train_fcn32s.py", line 100, in <module>
main()
File "voc/train_fcn32s.py", line 56, in main
torchfcn.utils.copy_params_vgg16_to_fcn32s(vgg16, model, init_upscore=False)
AttributeError: 'module' object has no attribute 'utils'

But I have installed torchfcn (1.4.1) and can import torchfcn.

Thanks and Best Regards,

How to install the module named 'fcn'?

Now after I ran python setup.py install, I got the following error:

import torchfcn
Traceback (most recent call last):

  File "<ipython-input-1-c08454750c97>", line 1, in <module>
    import torchfcn

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/__init__.py", line 3, in <module>
    from trainer import Trainer  # NOQA

  File "/home/zhang/anaconda2/lib/python2.7/site-packages/torchfcn-0.2-py2.7.egg/torchfcn/trainer.py", line 6, in <module>
    import fcn

ImportError: No module named fcn

Mini-batch and Multiple-gpu

Hi,

I want to use batch size more than 1.

So, I changed batch_size to 4.
e.g.)
train_loader = torch.utils.data.DataLoader(
    torchfcn.datasets.SBDClassSeg(root, split='train', transform=True),
    batch_size=4, shuffle=True, **kwargs)

But it raises an error, so I could not use it.

Could you please let me know how to modify it?

Also, is it possible to use multiple GPUs by modifying the code?

Thank you!

Where can I get the pretrained vgg16 model

The error tells me that I don't have the vgg16-from-caffe pretrained model.

bg455@51d9f43122cd:~/projects/pytorch-fcn/examples/voc$ ./train_fcn32s.py ./config/001.yaml
Traceback (most recent call last):
File "./train_fcn32s.py", line 128, in <module>
main()
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/bg455/.local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "./train_fcn32s.py", line 93, in main
vgg16 = torchfcn.models.VGG16(pretrained=True)
File "/usr/local/lib/python2.7/dist-packages/torchfcn-1.5.0-py2.7.egg/torchfcn/models/vgg.py", line 10, in VGG16
state_dict = torch.load(model_file)
File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 246, in load
f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: '/home/wkentaro/data/models/torch/vgg16-from-caffe.pth'

About VGG16 pretrained model

Hi @wkentaro, I see that you use the VGG16 pretrained model from Caffe. Why do you not use the pretrained one provided by torchvision? Is the one from Caffe better?

Thanks.

Pretrained network for fcn8s model

Hi, I tried to train the fcn8s model using pretrained weights from the vgg16 model, but there was no visible learning and the model output was all zero for the first ~10k iterations. However, the training loss converges when I use the fcn16s model to initialize the weights. I do not understand the reason behind this. Can you help with an explanation?

Inconsistent Tensor Size

When I implement the VOC2012 dataset based on your code, I get the error "inconsistent tensor size". The dataset code is the same as yours, but I added some test code as below:

if __name__ == '__main__':
    dst = VOCDataSet("./data", is_transform=True)
    trainloader = data.DataLoader(dst, batch_size=4)
    # Use a separate name so the `data` module is not shadowed.
    for i, batch in enumerate(trainloader):
        imgs, labels = batch
        print(imgs[0].type())

Did you encounter this problem?

strange error unknown

Hi,
When I run examples/voc/train_fcn32s.py, there is a strange error:
File "train_fcn32s.py", line 46, in main
model = torchfcn.models.FCN32s(n_class=21, deconv=deconv)
TypeError: __init__() got an unexpected keyword argument 'deconv'
Could you tell me what is wrong?
Thanks and Best Regards,

`--out` option

I was trying to run training in the VOC example with logging, but this parameter doesn't exist!
I noticed that it outputs a log regardless of the parameter.

Not found fcn.utils.

Hi,
I cannot find 'fcn.utils' function, which is used in 'trainer.py' for fcn.utils.label_accuracy_score and fcn.utils.visualize_segmentation.

RuntimeWarning: invalid value encountered in divide during example training

In the first round of evaluation (with the pretrained VGG16 model), the following error appears:

.../pytorch-fcn/examples/voc/torchfcn/utils.py:24: RuntimeWarning: invalid value encountered in divide
  acc_cls = np.diag(hist) / hist.sum(axis=1)                                    
.../pytorch-fcn/examples/voc/torchfcn/utils.py:26: RuntimeWarning: invalid value encountered in divide
  iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))

and the training loss becomes NaN. Is this expected?

Also I am not sure if it's related but AFTER the first round all the validation error becomes NaN.
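
The divide warning on its own typically arises when a class has no pixels in either the ground truth or the prediction, so the per-class ratio is 0 / 0. A minimal sketch of a metric that tolerates this is shown below. `mean_iu` is a hypothetical helper and the confusion matrix is made-up data. It does not address the NaN training loss, which would be a separate problem.

```python
import numpy as np


def mean_iu(hist):
    """Mean intersection-over-union from a confusion matrix, skipping absent classes."""
    hist = np.asarray(hist, dtype=np.float64)
    intersection = np.diag(hist)
    union = hist.sum(axis=1) + hist.sum(axis=0) - intersection
    # 0 / 0 for classes that never appear yields NaN; silence the warning here.
    with np.errstate(divide="ignore", invalid="ignore"):
        iu = intersection / union
    # nanmean averages only over classes that actually occur.
    return np.nanmean(iu)


# Class 2 is absent from both ground truth and prediction.
hist = [[5, 1, 0],
        [2, 4, 0],
        [0, 0, 0]]
print(mean_iu(hist))
```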
