xh-liu / cc-fpse

Code for NeurIPS 2019 paper "Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis"

Python 99.02% Shell 0.98%

cc-fpse's Issues

The channel number of fc in the generator is not correct without VAE

self.fc = nn.Conv2d(self.opt.semantic_nc, 16 * nf * self.sw * self.sh, 3, padding=1)
=> self.fc = nn.Conv2d(self.opt.semantic_nc, 16 * nf, 3, padding=1)
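
A rough sketch of why the two branches would need different output sizes. It assumes the generator follows the SPADE-style convention in which the VAE branch reshapes a flat vector to (16 * nf, sh, sw), while the non-VAE branch convolves a label map that has already been resized to sh x sw. The values nf = 64 and sw = sh = 8 are illustrative assumptions, not taken from the repository.

```python
# Illustrative arithmetic only; nf, sw and sh are assumed values.
nf, sw, sh = 64, 8, 8

# VAE branch: a linear layer emits one flat vector per sample, which is
# then reshaped to (16 * nf, sh, sw), so it must produce every element.
linear_out_features = 16 * nf * sw * sh

# Non-VAE branch: the convolution runs on a label map that already has
# spatial size sh x sw, so it only has to produce the channel dimension.
conv_out_channels = 16 * nf

# Both branches end up with the same number of activations per sample.
assert linear_out_features == conv_out_channels * sw * sh
print(linear_out_features, conv_out_channels)
```

Under these assumptions, keeping the `sw * sh` factor in the convolution would multiply its output channels by the spatial size, which is what the proposed change removes.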

Training errors

----------------- End -------------------
train.py --name coco_cc_fpse --mpdist --netG condconv --dist_url tcp://:8000 --num_servers 1 --netD fpse --lambda_feat 20 --dataset_mode coco --dataroot datasets/coco_stuff --batchSize 1 --niter 100 --niter_decay 100 --use_vae
Use GPU: 5 for training
Use GPU: 3 for training
Use GPU: 6 for training
Use GPU: 7 for training
Use GPU: 2 for training
Traceback (most recent call last):
File "train.py", line 97, in <module>
main()
File "train.py", line 29, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, idx_server, opt))
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:

-- Process 5 terminated with the following error:
Traceback (most recent call last):
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/htang/projects/CC-FPSE/train.py", line 37, in main_worker
dist.init_process_group(backend='nccl', init_method=opt.dist_url, world_size=world_size, rank=rank)
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
store, rank, world_size = next(rendezvous(url))
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 95, in _tcp_rendezvous_handler
store = TCPStore(result.hostname, result.port, world_size, start_daemon)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. torch.distributed.TCPStore(arg0: str, arg1: int, arg2: int, arg3: bool)

Invoked with: None, 8000, 8, False
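
The `Invoked with: None, 8000, 8, False` line suggests that the host part of `--dist_url` was empty, so a `None` hostname reached `TCPStore`, which expects a string. Below is a minimal sketch using the standard library's `urlparse`, which is assumed here to mirror how the rendezvous handler splits the URL. `split_dist_url` is a hypothetical helper and `127.0.0.1` is only an example address.

```python
from urllib.parse import urlparse


def split_dist_url(url):
    """Return (hostname, port) the way a URL parser sees them."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


# Nothing between "//" and ":" means the parser reports no hostname.
print(split_dist_url("tcp://:8000"))
# With an explicit host, the hostname comes back as a string.
print(split_dist_url("tcp://127.0.0.1:8000"))
```

If this is the cause, passing a full address in `--dist_url` (host and port) should avoid the `TypeError`.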

Questions regarding mIoU and accuracy

Hi,

Thank you for sharing the code and replying to my previous question!
While reproducing the metrics, I have some questions:

I'm referring to the SPADE issue to implement the evaluation code. Did you use the same repo and pre-trained weights for evaluation?
If so, regarding the COCO-Stuff dataset, the original DeepLab v2 shows 66.8 pixel accuracy and 39.1 mIoU for ground-truth validation images. However, CC-FPSE reaches 70.7 pixel accuracy and 41.6 mIoU, which seems odd because generated images score higher than real ones. I think the difference might come from a different input size to the DeepLab model. How did you feed inputs to the DeepLab network (for example, using the 256x256 image directly, or upsampling it to 321x321 with bilinear interpolation)?
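
For reference, here is a small sketch of how pixel accuracy and mIoU are commonly computed from a confusion matrix. It is a generic illustration on toy label lists, not the evaluation script used by the authors or by SPADE; all function names are hypothetical.

```python
def confusion_matrix(pred, target, num_classes):
    """Count (target, pred) pairs over flat label sequences."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted label equals the target."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average of per-class intersection over union."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both maps
            ious.append(tp / union)
    return sum(ious) / len(ious)


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, target, 3)
print(pixel_accuracy(cm), mean_iou(cm))
```

Because both metrics are counted per pixel, resizing the input or the label map before comparison changes which pixels are counted, which is one way the reported numbers could differ.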

How important is the perceptual loss?

Hi @xh-liu, thanks for the awesome work.
I found that the perceptual loss does not decrease during training, which has also been observed in the original GauGAN. Is this loss actually useful? Have you tried training your method without the perceptual loss?

How to train the model on one GPU

Hello! I have a question: if I have only one server with a single GPU, how should I set the arguments to train the model?

Can't reproduce results, self.instance_path is empty. Please help me

Hello!
I tried to run training with the command
train.py --name name --mpdist --netG condconv --dist_url tcp://0.0.0.0:8000 --num_servers 1 --netD fpse --lambda_feat 20 --dataset_mode custom --dataroot=path --batchSize 1 --niter 100 --niter_decay 100 --use_vae --ngpus_per_node 1 --label_dir=path --image_dir=path --checkpoints_dir=path --no_vgg_loss

And I am getting an error I could not trace back:
Use GPU: 0 for training
dataset [CustomDataset] of size 14544 was created
Network [CondConvGenerator] was created. Total number of parameters: 135.6 million. To see the architecture, do print(network).
Network [FPSEDiscriminator] was created. Total number of parameters: 5.2 million. To see the architecture, do print(network).
Network [ConvEncoder] was created. Total number of parameters: 10.5 million. To see the architecture, do print(network).
create web directory path...
Traceback (most recent call last):
File "train.py", line 97, in <module>
main()
File "train.py", line 29, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, idx_server, opt))
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/content/CC-FPSE/train.py", line 58, in main_worker
for i, data_i in enumerate(dataloader, start=iter_counter.epoch_iter):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/CC-FPSE/data/pix2pix_dataset.py", line 79, in __getitem__
instance_path = self.instance_paths[index]

How can I avoid that?
Also, vgg19.pth cannot be found anywhere. Which file is supposed to be used?
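
The traceback stops at `self.instance_paths[index]`, which would fail if no instance maps were found. Here is a sketch of that failure mode, under the assumption that the dataset keeps an empty list when no instance directory is given. `pick_instance_path` is a hypothetical stand-in, not the repository's code. The `--no_instance` flag appears in another command on this page and may be what skips this lookup.

```python
def pick_instance_path(instance_paths, index, no_instance):
    """Hypothetical stand-in for the lookup in the dataset's __getitem__."""
    if no_instance:
        # Instance maps are disabled, so nothing is indexed.
        return None
    if not instance_paths:
        raise IndexError(
            "instance_paths is empty: pass an instance directory "
            "or disable instance maps"
        )
    return instance_paths[index]


print(pick_instance_path([], 0, no_instance=True))
try:
    pick_instance_path([], 0, no_instance=False)
except IndexError as err:
    print("lookup failed:", err)
```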

The pretrained weights have a problem

Hello, we encountered a problem when running the code for your paper.

The error is below:

The size mismatch for labelenc1.0.0.weight_orig: copying a param with shape torch.Size([64, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 183, 3, 3]).

test_coco.sh already uses the argument --use_vae.

Thank you so much.
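
The two shapes differ by a single input channel (184 vs 183). Here is a sketch of how that count could arise, assuming CC-FPSE inherits the SPADE-style rule for the number of semantic input channels and that COCO-Stuff uses 182 labels; both are assumptions, and `semantic_nc` here is an illustrative function rather than the repository's code.

```python
def semantic_nc(label_nc, contain_dontcare_label, no_instance):
    """Input channels of the label encoder under a SPADE-style convention."""
    dontcare = 1 if contain_dontcare_label else 0
    instance_edge = 0 if no_instance else 1
    return label_nc + dontcare + instance_edge


# 182 classes + "don't care" label + instance edge map
print(semantic_nc(182, True, False))
# Same labels without the instance edge map
print(semantic_nc(182, True, True))
```

If that rule applies, a model built with instance maps disabled would have one channel fewer than the checkpoint expects.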

Bug fix for `models/pix2pix_model.py`

When running ./test_coco.sh, the error below occurs.

Traceback (most recent call last):
  File "test.py", line 41, in <module>
    generated = model(data_i, mode='inference')
  File "/home/justin/.virtualenvs/spade/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/justin/project/CC-FPSE/models/pix2pix_model.py", line 35, in forward
    input_semantics, real_image = self.preprocess_input(data)
  File "/home/justin/project/CC-FPSE/models/pix2pix_model.py", line 120, in preprocess_input
    instance_edge_map = self.get_edges(inst_map)
  File "/home/justin/project/CC-FPSE/models/pix2pix_model.py", line 217, in get_edges
    edge[:, :, :, 1:] = edge[:, :, :, 1:] | (t[:, :, :, 1:] != t[:, :, :, :-1])
RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'other' in call to _th_or

It can be resolved by changing `edge = self.ByteTensor(t.size()).zero_()` to `edge = self.ByteTensor(t.size()).zero_().bool()` at https://github.com/xh-liu/CC-FPSE/blob/master/models/pix2pix_model.py#L216

Is the depthwise separable convolution correct?

Hello xh-liu,

I have a question about your paper and code.

When I read the Xception paper (https://arxiv.org/abs/1610.02357) and looked for code implementing it, I found that the depthwise convolution takes an extra parameter (groups), as in the following code:

    self.depthwise = nn.Conv2d(nin, nin, kernel_size=kernel_size, padding=padding, groups=nin, bias=bias)
    self.pointwise = nn.Conv2d(nin, nout, kernel_size=1, bias=bias)

Can you explain why this code and your code (in the depthwise conv part) are different?
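
To make the role of `groups` concrete, here is a small sketch that counts the parameters of a 2-D convolution with and without grouping. `conv2d_params` is a hypothetical helper and the channel sizes are illustrative; it says nothing about which variant the repository implements.

```python
def conv2d_params(c_in, c_out, kernel_size, groups=1, bias=True):
    """Parameter count of a 2-D convolution with square kernels."""
    assert c_in % groups == 0 and c_out % groups == 0
    weights = c_out * (c_in // groups) * kernel_size * kernel_size
    return weights + (c_out if bias else 0)


nin, nout, k = 64, 128, 3
depthwise = conv2d_params(nin, nin, k, groups=nin)  # one filter per channel
pointwise = conv2d_params(nin, nout, 1)             # 1x1 channel mixing
standard = conv2d_params(nin, nout, k)              # ordinary convolution
print(depthwise + pointwise, standard)
```

With `groups=nin` each input channel is filtered on its own, so the depthwise step holds far fewer weights than a convolution that mixes all channels.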

How can I train on a single GPU?

How can I train on 8 V100 GPUs?

Where in the code should I make changes to train on 8 V100 GPUs? Thanks a lot.

network issue

I am using the NVIDIA docker container for pytorch-1912. I can clone the GitHub repository without any problem, but when I try to run CC-FPSE on my own data (on a 4-GPU instance):

python train.py --name condconv --netG condconv --netD fpse --lambda_feat 20 --dataset_mode custom --label_dir mydata/train_label --image_dir mydata/train_img --label_nc 6 --no_instance --batchSize 1 --niter 100 --niter_decay 100 --use_vae --ngpus_per_node 4

I get the following error:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/uge_mnt/home/adeschem/CC-FPSE/train.py", line 37, in main_worker
dist.init_process_group(backend='nccl', init_method=opt.dist_url, world_size=world_size, rank=rank)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 397, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 109, in _tcp_rendezvous_handler
store = TCPStore(result.hostname, result.port, world_size, start_daemon)
RuntimeError: Network is unreachable

This seems to be related to the torch distributed communication package, even though I am not using the --mpdist option to enable distributed multiprocessing.

Using this for a custom task/dataset

Hi,
Thank you for your amazing work! I'm working on an image to image translation task, from sketches of faces to their corresponding photos. Would this work well on this task and type of data? If yes, how would I go about training with a custom dataset?
Thanks!

Loading pretrained weights for the module didn't work

Hello,

I want to use the synthesis network to generate realistic images for the Cityscapes dataset from semantic maps, but the pretrained weights of the synthesis module didn't match.

Here is the file where the error occurs:
./synbost-try/image_synthesis/util/util.py

#############################################################
line 227 net.load_state_dict(weights)

*** RuntimeError: Error(s) in loading state_dict for CondConvGenerator:
size mismatch for fc.weight: copying a param with shape torch.Size([32768, 256]) from checkpoint, the shape in current model is torch.Size([1024, 36, 3, 3]).
size mismatch for fc.bias: copying a param with shape torch.Size([32768]) from checkpoint, the shape in current model is torch.Size([1024]).
#############################################################

Do you know the reason? I didn't change any parameters except the path of the checkpoints.

Thank you for your help.

Best Regards
Yiru
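
One way to read the mismatch: a 2-D weight is what a fully connected layer stores, while a 4-D weight belongs to a convolution. The sketch below classifies the two shapes from the error message; `describe_fc` is a hypothetical helper. If this reading is right, the checkpoint may have been saved from a model whose `fc` was a linear layer (for example with a VAE option such as the `--use_vae` flag seen in commands elsewhere on this page), while the current model was built with a convolutional `fc`. That is an assumption, not something confirmed here.

```python
def describe_fc(weight_shape):
    """Guess the layer type from a weight shape (hypothetical helper)."""
    if len(weight_shape) == 2:
        out_features, in_features = weight_shape
        return f"linear: {in_features} -> {out_features}"
    if len(weight_shape) == 4:
        out_ch, in_ch, kh, kw = weight_shape
        return f"conv {kh}x{kw}: {in_ch} -> {out_ch} channels"
    return "unknown"


print(describe_fc((32768, 256)))      # shape stored in the checkpoint
print(describe_fc((1024, 36, 3, 3)))  # shape built by the current model
```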
