xh-liu / cc-fpse
Code for NeurIPS 2019 paper "Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis"
self.fc = nn.Conv2d(self.opt.semantic_nc, 16 * nf * self.sw * self.sh, 3, padding=1)
=> self.fc = nn.Conv2d(self.opt.semantic_nc, 16 * nf, 3, padding=1)
----------------- End -------------------
train.py --name coco_cc_fpse --mpdist --netG condconv --dist_url tcp://:8000 --num_servers 1 --netD fpse --lambda_feat 20 --dataset_mode coco --dataroot datasets/coco_stuff --batchSize 1 --niter 100 --niter_decay 100 --use_vae
Use GPU: 5 for training
Use GPU: 3 for training
Use GPU: 6 for training
Use GPU: 7 for training
Use GPU: 2 for training
Traceback (most recent call last):
File "train.py", line 97, in <module>
main()
File "train.py", line 29, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, idx_server, opt))
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 5 terminated with the following error:
Traceback (most recent call last):
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/htang/projects/CC-FPSE/train.py", line 37, in main_worker
dist.init_process_group(backend='nccl', init_method=opt.dist_url, world_size=world_size, rank=rank)
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
store, rank, world_size = next(rendezvous(url))
File "/home/htang/anaconda3/envs/python3pytorch11/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 95, in _tcp_rendezvous_handler
store = TCPStore(result.hostname, result.port, world_size, start_daemon)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. torch.distributed.TCPStore(arg0: str, arg1: int, arg2: int, arg3: bool)
Invoked with: None, 8000, 8, False
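The `None` in "Invoked with: None, 8000, 8, False" is the hostname. The traceback shows the handler passing `result.hostname` and `result.port` to `TCPStore`. Assuming `result` comes from `urllib.parse.urlparse` applied to the `--dist_url` value, a URL with nothing between `//` and `:` has no hostname. A minimal sketch of that parsing step (the helper name `split_dist_url` and the `127.0.0.1` address are illustrative, not from the repository):

```python
from urllib.parse import urlparse


def split_dist_url(url):
    """Return (hostname, port) as urlparse sees a tcp:// rendezvous URL."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


# URL from the command above: no host between "//" and ":".
print(split_dist_url("tcp://:8000"))  # (None, 8000)

# Hypothetical single-machine alternative: name the host explicitly.
print(split_dist_url("tcp://127.0.0.1:8000"))  # ('127.0.0.1', 8000)
```

With an explicit host in the URL, the first argument handed to `TCPStore` would be a string rather than `None`.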
Hi,
Thank you for sharing the code and replying to my previous question!
While reproducing the metrics, I ran into some questions:
I'm referring to the SPADE issue to implement the evaluation code. Did you use the same repo and pre-trained weights for evaluation?
If so: on the COCO-Stuff dataset, the original DeepLab v2 reports 66.8 pixel accuracy and 39.1 mIoU on the ground-truth validation images, whereas CC-FPSE reaches 70.7 pixel accuracy and 41.6 mIoU. That seems odd, since the generated images score higher than the real ones. I think the difference might come from a different input size to the DeepLab model. How did you feed inputs to the DeepLab network? (For example, did you use the 256x256 image directly, or upsample it to 321x321 with bilinear interpolation?)
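For readers comparing numbers, the two metrics in question can be computed from a confusion matrix. The following is a minimal pure-Python sketch under the usual definitions (accuracy = correct pixels / all pixels, IoU = TP / (GT + predicted - TP) per class). It is not the evaluation script used by the authors or by SPADE, and all function names are illustrative.

```python
def confusion_matrix(gt, pred, num_classes):
    """Accumulate counts[g][p] over paired flat lists of class labels."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        counts[g][p] += 1
    return counts


def pixel_accuracy(counts):
    """Fraction of pixels whose predicted label equals the ground truth."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[c][c] for c in range(len(counts)))
    return correct / total


def mean_iou(counts):
    """Average per-class IoU, skipping classes absent from both gt and pred."""
    n = len(counts)
    ious = []
    for c in range(n):
        tp = counts[c][c]
        gt_c = sum(counts[c])                         # pixels labelled c in gt
        pred_c = sum(counts[r][c] for r in range(n))  # pixels predicted as c
        union = gt_c + pred_c - tp
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
counts = confusion_matrix(gt, pred, num_classes=3)
print(pixel_accuracy(counts), mean_iou(counts))
```

Both values depend on the resolution at which the segmentation network sees the image, which is why the resizing step asked about above matters.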
Hi @xh-liu, thanks for the awesome work.
I found that the perceptual loss does not decrease during training, which has also been observed in the original GauGAN. Is this loss actually useful? Have you tried training your method without the perceptual loss?
Hello! I have a question: if I have only one server with only one GPU, how should I set the arguments to train the model?
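As a rough guide to how these arguments interact: the tracebacks on this page show `mp.spawn(main_worker, nprocs=ngpus_per_node, ...)` followed by `dist.init_process_group(..., world_size=world_size, rank=rank)`, and a run with `--num_servers 1` on 8 GPUs reports a world size of 8. The sketch below assumes the common convention world_size = ngpus_per_node * num_servers and rank = idx_server * ngpus_per_node + local GPU index. That convention is an assumption, not something read from train.py, and the function name is hypothetical.

```python
def world_size_and_ranks(ngpus_per_node, num_servers, idx_server):
    """Assumed convention: one process per GPU, ranks numbered server by server."""
    world_size = ngpus_per_node * num_servers
    ranks = [idx_server * ngpus_per_node + gpu for gpu in range(ngpus_per_node)]
    return world_size, ranks


# One server with one GPU: a single process with rank 0.
print(world_size_and_ranks(ngpus_per_node=1, num_servers=1, idx_server=0))  # (1, [0])

# One server with eight GPUs, matching the world size of 8 seen above.
print(world_size_and_ranks(ngpus_per_node=8, num_servers=1, idx_server=0))
```

Under that assumption, a single-GPU machine corresponds to `--ngpus_per_node 1 --num_servers 1`, both of which appear in commands elsewhere on this page.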
Hello!
I tried to run training with the command
train.py --name name --mpdist --netG condconv --dist_url tcp://0.0.0.0:8000 --num_servers 1 --netD fpse --lambda_feat 20 --dataset_mode custom --dataroot=path --batchSize 1 --niter 100 --niter_decay 100 --use_vae --ngpus_per_node 1 --label_dir=path --image_dir=path --checkpoints_dir=path --no_vgg_loss
and got an error I could not trace back:
Use GPU: 0 for training
dataset [CustomDataset] of size 14544 was created
Network [CondConvGenerator] was created. Total number of parameters: 135.6 million. To see the architecture, do print(network).
Network [FPSEDiscriminator] was created. Total number of parameters: 5.2 million. To see the architecture, do print(network).
Network [ConvEncoder] was created. Total number of parameters: 10.5 million. To see the architecture, do print(network).
create web directory path...
Traceback (most recent call last):
File "train.py", line 97, in <module>
main()
File "train.py", line 29, in main
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, idx_server, opt))
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/content/CC-FPSE/train.py", line 58, in main_worker
for i, data_i in enumerate(dataloader, start=iter_counter.epoch_iter):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/CC-FPSE/data/pix2pix_dataset.py", line 79, in __getitem__
instance_path = self.instance_paths[index]
How can I avoid that?
Also, vgg19.pth cannot be found anywhere. Which file is supposed to be used?
Hello, we encountered a problem when running the code for the paper.
The error is below:
The size mismatch for labelenc1.0.0.weight_orig: copying a param with shape torch.Size([64, 184, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 183, 3, 3]).
test_coco.sh already uses the --use_vae argument.
Thank you so much.
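An off-by-one in the input channels (184 in the checkpoint, 183 in the current model) is what one would see if the two models disagree on whether an instance edge channel is appended. The sketch below assumes the channel count follows the SPADE-style rule: label classes, plus one for a "don't care" label, plus one for the instance edge map unless `--no_instance` is set. Both that rule and the value `label_nc=182` for COCO-Stuff are assumptions carried over from SPADE, not confirmed for this repository.

```python
def generator_input_channels(label_nc, contain_dontcare_label, no_instance):
    """SPADE-style input channel count (assumed to apply here as well)."""
    return (
        label_nc
        + (1 if contain_dontcare_label else 0)
        + (0 if no_instance else 1)
    )


# With an instance edge channel: matches the checkpoint shape [64, 184, 3, 3].
print(generator_input_channels(182, True, no_instance=False))  # 184

# Without it: matches the current model shape [64, 183, 3, 3].
print(generator_input_channels(182, True, no_instance=True))  # 183
```

If that assumption holds, checking whether the test script sets `--no_instance` while the checkpoint was trained with instance maps would be a reasonable first step.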
When running ./test_coco.sh, the error below occurs.
Traceback (most recent call last):
File "test.py", line 41, in <module>
generated = model(data_i, mode='inference')
File "/home/justin/.virtualenvs/spade/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/justin/project/CC-FPSE/models/pix2pix_model.py", line 35, in forward
input_semantics, real_image = self.preprocess_input(data)
File "/home/justin/project/CC-FPSE/models/pix2pix_model.py", line 120, in preprocess_input
instance_edge_map = self.get_edges(inst_map)
File "/home/justin/project/CC-FPSE/models/pix2pix_model.py", line 217, in get_edges
edge[:, :, :, 1:] = edge[:, :, :, 1:] | (t[:, :, :, 1:] != t[:, :, :, :-1])
RuntimeError: Expected object of scalar type Byte but got scalar type Bool for argument #2 'other' in call to _th_or
It can be simply resolved by changing edge = self.ByteTensor(t.size()).zero_()
to edge = self.ByteTensor(t.size()).zero_().bool()
at https://github.com/xh-liu/CC-FPSE/blob/master/models/pix2pix_model.py#L216
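For context, the failing line marks a pixel as an edge when its instance id differs from its horizontal neighbour, then ORs that into the edge map. The inequality produces booleans, so the edge map must be boolean too. A plain-Python analogue of the idea is sketched below. It uses nested lists instead of tensors, also checks vertical neighbours on the assumption that the original function does the same, and is not the repository's code.

```python
def get_edges(inst):
    """Mark pixels whose instance id differs from a horizontal or vertical neighbour."""
    h, w = len(inst), len(inst[0])
    edge = [[False] * w for _ in range(h)]  # boolean map, like the .bool() fix
    for y in range(h):
        for x in range(w):
            if x + 1 < w and inst[y][x] != inst[y][x + 1]:
                edge[y][x] = edge[y][x + 1] = True
            if y + 1 < h and inst[y][x] != inst[y + 1][x]:
                edge[y][x] = edge[y + 1][x] = True
    return edge


# Two instances side by side: the boundary columns are marked.
inst_map = [
    [1, 1, 2],
    [1, 1, 2],
]
print(get_edges(inst_map))  # [[False, True, True], [False, True, True]]
```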
Hello xh-liu
I have a question about your paper and code.
While reading the Xception paper (https://arxiv.org/abs/1610.02357), I looked at an implementation of it.
There, the depthwise convolution takes an extra parameter (groups), as in the following code:
self.depthwise = nn.Conv2d(nin, nin, kernel_size=kernel_size, padding=padding, groups=nin, bias=bias)
self.pointwise = nn.Conv2d(nin, nout, kernel_size=1, bias=bias)
Can you explain why this code and your code (in the depthwise conv part) are different?
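To make the effect of `groups` concrete: a 2D convolution weight has shape (nout, nin / groups, k, k), so setting `groups=nin` leaves each output channel with a single input channel. The sketch below only counts weights (biases ignored). The channel sizes 64 and 128 are arbitrary illustration values and the function name is hypothetical.

```python
def conv2d_weight_count(nin, nout, kernel_size, groups=1):
    """Number of weights in a 2D convolution: nout * (nin / groups) * k * k."""
    assert nin % groups == 0 and nout % groups == 0
    return nout * (nin // groups) * kernel_size * kernel_size


nin, nout, k = 64, 128, 3

standard = conv2d_weight_count(nin, nout, k)               # ordinary 3x3 conv
depthwise = conv2d_weight_count(nin, nin, k, groups=nin)   # one filter per channel
pointwise = conv2d_weight_count(nin, nout, 1)              # 1x1 channel mixing
ungrouped = conv2d_weight_count(nin, nin, k)               # same layer without groups

print(standard, depthwise + pointwise, ungrouped)  # 73728 8768 36864
```

Without `groups=nin`, the first layer is an ordinary dense convolution over all input channels rather than a per-channel filter, which is the difference the question is pointing at.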
Where in the code should I make changes to train on 8 V100 GPUs? Thanks a lot.
I am using the nvidia docker container for pytorch-1912. I can clone the github repository without any problem, but when I try to run CC-FPSE on my own data (on a 4 GPU instance) :
python train.py --name condconv --netG condconv --netD fpse --lambda_feat 20 --dataset_mode custom --label_dir mydata/train_label --image_dir mydata/train_img --label_nc 6 --no_instance --batchSize 1 --niter 100 --niter_decay 100 --use_vae --ngpus_per_node 4
I get the following error :
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/uge_mnt/home/adeschem/CC-FPSE/train.py", line 37, in main_worker
dist.init_process_group(backend='nccl', init_method=opt.dist_url, world_size=world_size, rank=rank)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 397, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/opt/conda/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 109, in _tcp_rendezvous_handler
store = TCPStore(result.hostname, result.port, world_size, start_daemon)
RuntimeError: Network is unreachable
This seems to be related to the torch distributed communication package, even though I am not using the --mpdist option for distributed multiprocessing.
Hi,
Thank you for your amazing work! I'm working on an image to image translation task, from sketches of faces to their corresponding photos. Would this work well on this task and type of data? If yes, how would I go about training with a custom dataset?
Thanks!
Hello,
I want to use the synthesis network to generate realistic images for the Cityscapes dataset from semantic maps, but the pretrained weights of the synthesis module do not match.
Here is the file where the error occurs:
./synbost-try/image_synthesis/util/util.py
#############################################################
line 227 net.load_state_dict(weights)
*** RuntimeError: Error(s) in loading state_dict for CondConvGenerator:
size mismatch for fc.weight: copying a param with shape torch.Size([32768, 256]) from checkpoint, the shape in current model is torch.Size([1024, 36, 3, 3]).
size mismatch for fc.bias: copying a param with shape torch.Size([32768]) from checkpoint, the shape in current model is torch.Size([1024]).
#############################################################
Do you know the reason? I didn't change any parameters except the path to the checkpoints.
Thank you for your help.
Best Regards
Yiru
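The two shapes in this error are consistent with the first layer being built in two different ways. The checkpoint holds a fully connected layer (a 2D weight of 32768 x 256), while the current model builds a 3x3 convolution (1024 x 36 x 3 x 3). The sketch below assumes the SPADE-style rule that a VAE-enabled generator starts from a Linear layer of size 16 * nf * sw * sh fed by a latent vector, and otherwise from a Conv2d with 16 * nf output channels fed by the semantic map. Here nf=64 is inferred from 1024 = 16 * nf, z_dim=256 from the checkpoint shape, and sw=8, sh=4 is just one hypothetical factorisation of 32.

```python
def fc_weight_shape(use_vae, nf, sw, sh, z_dim, semantic_nc):
    """Shape of the generator's first layer under the assumed SPADE-style rule."""
    if use_vae:
        # Linear(z_dim, 16 * nf * sw * sh): weight is (out_features, in_features)
        return (16 * nf * sw * sh, z_dim)
    # Conv2d(semantic_nc, 16 * nf, 3): weight is (out_ch, in_ch, 3, 3)
    return (16 * nf, semantic_nc, 3, 3)


settings = dict(nf=64, sw=8, sh=4, z_dim=256, semantic_nc=36)

print(fc_weight_shape(use_vae=True, **settings))   # (32768, 256)
print(fc_weight_shape(use_vae=False, **settings))  # (1024, 36, 3, 3)
```

If the assumption holds, the checkpoint was trained with `--use_vae` and the model is being rebuilt without it, so passing the same option at load time would be the first thing to try.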