mchong6 / gansnroses

Official PyTorch repo for GAN's N' Roses. Diverse im2im and vid2vid selfie to anime translation.

License: MIT License

Python 0.80% Jupyter Notebook 99.20%

gansnroses's Issues

Realtime demo using OSSDC VisionAI platform

@mchong6 thanks for open sourcing this project! It is really fun!

See here a realtime demo using your implementation:

Have fun with GANsNRoses - using OSSDC VisionAI realtime video processing platform
https://www.youtube.com/watch?v=YZTzjk_qh4w

More details in video description.
It takes less than 5 min to run it in Google Colab with realtime video streamed from any Android 4.2.2 phone/tablet/media player camera.

About using the pretrained model to fine-tune

Hello,
I am trying to fine-tune from the pretrained model you provided, but when I set the ckpt in the train script, I get the following error:
Loading model from: /opt/conda/lib/python3.7/site-packages/lpips/weights/v0.1/vgg.pth
category=DeprecationWarning,
Traceback (most recent call last):
  File "train.py", line 377, in
    G_A2B.load_state_dict(ckpt['G_A2B'])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
RuntimeError: Error(s) in loading state_dict for Generator:
    Missing key(s) in state_dict: "encoder.stem.4.conv2.0.weight", "encoder.stem.4.conv2.1.bias", "encoder.style.2.0.kernel", "encoder.style.2.1.weight", "encoder.style.2.2.bias", "encoder.style.5.weight", "encoder.style.5.bias".
    Unexpected key(s) in state_dict: "encoder.stem.5.conv1.0.weight", "encoder.stem.5.conv1.1.bias", "encoder.stem.5.conv2.0.weight", "encoder.stem.5.conv2.1.bias", "encoder.stem.4.skip.0.kernel", "encoder.stem.4.skip.1.weight", "encoder.stem.4.conv2.2.bias", "encoder.stem.4.conv2.0.kernel", "encoder.stem.4.conv2.1.weight", "encoder.style.3.weight", "encoder.style.3.bias", "convs.6.conv.weight", "convs.6.conv.blur.kernel", "convs.6.conv.modulation.weight", "convs.6.conv.modulation.bias", "convs.6.activate.bias", "convs.7.conv.weight", "convs.7.conv.modulation.weight", "convs.7.conv.modulation.bias", "convs.7.activate.bias", "to_rgbs.3.bias", "to_rgbs.3.upsample.kernel", "to_rgbs.3.conv.weight", "to_rgbs.3.conv.modulation.weight", "to_rgbs.3.conv.modulation.bias".
    size mismatch for encoder.style.4.weight: copying a param with shape torch.Size([8, 512]) from checkpoint, the shape in current model is torch.Size([512, 8192]).
    size mismatch for encoder.style.4.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for convs.0.conv.weight: copying a param with shape torch.Size([1, 512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 512, 3, 3]).
    size mismatch for convs.0.activate.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.1.conv.weight: copying a param with shape torch.Size([1, 512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 256, 256, 3, 3]).
    size mismatch for convs.1.conv.modulation.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for convs.1.conv.modulation.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.1.activate.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.2.conv.weight: copying a param with shape torch.Size([1, 256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 128, 256, 3, 3]).
    size mismatch for convs.2.conv.modulation.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for convs.2.conv.modulation.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for convs.2.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for convs.3.conv.weight: copying a param with shape torch.Size([1, 256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 128, 128, 3, 3]).
    size mismatch for convs.3.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([128, 512]).
    size mismatch for convs.3.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for convs.3.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for convs.4.conv.weight: copying a param with shape torch.Size([1, 128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 64, 128, 3, 3]).
    size mismatch for convs.4.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([128, 512]).
    size mismatch for convs.4.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for convs.4.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
    size mismatch for convs.5.conv.weight: copying a param with shape torch.Size([1, 128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([1, 64, 64, 3, 3]).
    size mismatch for convs.5.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([64, 512]).
    size mismatch for convs.5.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
    size mismatch for convs.5.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
    size mismatch for to_rgbs.0.conv.weight: copying a param with shape torch.Size([1, 3, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 256, 1, 1]).
    size mismatch for to_rgbs.0.conv.modulation.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for to_rgbs.0.conv.modulation.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
    size mismatch for to_rgbs.1.conv.weight: copying a param with shape torch.Size([1, 3, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 128, 1, 1]).
    size mismatch for to_rgbs.1.conv.modulation.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([128, 512]).
    size mismatch for to_rgbs.1.conv.modulation.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]).
    size mismatch for to_rgbs.2.conv.weight: copying a param with shape torch.Size([1, 3, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 3, 64, 1, 1]).
    size mismatch for to_rgbs.2.conv.modulation.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([64, 512]).
    size mismatch for to_rgbs.2.conv.modulation.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([64]).
How can I fix this? Maybe with strict=False?
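
On the strict=False question: in PyTorch, strict=False only makes load_state_dict tolerate missing and unexpected keys. A size mismatch still raises a RuntimeError. The shapes in the log (for example a style output of 8 in the checkpoint against 512 in the current model, and halved channel counts) suggest the model was built with different settings than the checkpoint, although the log alone does not confirm which ones. The following is a minimal sketch of both behaviours, using toy modules in place of the Generator. The helper `load_matching` is hypothetical and not part of the repo.

```python
import torch
from torch import nn


def load_matching(model: nn.Module, state_dict: dict) -> list:
    """Load only entries whose name and shape match the model; return skipped keys.

    Hypothetical helper for illustration; not part of the GANsNRoses repo.
    """
    own = model.state_dict()
    kept, skipped = {}, []
    for name, tensor in state_dict.items():
        if name in own and own[name].shape == tensor.shape:
            kept[name] = tensor
        else:
            skipped.append(name)
    # strict=False tolerates the keys dropped above.
    model.load_state_dict(kept, strict=False)
    return skipped


# Toy stand-ins: the "checkpoint" comes from a wider model than the current one.
checkpoint_model = nn.Sequential(nn.Linear(8, 512), nn.Linear(512, 4))
current_model = nn.Sequential(nn.Linear(8, 256), nn.Linear(256, 4))

try:
    current_model.load_state_dict(checkpoint_model.state_dict(), strict=False)
    strict_false_failed = False
except RuntimeError:
    strict_false_failed = True  # size mismatch is raised even with strict=False

skipped = load_matching(current_model, checkpoint_model.state_dict())
```

Every skipped layer keeps its random initialisation, so when most keys mismatch, as in the log above, rebuilding the model with the checkpoint's settings is likely the better route than partial loading.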

License

Great work! Please add a license file.

Only for girls?

Thanks for sharing your great work!
When I input a man's image, the result is:

So, is this only for girls?

What is the training performance?

This is great work!

I've been looking at many similar models for a business problem. Yours looks most promising, but I've noticed that information about the training performance was not included in the paper. How well does it perform, and would you consider it "efficient" on hardware?

Thanks,
Tyler

Full model for transfer learning?

The model released in the notebook is only 300MB and seems to only include the final A2B and B2A ema generators, so it can only be used for inference.

Is it possible to get ahold of a full pretrained checkpoint for fine tuning/transfer learning? Retraining from scratch will take several weeks on my hardware.

Is there an easy way to lighten the model?

I have a video card with 12 GB of memory, but I can only train with batch size 1. If I try to increase the batch size beyond 1, I get a CUDA out of memory error.
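
Gradient accumulation is one general way to keep per-step memory at batch size 1 while approximating the gradient of a larger batch. The sketch below uses a toy model and a single optimizer. `accum_steps` and every other name are illustrative and not taken from train.py, which trains generators and discriminators with separate optimizers and would need its own adaptation.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

accum_steps = 4  # effective batch size = accum_steps * micro-batch size
data = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(8)]

optimizer_steps = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(data, start=1):
    # Scale the loss so the accumulated gradient is an average, not a sum.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()  # gradients add up across micro-batches
    if step % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

Accumulation does not reproduce losses that compare samples within one batch, so whether it is a valid substitute for this repo's training loop is not established here.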

Sample dataset to train the model

Thank you very much.
Could you provide us with several sample images to train the model?

ONNX support

Hi, thanks for your great work. Can this model be transferred to ONNX format?

About D_L in the code

Thanks for the great work. It seems the code does not pass D_L to train()?

def train(args, trainA_loader, trainB_loader, testA_loader, testB_loader, G_A2B, G_B2A, D_A, D_B, G_optim, D_optim, device):
    G_A2B.train(), G_B2A.train(), D_A.train(), D_B.train()

Does it only support girls?

My input is a boy, but the generated picture is a girl.

Have you compared the cartoon and Disney filter effects in Snapchat?

The Disney filter in Snapchat is very very very stable for video input, but I have no idea how they make it so stable. Since your method works for videos too, I wonder if you have any clue?

About the StyleGAN model code

Hi, thanks for your great work. I would like to know whether the StyleGAN model is your own version. I mean, did you write the code yourself?

Unpaired image translation from MRI to CT

Dear Developer,
First, thank you for your great work.
We want to create some CT scans from MRI scans.
Could you please let me know if it is possible to use this package for this domain transfer?
Best regards,
Javad

memory consumption

Hello,
Training this network seems to take a huge amount of memory.
I first tried to train it on a 2080Ti with 12GB of memory, but it always crashed with a CUDA out of memory error.
Then I tried a V100 and it worked. Training takes about 17GB of memory. Is that too much?

RuntimeError: AUBIO ERROR: source_wavread: Failed opening ./samples/dsm.mp4 (could not find RIFF header)

Hi, when I run https://colab.research.google.com/github/mchong6/GANsNRoses/blob/main/inference_colab.ipynb in Colab, everything works.

But when I run it on my own machine, I get the error "RuntimeError: AUBIO ERROR: source_wavread: Failed opening ./samples/dsm.mp4 (could not find RIFF header)".

I tried the suggestions in:
aubio/aubio#111
Unfortunately, after running those commands I still get the same error.
Could you share details of your virtual environment setup, or explain how to fix this? Thanks.

Demo

Thanks for your incredible work. I want to ask something else: where is the GIF in the README.md from?

Does it work with cat?

Asking for science

Questions about shuffle style

Excellent work! I would like to make it work on mobile phones. But when I read the code (train.py), I had two questions:

Why is there a need to shuffle the original image batch?
A = aug(ori_A[[np.random.randint(args.batch)]].expand_as(ori_A))
B = aug(ori_B[[np.random.randint(args.batch)]].expand_as(ori_B))
Won't shuffling the style lead to a mismatch? fake_A2B2A(c1, s1) != A(c1, s2), so the cycle consistency loss might not be satisfied?
fake_A2B2A = G_B2A.decode(A2B2A_content, shuffle_batch(A2B_style))
fake_B2A2B = G_A2B.decode(B2A2B_content, shuffle_batch(B2A_style))
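
Read literally, the first pair of quoted lines does not shuffle the batch: the double-bracket index selects one random image and expand_as repeats it across the whole batch. The sketch below checks that on toy tensors. The `shuffle_batch` shown here is an assumed definition (a random permutation along the batch dimension), written only for illustration.

```python
import numpy as np
import torch


def shuffle_batch(x: torch.Tensor) -> torch.Tensor:
    # Assumed definition: random permutation along the batch dimension.
    return x[torch.randperm(x.size(0))]


batch = 4
# Toy "images": sample k is filled with the value k, shape (4, 3, 2, 2).
ori_A = torch.arange(batch, dtype=torch.float32).view(batch, 1, 1, 1).repeat(1, 3, 2, 2)

# Double-bracket indexing keeps the batch dimension: shape (1, 3, 2, 2).
picked = ori_A[[np.random.randint(batch)]]
# expand_as repeats that single image across the batch (a view, no copy).
A = picked.expand_as(ori_A)

# Toy style codes, one per batch entry, permuted along the batch dimension.
styles = torch.arange(batch, dtype=torch.float32).view(batch, 1)
shuffled = shuffle_batch(styles)
```

If `aug` applies a different random augmentation to each sample (an assumption, since `aug` is not shown here), the batch then holds several views of one image, and shuffling styles exchanges codes between views of the same source. Under that reading the cycle target would stay consistent, but the author's intent is not stated in this thread.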

Measure FID

Hello sir, love your work. I have a question for you: when you measured FID, how many images did you generate?

about choosing latent dimension

Hi, thanks for your great work!

I have a question about the latent dimension setting.
Is there any specific reason for setting the latent dimension to 8?

The default latent dimension in StyleGAN is 512.
In my opinion, 8 dimensions are not enough to embed style.
Could you explain why you set it up like that?

About part of the test code in train.py

Thanks for your great work.
There is some code in the test part:

if i % 2 == 0:
    A2B_mod1 = torch.randn([1, args.latent_dim]).cuda()
    B2A_mod1 = torch.randn([1, args.latent_dim]).cuda()
    A2B_mod2 = torch.randn([1, args.latent_dim]).cuda()
    B2A_mod2 = torch.randn([1, args.latent_dim]).cuda()

I want to know the meaning of this random sampling. And why are these variables only set when i % 2 == 0?
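
Mechanically, the `i % 2 == 0` guard means new random codes are drawn on even iterations and reused on the following odd one, so each consecutive pair of iterations shares the same codes. A small CPU-only sketch of that pattern follows, with `latent_dim = 8` taken from the value mentioned in another issue in this list and all other names illustrative:

```python
import torch

torch.manual_seed(0)
latent_dim = 8  # value mentioned in another issue; illustrative here

codes = []
for i in range(4):
    if i % 2 == 0:
        # Draw a fresh random code only on even iterations.
        mod = torch.randn([1, latent_dim])
    # Odd iterations fall through and reuse the previous code.
    codes.append(mod)
```

One plausible reading is that two different inputs are rendered with the same random styles so they can be compared side by side. The thread does not confirm this.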

/home/user/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py:3: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /home/user/anaconda3/lib/python3.7/site-packages/lpips/weights/v0.1/vgg.pth
  0%|          | 0/300000 [00:00<?, ?it/s]
/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py:3063: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
  0%|          | 0/300000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(args, trainA_loader, trainB_loader, testA_loader, testB_loader, G_A2B, G_B2A, D_A, D_B, G_optim, D_optim, device)
  File "train.py", line 167, in train
    A2B_content, A2B_style = G_A2B.encode(A)
  File "/home/user/mnt/develoment/code/GANsNRoses/model.py", line 501, in encode
    return self.encoder(input)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/mnt/develoment/code/GANsNRoses/model.py", line 703, in forward
    style = self.style(act)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/user/mnt/develoment/code/GANsNRoses/model.py", line 179, in forward
    out = F.linear(input, self.weight * self.scale)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0
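
The traceback above ends in F.linear inside the encoder's style branch, which means the flattened feature size reaching that linear layer differs from the layer's input width. One common cause, offered here as an assumption that the log does not confirm, is feeding images at a resolution other than the one the model was built for. The toy encoder below is not the repo's encoder and only reproduces that kind of failure:

```python
import torch
from torch import nn


class ToyStyleEncoder(nn.Module):
    """Toy stand-in (not the repo's encoder): one strided conv, then a linear layer."""

    def __init__(self, size: int, channels: int = 4, style_dim: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(3, channels, kernel_size=3, stride=2, padding=1)
        # The linear layer's input width is fixed by the image size given here.
        self.fc = nn.Linear(channels * (size // 2) ** 2, style_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x).flatten(1))


encoder = ToyStyleEncoder(size=16)
ok = encoder(torch.randn(2, 3, 16, 16))  # resolution matches the constructor

try:
    encoder(torch.randn(2, 3, 32, 32))   # different resolution than the model expects
    failed = False
except RuntimeError:
    failed = True                        # shape mismatch inside the linear layer
```

Printing the shape of the tensor just before the failing layer and comparing it with the layer's weight shape is a quick way to check whether this explanation applies.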