I've been struggling to get the demo experiment to work. When I run the code, I get the following RuntimeError:
Network [ModulateGenerator] was created. Total number of parameters: 90.1 million. To see the architecture, do print(network).
Embedding size is 512, encoder SAP.
Network [ResSESyncEncoder] was created. Total number of parameters: 10.4 million. To see the architecture, do print(network).
Network [FanEncoder] was created. Total number of parameters: 14.3 million. To see the architecture, do print(network).
Network [ResNeXtEncoder] was created. Total number of parameters: 38.0 million. To see the architecture, do print(network).
Pretrained network G has fewer layers; The following are not initialized:
['conv1', 'convs', 'style', 'to_rgb1', 'to_rgbs']
model [AvModel] was created
working
dataset [VOXTestDataset] of size 361 was created
0%| | 0/181 [00:00<?, ?it/s]C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\functional.py:3328: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\functional.py:3458: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode)
0%| | 0/181 [00:04<?, ?it/s]
Traceback (most recent call last):
File "C:/Users/Admin/Documents/Github/Talking-Face_PC-AVS/app/inference.py", line 107, in main
inference_single_audio(opt, path_label, model)
File "C:/Users/Admin/Documents/Github/Talking-Face_PC-AVS/app/inference.py", line 66, in inference_single_audio
fake_image_original_pose_a, fake_image_driven_pose_a = model.forward(data_i, mode='inference')
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\av_model.py", line 94, in forward
driving_pose_frames)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\av_model.py", line 484, in inference
fake_image_ref_pose_a, _ = self.generate_fake(sel_id_feature, ref_merge_feature_a)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\av_model.py", line 448, in generate_fake
fake_image, style_rgb = self.netG(style)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 583, in forward
out = self.conv1(out, latent[:, 0], noise=noise[0])
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 392, in forward
out, _ = self.conv(input, style)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 295, in forward
style = self.modulation(style).view(batch, 1, in_channel, 1, 1)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\models\networks\generator.py", line 214, in forward
input, self.weight * self.scale, bias=self.bias * self.lr_mul
File "C:\Users\Admin\Documents\Github\Talking-Face_PC-AVS\app\venv\lib\site-packages\torch\nn\functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 dim 1 must match mat2 dim 0
misc/Input/517600055 1 misc/Pose_Source/517600078 160 misc/Audio_Source/681600002.mp3 misc/Mouth_Source/681600002 363 dummy
mat1 dim 1 must match mat2 dim 0
Process finished with exit code 0
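If it helps to illustrate where I think this comes from: `torch.nn.functional.linear` raises this kind of RuntimeError when the input's last dimension doesn't match the weight's `in_features`. Here is a minimal sketch with hypothetical sizes (512 matches the embedding size reported in the log, but the actual layer dimensions in the model are my assumption):

```python
import torch
import torch.nn as nn

# A Linear layer expecting 512-dim input, like the modulation layer
# presumably does (hypothetical out_features of 256).
modulation = nn.Linear(512, 256)

# An input with the wrong feature size: 256 instead of 512.
style = torch.randn(4, 256)

try:
    modulation(style)
except RuntimeError as e:
    # On PyTorch 1.8.x the message reads "mat1 dim 1 must match mat2 dim 0";
    # newer versions phrase it differently.
    print("RuntimeError:", e)
```

So my reading is that the `style` tensor reaching `self.modulation` has an unexpected feature dimension, not that the linear layer itself is broken.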
The error occurs with the input/pose/audio paths echoed in the `misc/Input/517600055 ...` line above, although I'm not sure that tells you much.
I'm currently running the code with PyTorch 1.8.1 (and Python 3.6), as I haven't managed to get PyTorch 1.3.0 working: CUDA 10 doesn't support my GPU. What would you recommend as a next step? Your help is much appreciated. Keep up the good work!
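As a debugging step on my side, I wrote a small helper (hypothetical names, not from the repo) that compares an input tensor's feature dimension against what a `nn.Linear` layer expects; I plan to run the same check on `style` just before the `self.modulation(style)` call in `generator.py` to see which dimension is off:

```python
import torch
import torch.nn as nn

def check_linear_input(layer: nn.Linear, x: torch.Tensor) -> bool:
    """Return True if x's last dimension matches the layer's in_features."""
    expected = layer.weight.shape[1]  # nn.Linear weight is (out_features, in_features)
    actual = x.shape[-1]
    if actual != expected:
        print(f"mismatch: input has {actual} features, layer expects {expected}")
        return False
    return True

# Demo with a stand-in layer (512 taken from the logged embedding size).
layer = nn.Linear(512, 256)
print(check_linear_input(layer, torch.randn(2, 512)))   # True
print(check_linear_input(layer, torch.randn(2, 2560)))  # False
```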