The vs-midas from holywu

Incompatibility with current pytorch?

using:

# Imports
import vapoursynth as vs
# getting Vapoursynth core
import sys
import ctypes
import os
import site
core = vs.core
# Import scripts folder
scriptPath = 'F:/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))
# Adding torch dependencies to PATH
path = site.getsitepackages()[0]+'/torch_dependencies/bin/'
ctypes.windll.kernel32.SetDllDirectoryW(path)
path = path.replace('\\', '/')
os.environ["PATH"] = path + os.pathsep + os.environ["PATH"]
# Loading Plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/LSmashSource/vslsmashsource.dll")
# source: 'G:\TestClips&Co\files\test.avi'
# current color space: YUV420P8, bit depth: 8, resolution: 640x352, fps: 25, color matrix: 470bg, yuv luminance scale: limited, scanorder: progressive
# Loading G:\TestClips&Co\files\test.avi using LWLibavSource
clip = core.lsmas.LWLibavSource(source="G:/TestClips&Co/files/test.avi", format="YUV420P8", stream_index=0, cache=0, prefer_hw=0)
# Setting detected color matrix (470bg).
clip = core.std.SetFrameProps(clip, _Matrix=5)
# Setting color transfer info (470bg), when it is not set
clip = clip if not core.text.FrameProps(clip,'_Transfer') else core.std.SetFrameProps(clip, _Transfer=5)
# Setting color primaries info (), when it is not set
clip = clip if not core.text.FrameProps(clip,'_Primaries') else core.std.SetFrameProps(clip, _Primaries=5)
# Setting color range to TV (limited) range.
clip = core.std.SetFrameProp(clip=clip, prop="_ColorRange", intval=1)
# making sure frame rate is set to 25
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
clip = core.std.SetFrameProp(clip=clip, prop="_FieldBased", intval=0) # progressive
from vsmidas import midas as MiDaS
# adjusting color space from YUV420P8 to RGBH for MiDaS
clip = core.resize.Bicubic(clip=clip, format=vs.RGBH, matrix_in_s="470bg", range_s="limited")
# Depth estimation using MiDaS
clip = MiDaS(clip=clip, model=2, device_index=0)
# adjusting output color from: RGBH to YUV420P10 for x265Model
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P10, matrix_s="470bg", range_s="limited", dither_type="error_diffusion")
# set output frame rate to 25fps (progressive)
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
# Output
clip.set_output()

I get:

2023-12-14 16:30:42.755 Failed to evaluate the script:Python exception: Error(s) in loading state_dict for DPTDepthModel:Missing key(s) in state_dict: "pretrained.model.layers.3.downsample.reduction.weight", "pretrained.model.layers.3.downsample.norm.weight", "pretrained.model.layers.3.downsample.norm.bias", "pretrained.model.head.fc.weight", "pretrained.model.head.fc.bias". Unexpected key(s) in state_dict: "pretrained.model.layers.0.downsample.reduction.weight", "pretrained.model.layers.0.downsample.norm.weight", "pretrained.model.layers.0.downsample.norm.bias", "pretrained.model.layers.0.blocks.1.attn_mask", "pretrained.model.layers.1.blocks.1.attn_mask", "pretrained.model.head.weight", "pretrained.model.head.bias". size mismatch for pretrained.model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([384, 768]).size mismatch for pretrained.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1536, 3072]) from checkpoint, the shape in current model is torch.Size([768, 1536]).size mismatch for pretrained.model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).size mismatch for pretrained.model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).Traceback (most recent call last):File "src\cython\vapoursynth.pyx", line 3115, in vapoursynth._vpy_evaluateFile "src\cython\vapoursynth.pyx", line 3116, in vapoursynth._vpy_evaluateFile "C:\Users\Selur\Desktop\Testing.vpy", line 38, in clip = MiDaS(clip=clip, model=2, device_index=0)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_contextreturn func(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^File "F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\vsmidas\__init__.py", line 108, in midasmodule.load_state_dict(parameters)File "F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dictraise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(RuntimeError: Error(s) in loading state_dict for DPTDepthModel:Missing key(s) in state_dict: "pretrained.model.layers.3.downsample.reduction.weight", "pretrained.model.layers.3.downsample.norm.weight", "pretrained.model.layers.3.downsample.norm.bias", "pretrained.model.head.fc.weight", "pretrained.model.head.fc.bias". Unexpected key(s) in state_dict: "pretrained.model.layers.0.downsample.reduction.weight", "pretrained.model.layers.0.downsample.norm.weight", "pretrained.model.layers.0.downsample.norm.bias", "pretrained.model.layers.0.blocks.1.attn_mask", "pretrained.model.layers.1.blocks.1.attn_mask", "pretrained.model.head.weight", "pretrained.model.head.bias". size mismatch for pretrained.model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([384, 768]).size mismatch for pretrained.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1536, 3072]) from checkpoint, the shape in current model is torch.Size([768, 1536]).size mismatch for pretrained.model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).size mismatch for pretrained.model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
--

Cu Selur

Use cases?

This looks pretty interesting.
Would it be feasible to implement it into VapourSynth-capable players?
PotPlayer is pretty robust, even for 3D content but I've dreamed of the day we could use it with MiDaS models (particularly BEiT) for realtime conversion.

I'm assuming just getting the depth map for each frame wouldn't be enough; we'd also need something to apply the depth map pixel displacement to generate the left and right views.

I've get pretty decent results with DepthViewer, but there seem to be performance bottlenecks that prevent us from getting real-time processing on the better models.
Plus, I'd love to get 3D conversion working with motion interpolation like I we can do when decoding Blu-ray 3D natively.

holywu / vs-midas Goto Github PK

vs-midas's People

Contributors

Stargazers

Watchers

vs-midas's Issues

Incompatibility with current pytorch?

Use cases?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent