Giter VIP home page Giter VIP logo

vs-midas's People

Contributors

holywu avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

vs-midas's Issues

Incompatibility with current pytorch?

using:

# Imports
import vapoursynth as vs
# getting Vapoursynth core
import sys
import ctypes
import os
import site
core = vs.core
# Import scripts folder
scriptPath = 'F:/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))
# Adding torch dependencies to PATH
path = site.getsitepackages()[0]+'/torch_dependencies/bin/'
ctypes.windll.kernel32.SetDllDirectoryW(path)
path = path.replace('\\', '/')
os.environ["PATH"] = path + os.pathsep + os.environ["PATH"]
# Loading Plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/LSmashSource/vslsmashsource.dll")
# source: 'G:\TestClips&Co\files\test.avi'
# current color space: YUV420P8, bit depth: 8, resolution: 640x352, fps: 25, color matrix: 470bg, yuv luminance scale: limited, scanorder: progressive
# Loading G:\TestClips&Co\files\test.avi using LWLibavSource
clip = core.lsmas.LWLibavSource(source="G:/TestClips&Co/files/test.avi", format="YUV420P8", stream_index=0, cache=0, prefer_hw=0)
# Setting detected color matrix (470bg).
clip = core.std.SetFrameProps(clip, _Matrix=5)
# Setting color transfer info (470bg), when it is not set
clip = clip if not core.text.FrameProps(clip,'_Transfer') else core.std.SetFrameProps(clip, _Transfer=5)
# Setting color primaries info (), when it is not set
clip = clip if not core.text.FrameProps(clip,'_Primaries') else core.std.SetFrameProps(clip, _Primaries=5)
# Setting color range to TV (limited) range.
clip = core.std.SetFrameProp(clip=clip, prop="_ColorRange", intval=1)
# making sure frame rate is set to 25
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
clip = core.std.SetFrameProp(clip=clip, prop="_FieldBased", intval=0) # progressive
from vsmidas import midas as MiDaS
# adjusting color space from YUV420P8 to RGBH for MiDaS
clip = core.resize.Bicubic(clip=clip, format=vs.RGBH, matrix_in_s="470bg", range_s="limited")
# Depth estimation using MiDaS
clip = MiDaS(clip=clip, model=2, device_index=0)
# adjusting output color from: RGBH to YUV420P10 for x265Model
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P10, matrix_s="470bg", range_s="limited", dither_type="error_diffusion")
# set output frame rate to 25fps (progressive)
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
# Output
clip.set_output()

I get:

2023-12-14 16:30:42.755 Failed to evaluate the script:Python exception: Error(s) in loading state_dict for DPTDepthModel:Missing key(s) in state_dict: "pretrained.model.layers.3.downsample.reduction.weight", "pretrained.model.layers.3.downsample.norm.weight", "pretrained.model.layers.3.downsample.norm.bias", "pretrained.model.head.fc.weight", "pretrained.model.head.fc.bias". Unexpected key(s) in state_dict: "pretrained.model.layers.0.downsample.reduction.weight", "pretrained.model.layers.0.downsample.norm.weight", "pretrained.model.layers.0.downsample.norm.bias", "pretrained.model.layers.0.blocks.1.attn_mask", "pretrained.model.layers.1.blocks.1.attn_mask", "pretrained.model.head.weight", "pretrained.model.head.bias". size mismatch for pretrained.model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([384, 768]).size mismatch for pretrained.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1536, 3072]) from checkpoint, the shape in current model is torch.Size([768, 1536]).size mismatch for pretrained.model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).size mismatch for pretrained.model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).Traceback (most recent call last):File "src\cython\vapoursynth.pyx", line 3115, in vapoursynth._vpy_evaluateFile "src\cython\vapoursynth.pyx", line 3116, in vapoursynth._vpy_evaluateFile "C:\Users\Selur\Desktop\Testing.vpy", line 38, in clip = MiDaS(clip=clip, model=2, device_index=0)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_contextreturn func(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^File "F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\vsmidas\__init__.py", line 108, in midasmodule.load_state_dict(parameters)File "F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dictraise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(RuntimeError: Error(s) in loading state_dict for DPTDepthModel:Missing key(s) in state_dict: "pretrained.model.layers.3.downsample.reduction.weight", "pretrained.model.layers.3.downsample.norm.weight", "pretrained.model.layers.3.downsample.norm.bias", "pretrained.model.head.fc.weight", "pretrained.model.head.fc.bias". Unexpected key(s) in state_dict: "pretrained.model.layers.0.downsample.reduction.weight", "pretrained.model.layers.0.downsample.norm.weight", "pretrained.model.layers.0.downsample.norm.bias", "pretrained.model.layers.0.blocks.1.attn_mask", "pretrained.model.layers.1.blocks.1.attn_mask", "pretrained.model.head.weight", "pretrained.model.head.bias". size mismatch for pretrained.model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([384, 768]).size mismatch for pretrained.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([384]).size mismatch for pretrained.model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1536, 3072]) from checkpoint, the shape in current model is torch.Size([768, 1536]).size mismatch for pretrained.model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).size mismatch for pretrained.model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
--

see also: https://forum.doom9.org/showthread.php?t=185196

Cu Selur

Use cases?

This looks pretty interesting.
Would it be feasible to implement it into VapourSynth-capable players?
PotPlayer is pretty robust, even for 3D content but I've dreamed of the day we could use it with MiDaS models (particularly BEiT) for realtime conversion.
image
I'm assuming just getting the depth map for each frame wouldn't be enough; we'd also need something to apply the depth map pixel displacement to generate the left and right views.

I've get pretty decent results with DepthViewer, but there seem to be performance bottlenecks that prevent us from getting real-time processing on the better models.
Plus, I'd love to get 3D conversion working with motion interpolation like I we can do when decoding Blu-ray 3D natively.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.