
Comments (7)

bryanyzhu avatar bryanyzhu commented on May 24, 2024

You can use cv2.VideoCapture to read the video (whether it is an offline file or a camera stream), grab the frames, and then run the prediction on each frame. Something like this:

import cv2

cap = cv2.VideoCapture(VIDEO_NAME)  # or cv2.VideoCapture(0) to read from a camera
net = get_model(MODEL_NAME)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # end of stream or read failure, stop before using the frame
        break
    inp = preprocess(frame)  # resize/normalize the frame into a model input
    pred = net(inp)

cap.release()


shijubushiju avatar shijubushiju commented on May 24, 2024

@bryanyzhu Thank you very much. Let me have a try


shijubushiju avatar shijubushiju commented on May 24, 2024

@bryanyzhu Hello, I have another question:
Lines 85 and 86 of the flow_vgg16.py file look like this:
rgb_weight_mean = torch.mean(rgb_weight, dim=1)
flow_weight = rgb_weight_mean.repeat(1,in_channels,1,1)
However, lines 179 and 182 of the file flow_resnet.py look like this:
rgb_weight_mean = torch.mean(rgb_weight, dim=1)
flow_weight = rgb_weight_mean.unsqueeze(1).repeat(1,in_channels,1,1)
How should I make sense of the difference?


bryanyzhu avatar bryanyzhu commented on May 24, 2024

For the current PyTorch version, I think the second one is more rigorous. But both of them should work, because many operators support automatic broadcasting, so users usually don't need to worry about dimension mismatches.
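
For concreteness, here is a minimal shape-level sketch of what the flow_resnet.py version computes (the tensor sizes are illustrative, not taken from the repo):

import torch

out_channels, k = 64, 3
in_channels = 20  # e.g. a stack of 10 optical-flow frames x 2 (x/y) channels

rgb_weight = torch.randn(out_channels, 3, k, k)   # pretrained RGB conv1 weight
rgb_weight_mean = torch.mean(rgb_weight, dim=1)   # (64, 3, 3): torch.mean reduces the channel dim away
flow_weight = rgb_weight_mean.unsqueeze(1).repeat(1, in_channels, 1, 1)
print(flow_weight.shape)                          # torch.Size([64, 20, 3, 3])

The unsqueeze(1) restores the reduced channel dimension before repeating it in_channels times, which is the shape a flow-input conv layer expects.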


shijubushiju avatar shijubushiju commented on May 24, 2024

@bryanyzhu OK, I got an error when I ran the first one, and it ran perfectly with the second modification. Thank you for your reply. I will consult you if I have any more questions.


shijubushiju avatar shijubushiju commented on May 24, 2024

@bryanyzhu
In VideoSpatialPrediction.py:

def VideoSpatialPrediction(
    vid_name,
    net,
    num_categories,
    start_frame=0,
    num_frames=0,
    num_samples=25
):

Is num_samples the number of test videos here?

My other question is:
The model was tested on recorded video. What should I do if I want to test it online?


bryanyzhu avatar bryanyzhu commented on May 24, 2024

num_samples means the number of frames sampled from one video. This is the standard evaluation setting used in earlier work: we take 25 frames per video and do 10-crop per frame. So for each video we actually perform 250 forward passes and average the predictions to get the final result.
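
As a rough sketch of that protocol (torchvision's TenCrop stands in for the 10-crop step; frame sampling and normalization are omitted for brevity):

import torch
from torchvision import transforms
from torchvision.transforms.functional import to_tensor

ten_crop = transforms.TenCrop(224)  # 4 corners + center, each plus its horizontal flip

def predict_video(frames, net):
    # frames: 25 PIL images sampled uniformly from one video
    frame_preds = []
    for frame in frames:
        crops = torch.stack([to_tensor(c) for c in ten_crop(frame)])  # (10, 3, 224, 224)
        frame_preds.append(net(crops).mean(dim=0))                    # average over the 10 crops
    return torch.stack(frame_preds).mean(dim=0)                       # average over the 25 frames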

For online videos, what people usually do (or the simplest way) is to wait for a few frames, run the prediction, and average the results. Then repeat the same thing in a sliding-window fashion.
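
For example, here is a minimal sketch of that sliding-window idea on a camera stream (net and preprocess are the same placeholders as in the snippet above; the window size of 25 is just an illustration):

import collections
import cv2
import torch

cap = cv2.VideoCapture(0)               # 0 = default camera
window = collections.deque(maxlen=25)   # keep only the most recent 25 preprocessed frames

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    window.append(preprocess(frame))
    if len(window) == window.maxlen:
        with torch.no_grad():
            preds = torch.stack([net(f) for f in window])  # one forward pass per buffered frame
        print(preds.mean(dim=0).argmax())                  # averaged prediction for this window

cap.release()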

