
ahmetgunduz / real-time-gesrec


Real-time Hand Gesture Recognition with PyTorch on EgoGesture, NvGesture, Jester, Kinetics and UCF101

Home Page: https://arxiv.org/abs/1901.10323

License: MIT License

Python 92.30% Shell 7.70%
gesture-recognition cnn pytorch video-processing machine-learning deep-neural-networks hand-gesture-recognition resnet jester egogesture

real-time-gesrec's Introduction

Hi there 👋 I'm Ahmet

  • 🔭 I’m currently working at @aixplain as a Data Scientist
  • 🎓 I have an educational background from Bogazici, TUM, and LMU
  • 📜 I have two Master's degrees in Data Science and Telecommunications Engineering
  • ⚡ Fun fact: I do not like to code, but I love to solve problems with code


real-time-gesrec's People

Contributors

ahmetgunduz, hypothesis2304, okankop, parkjh688


real-time-gesrec's Issues

Queries about training

Hello,

I am trying to implement your paper from scratch as part of my project and have some questions which I was hoping you could answer. I am only trying to train the detector half of the network for now and using the JESTER dataset to do so.

  1. How is the data getting fed in? Each folder has 'n' frames which belong to a category (i.e. gesture or no gesture). Taking the detector queue as 8 frames, do you then split the 'n' frames into n/8 chunks, each having the gesture or no-gesture label?
  2. How long did you pre-train on JESTER for? Your paper mentions 25 epochs, but I am guessing that is for the classifier? Your code seems to indicate 100 epochs instead.

I am hoping you can help me with this. Thanks in advance!

Regards,
Nishant Bhattacharya
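For question 1, a minimal sketch of the kind of sliding-window labelling being described, assuming each video folder yields an ordered list of frames carrying a single gesture / no-gesture label; this is only an illustration, not necessarily how the repository's loader is implemented:

    # split a video's n frames into non-overlapping 8-frame windows,
    # each inheriting the video's gesture / no-gesture label
    # (window size chosen to match the detector queue mentioned above)
    def make_detector_windows(frames, label, window=8):
        windows = []
        for start in range(0, len(frames) - window + 1, window):
            windows.append((frames[start:start + window], label))
        return windows

    # e.g. a 40-frame "gesture" clip becomes 5 labelled 8-frame windows
    windows = make_detector_windows(list(range(40)), label=1)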

Test the model with RGB_D camera

Hi, thanks for sharing your amazing model.

I am now trying to test the model with my RGB-D camera. However, I am a beginner in pytorch. So, I need some help to go through the code:

  1. I plan to feed the model with depth images, which are obtained from the camera with OpenNI and OpenCV. The shape of each frame is (112, 112, 3). If I want to detect and classify n frames in each iteration, what shape should the input be?

  2. What does "sample_duration" mean? What is the difference between "sample_duration_det" and the detector queue?

I am using egogesture depth model.
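For what it's worth, a minimal sketch of shaping n depth frames into the 5D layout a PyTorch 3D CNN expects, (batch, channels, sample_duration, height, width); the exact preprocessing (normalization, cropping) in this repository may differ, so treat this only as an illustration of the tensor shape:

    import numpy as np
    import torch

    n = 8  # number of frames per iteration (hypothetical)
    frames = [np.zeros((112, 112, 3), dtype=np.uint8) for _ in range(n)]  # frames from the camera

    # stack to (n, H, W, C), move channels first, and add a batch dimension:
    # final shape (1, 3, n, 112, 112) = (batch, channels, sample_duration, H, W)
    clip = torch.from_numpy(np.stack(frames)).float()   # (n, 112, 112, 3)
    clip = clip.permute(3, 0, 1, 2).unsqueeze(0)         # (1, 3, n, 112, 112)
    # outputs = model(clip)  # hypothetical forward pass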

underfitting for nvGesture?

Hi, I tried to train the classifier for nvgesture from scratch (the hyperparams come from run_online.sh: batchsize=8, resnext101, cls=25, lr=0.01, duration=32). But it barely fits: after tens of epochs, the accuracy on both the training and validation sets is about 5%.
I also tried increasing the batch size to 16; the accuracy then converges to about 9%, which is still very low. I also tried setting norm_value=255 to normalize the input data to a smaller range, and a smaller lr, but it didn't help.
Did I miss something?
BTW, the detector trained from scratch works well, with accuracy around 80%.

Training on the jester dataset - questions

Hi,

I have some question regarding the training process:

  1. Did you train the two models (detection & classification) separately?
  2. Can I use the jester dataset to train the two models? (Something that confuses me is that one of the classes in the jester dataset is 'no gesture'.)
  3. As I understand it, you used the same code in main.py to train the two models separately?
  4. If so, what do I need to change to switch between training the two models, apart from the network parameters?

Thank you,
Olga

RGB-D result

Hi,

In the paper I only saw RGB and Depth results; how about RGB-D? Are you able to release pre-trained RGB-D models?

Thanks.

accuracy

Hi Ahmet,

I trained the classifier using the EgoGesture dataset, but the validation accuracy is only around 50% and the training accuracy is around 60%.

I am using the ResNeXt-101 architecture.

Am I missing anything?

RGB pre-trained models

Are the models in the drive also usable for RGB prediction/classification?

If not, could I kindly ask you to upload these models as well?

I am asking since the names would suggest that every model (except the Jester one) is for Depth data.

Thank you very much

Reshaping error "shape '[32, -1, 112, 112]' is invalid for input of size 865536"

I am trying to do both detection and classification for the Jester dataset. I have trained the detector part and saved its checkpoint, and for the classification part I am using the checkpoint that you have provided. For inference purposes I am running the run_online.sh file with both checkpoints. For that I made a Jester_online.py file, just like your egogesture_online.py, to provide the dataset for evaluation, but it shows a reshape error in the following line -
clip = torch.cat(clip, 0).view((self.sample_duration, -1) + im_dim).permute(1, 0, 2, 3)
the error is -
RuntimeError: shape '[32, -1, 112, 112]' is invalid for input of size 865536

For simple classification, I didn't get this error. I don't know where this number 865536 came from.
Can you help me out with this problem? I am attaching the screenshot here.
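As a side note, the number in the error can be decomposed to see how many frames actually arrived in the clip; assuming 3-channel 112x112 frames, 865536 elements correspond to 23 frames rather than the expected sample_duration of 32, which is why view([32, -1, 112, 112]) cannot reshape it:

    # 865536 / (112 * 112) = 69 channel-frames; 69 / 3 channels = 23 frames
    per_frame = 112 * 112 * 3
    print(865536 / per_frame)   # 23.0, not 32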

Model accuracy with Jester dataset is poor

Hi, I have tried to validate the pretrained model on the Jester dataset.

Preconditions:

  1. Pretrained model used: jester_resnext_101_RGB_32.pth
  2. Dataset: Jester
  3. Configurations: opts.zip
  4. Source modifications: diff.zip
  5. PyTorch version: 1.1.0
  6. Python version: 3.7.3

Test:

  1. python utils/jester_json.py 'annotation_Jester' to prepare the dataset
  2. python offline_test.py to start the execution

But the output precision is very poor

[11/3721] Time 1.07421 (1.13381) prec@1 0.03409 prec@5 0.20455 precision 0.00000 (0.03213) recall 0.00000 (0.01278)
[12/3721] Time 1.09013 (1.13017) prec@1 0.03646 prec@5 0.20312 precision 0.03030 (0.03198) recall 0.03030 (0.01424)
[13/3721] Time 1.07996 (1.12631) prec@1 0.03365 prec@5 0.20192 precision 0.00000 (0.02952) recall 0.00000 (0.01315)
[14/3721] Time 1.08615 (1.12344) prec@1 0.03125 prec@5 0.20089 precision 0.00000 (0.02741) recall 0.00000 (0.01221)

Could you please help me find what I am missing to get the proper output?
Regards,
Albin

How can the pre-trained Jester model be used to train on EgoGesture?

When I tried to use the classification model pre-trained on Jester to train on the EgoGesture dataset, it showed:

RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.fc.weight: copying a param with shape torch.Size([27, 2048]) from checkpoint, the shape in current model is torch.Size([83, 2048]).
size mismatch for module.fc.bias: copying a param with shape torch.Size([27]) from checkpoint, the shape in current model is torch.Size([83]).

It seems this is because Jester and EgoGesture have different numbers of gesture classes. So how should I change this parameter?

My code is shown like this:
#!/bin/bash
python main.py \
--root_path ~/ \
--video_path /home/wisccitl/Desktop/EgoGesture \
--annotation_path Real-time-GesRec/annotation_EgoGesture/egogestureall_but_None.json \
--result_path Real-time-GesRec/results \
--resume_path Real-time-GesRec/models/jester_resnext_101_RGB_32.pth \
--dataset egogesture \
--sample_duration 32 \
--learning_rate 0.01 \
--model resnext \
--model_depth 101 \
--resnet_shortcut B \
--batch_size 64 \
--n_classes 83 \
--n_finetune_classes 83 \
--n_threads 16 \
--checkpoint 1 \
--modality RGB \
--train_crop random \
--n_val_samples 1 \
--test_subset test \
--n_epochs 100
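For reference, one generic PyTorch workaround for this kind of final-layer size mismatch (not necessarily how main.py handles it) is to drop the 27-class fc weights from the checkpoint and load the remaining backbone weights non-strictly, so the new 83-class fc layer is trained from scratch; here model is assumed to be the DataParallel-wrapped network already built for 83 classes:

    import torch

    checkpoint = torch.load('Real-time-GesRec/models/jester_resnext_101_RGB_32.pth')
    state_dict = checkpoint['state_dict']

    # drop the 27-class fully connected layer so only the backbone is loaded
    backbone_only = {k: v for k, v in state_dict.items()
                     if not k.startswith('module.fc.')}

    # strict=False tolerates the missing fc weights; the new 83-class fc layer
    # keeps its random initialization and is learned during finetuning
    model.load_state_dict(backbone_only, strict=False)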

n_frames

Hi,

I am trying to use your AMAZING code with the jester dataset.
I have some questions:

  1. I see that you assume that for each video there is a directory named "n_frames".
    How can I create those directories?
    I downloaded the jester dataset and extracted the data as described in the link, but there aren't any directories with the name "n_frames".
  2. Can you give the parameters for testing on the jester dataset?

Thank you!
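For reference, if this codebase follows the 3D-ResNets-PyTorch data convention, n_frames is usually a small text file written inside each video folder that holds the number of extracted frames, not a directory; a minimal sketch for generating it (paths are hypothetical, and the repo may already ship a utility for this) would be:

    import os

    root = '20bn-jester-v1'  # hypothetical path to the extracted Jester frame folders

    for video_dir in sorted(os.listdir(root)):
        path = os.path.join(root, video_dir)
        if not os.path.isdir(path):
            continue
        # count the extracted .jpg frames and record the total in an 'n_frames' file
        n = len([f for f in os.listdir(path) if f.endswith('.jpg')])
        with open(os.path.join(path, 'n_frames'), 'w') as f:
            f.write(str(n))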

opts

Hey Ahmet,

I am trying to replicate your work. I am probably having problems in the dataloader. I haven't changed anything in your code except for the paths and a few minor changes where I was getting errors.

My opt looks like:

annotation_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec/annotation_EgoGesture/egogestureall.json', arch='resnet-10', batch_size=128, begin_epoch=1, checkpoint=10, crop_position_in_test='c', dampening=0.9, dataset='egogesture', ft_begin_index=0, initial_scale=1.0, learning_rate=0.1, lr_patience=10, lr_steps=[10, 25, 50, 80, 100], manual_seed=1, mean=[114.7748, 107.7354, 99.475], mean_dataset='activitynet', modality='RGB', model='resnet', model_depth=10, momentum=0.9, n_classes=400, n_epochs=200, n_finetune_classes=400, n_scales=5, n_threads=4, n_val_samples=3, nesterov=False, no_cuda=False, no_hflip=False, no_mean_norm=False, no_softmax_in_test=False, no_train=False, no_val=False, norm_value=1, optimizer='sgd', pretrain_path='', resnet_shortcut='B', resnext_cardinality=32, result_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec/results', resume_path='', root_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec', root_video_path='/media/storage/ndhingra/EgoGesture', sample_duration=16, sample_size=112, scale_in_test=1.0, scale_step=0.84089641525, scales=[1.0, 0.84089641525, 0.7071067811803005, 0.5946035574934808, 0.4999999999911653], std=[38.7568578, 37.88248729, 40.02898126], std_norm=False, store_name='model', test=False, test_subset='val', train_crop='corner', train_validate=False, video_path='/home/ndhingra/Real-time-GesRec/Real-time-GesRec/images', weight_decay=0.001, weighted=False, wide_resnet_k=2)

I get error in main.py

train_loader = torch.utils.data.DataLoader(
training_data,
batch_size=opt.batch_size,
shuffle=True,
num_workers=opt.n_threads,
pin_memory=True)

i.e.,

ValueError: num_samples should be a positive integeral value, but got num_samples=0

Can you suggest what changes I have to make? If possible, could you also upload the opts.py you used for the egogesture dataset? Since I haven't made any changes to your code, I expect it to work as it worked for you.
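As an aside, num_samples=0 means the constructed dataset is empty, which usually points to a video_path or annotation_path that does not line up with the data on disk; a quick sanity check before building the loader (using the training_data object from the snippet above) might be:

    # if this prints 0, the annotation JSON and opt.video_path do not match,
    # and the DataLoader has nothing to sample from
    print(len(training_data))
    assert len(training_data) > 0, 'check opt.video_path / opt.annotation_path'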

size mismatch for jester pre-trained model

Hi,

I am trying to apply offline_test.py on the jester dataset with your pre-trained model.
and I got:
"size mismatch for module.conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 3, 3, 7, 7])."

I think that maybe I have some problem with my parameters.
Can you please help me?

Thank you again!
I appreciate all your help!

Variables in online_test.py

Hi again,

Can you please give more details about some of the variables in the online_test script?
I think I understand them, but after looking at the code, I am not sure...

  1. passive_count => # consecutive number of 'no gesture' classifier
  2. active
  3. active_index
  4. pre_predict
  5. finished_prediction
  6. prev_active

Is there a situation where finished_prediction = False but we have reached the last window frame?
In that case results is empty (predicted = np.array(results)[:, 1]) and the code fails...
I am wondering how to deal with that, and how to calculate the levenshtein_distance in this case?

Thank you again!!!!

RuntimeError: input and weight type

I am trying to train a detector on the Jester dataset. However, when I run run_offline.sh I encounter the following error right after the dataset is loaded:

Traceback (most recent call last):
File "main.py", line 177, in
train_logger, train_batch_logger)
File "/home/khasmamad/Desktop/kimo/Real-time-GesRec/train.py", line 34, in train_epoch
outputs = model(inputs)
File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/khasmamad/Desktop/kimo/Real-time-GesRec/models/resnetl.py", line 177, in forward
x = self.conv1(x)
File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/khasmamad/miniconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 448, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

Googling showed me that this happens when the input and the model are on different devices (in this case, the input is on the GPU while the model is on the CPU). But I still cannot figure out a solution. Please help.
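For what it's worth, the usual generic fix for this error (not specific to this repository) is to make sure the model is moved to the same device as the inputs before training, along the lines of the sketch below; model, inputs, and targets are assumed to be the objects already created in main.py/train.py:

    import torch

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)        # move parameters and buffers to the GPU
    # inside the training loop, inputs and targets go to the same device
    inputs = inputs.to(device)
    targets = targets.to(device)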

NvGesture dataset

There are 31 files named:

                      nvGesture_v1.7z.001
                     to nvGesture_v1.7z.031

I am looking to extract these files to video format. Since these files are split into .7z parts, I tried using

                      cat nvGesture_v1.7z.0?? | 7za x

or

                      cat nvGesture_v1.7z.0?? | 7za e

but in both cases I get the error:

                        Error:
                        Incorrect command line

Get stuck at running online_test.py with pretrained model on CPU

I was trying to run online_test.py with a pretrained model on CPU (w/o CUDA). I made some modifications in model.py and online_test.py, including:

  1. added opt.no_cuda = True right after opt = parse_opts_online()
  2. added map_location=torch.device('cpu') to torch.load(opt.pretrain_path)
  3. modified model.load_state_dict(pretrain['state_dict']) at line 120ish to
            state_dict = pretrain['state_dict']
            from collections import OrderedDict
            new_state_dict = OrderedDict()
            for k, v in state_dict.items():
                # strip the 'module.' prefix added by DataParallel, if present
                name = k[7:] if k.startswith('module.') else k
                new_state_dict[name] = v
            model.load_state_dict(new_state_dict)

This solved the issue of

Missing key(s) in state_dict: "conv1.weight",...
Unexpected key(s) in state_dict: "module.conv1.weight", ...

which I solved by looking into a suggestion I found elsewhere.

However, now it gives me an error:

Traceback (most recent call last):
  File "online_test.py", line 138, in <module>
    detector,classifier = load_models(opt)
  File "online_test.py", line 76, in load_models
    detector, parameters = generate_model(opt)
  File "...../Real-time-GesRec-master/model.py", line 132, in generate_model
    model.load_state_dict(new_state_dict)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNetL:
        size mismatch for conv1.weight: copying a param with shape torch.Size([16, 1, 7, 7, 7]) from checkpoint, the shape in current model is torch.Size([16, 3, 7, 7, 7]).

Thanks in advance!!!

Missed detection when the same gesture is shown twice in online mode?

Hi, ahmetgunduz:
I tested the model in online mode with my own video, and everything looks fine except when I show the same gesture twice: the model fails to predict the second gesture (no gesture is detected). I suspect the reason may be the rule-based filter, but I'm not sure. Could you please give me some advice?

Start frame and end frame missing from trainlist01.txt and vallist01.txt.

In README.md it says that N frames format is as following: "path to the folder" "class index" "start frame" "end frame".
However that information seems to be missing from annotation_Jester/trainlist01.txt and annotation_Jester/vallist01.txt. Is it somewhere else? Am I looking at the right files?

Thanks.

cuda gpu device Error

Hi.

I have one GPU in my computer, but I got this error.
I'm a PyTorch newbie, so I don't know what this error means.

Traceback (most recent call last):
  File "main.py", line 177, in <module>
    train_logger, train_batch_logger)
  File "/home/eden/Real-time-GesRec/train.py", line 34, in train_epoch
    outputs = model(inputs)
  File "/home/eden/anaconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eden/anaconda3/envs/gesrec/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

Unable to extract Jester datasets

Hi,
I am getting an error while extracting the Jester dataset files: every file except the 20bn-jester-v1-00 data file gives the error. When I checked the types of the data files, I observed that the file type of 20bn-jester-v1-00 is different from the other files. I am attaching a screenshot of the error I am getting, which also includes the file types; please help if you have resolved the same issue.
[screenshot of the error and file types]

egogesture_online.py--IndexError

When I ran online_test.py, the error "IndexError: index -3 is out of bounds for axis 0 with size 0" occurred at line 152 of egogesture_online.py
(counts = np.bincount(label_list[np.array(list(range(_ - int(sample_duration/8), _)))])).
I do not know how to resolve it.

How to unzip the nvGesture dataset? I used 7za e datafilename

7-Zip (A) [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,72 CPUs)

Processing archive: /home/lxj/Gesture_recognition/data/nvGesture/nvGesture_v1.7z.001

Error: E_FAIL

Prediction Accuracy on Nvidia Gesture dataset is very poor.

Hi Ahmet Gündüz,

I was using your pre-trained model (nv_resnext_101_Depth_32.pth) to test on the Nvidia gesture dataset. My accuracy on this dataset is very poor (not even 20%). Can you explain whether this is the correct model to test with and, if so, why the prediction accuracy is so poor?
I have followed the steps mentioned by you in your github post.

Not able to run offline_test.py for Jester dataset

Thank you so much for the great solution.

I am in the process of validating the solution and understanding more. I tried to test the pretrained model jester_resnext_101_RGB_32.pth with the Jester dataset.

Downloaded dataset and performed frame creation with python utils/jester_json.py 'annotation_Jester'.

But the command python offline_test.py gives the error below:

dataset loading [14780/14787]
run
Traceback (most recent call last):
  File "offline_test.py", line 161, in <module>
    outputs = model(inputs)
  File "/home/albin/anaconda3/envs/l3c_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/albin/anaconda3/envs/l3c_env/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 146, in forward
    "them on device: {}".format(self.src_device_obj, t.device))
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

Please someone help me to resolve this issue

Regards,
Albin

I modified run_offline.sh to fit the jester dataset, but got precision 0.03, recall 0.03

Hi,
Here's my modified run_offline.sh. If I have set any parameters incorrectly, please help me correct them. Thanks!

  • python offline_test.py
    --root_path ~/
    --video_path /home/ps/NewDisk1/Public_open/Jester/20bn-jester-v1
    --annotation_path ~/Codes/Real-time-GesRec/annotation_Jester/jester.json
    --result_path ~/Codes/Real-time-GesRec/results
    --pretrain_path Codes/Real-time-GesRec/pretrained_models/jester_resnext_101_RGB_32.pth
    --dataset jester
    --sample_duration 32
    --learning_rate 0.01
    --model resnext
    --model_depth 101
    --batch_size 1
    --n_classes 27
    --n_finetune_classes 27
    --modality RGB
    --n_threads 8
    --checkpoint 1
    --train_crop random
    --n_val_samples 1
    --test_subset val
    --n_epochs 100

[14787/14787] Time 0.04845 (0.09407) prec@1 0.03293 prec@5 0.18780 precision 0.00000 (0.03293) recall 0.00000 (0.03293)
-----Evaluation is finished------
Overall Prec@1 0.03293% Prec@5 0.18780%

jester pretrained model

Hello, you did great work, good job!
My question is about the jester resnext pretrained weights:

  1. Given, for example, a single folder with 35 RGB frames, what transformations do I need to apply to these frames to get the right input and then make a prediction?
  2. And how can I load the resnext model properly?
  3. And are the pretrained weights compatible with PyTorch 1.0.1 and CUDA 10?

No correct results printed

Hi Ahmet,
I ran online_test.py on egogesture with CPU only by setting "opt.no_cuda=True" and "opt.n_threads = 0",
but it didn't produce the right results. I changed the code like this:

[screenshot of the code change]

and one of the results printed by the console looks like this:

[screenshot of the console output]

It seems that the switch is never activated, so no result gets appended:

[screenshot of the relevant code]

I'd appreciate it if you could help me with this

How to feed input to classifier in online_test.py using tensor.float

Hi,
How do I feed input to the classifier in online_test.py using tensor.float?
I tried:

frame = np.reshape(frame, (1, 1, 1, 512, 512))
frame = cv2.normalize(frame, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
input_clf = torch.from_numpy(frame).float()
outputs_det = classifier(inputs_clf)

I get the following error,

RuntimeError: invalid argument 2: input image (T: 1 H: 32 W: 16) smaller than kernel size (kT: 2 kH: 3 kW: 3) at /pytorch/aten/src/THCUNN/generic/VolumetricAveragePooling.cu:57

Originally posted by @sathiez in #17 (comment)
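For reference, the pooling error suggests the tensor handed to the 3D network does not have the (batch, channels, sample_duration, height, width) layout it expects; a minimal sketch of preparing a clip for the classifier, assuming a 3-channel model with sample_size 112 and sample_duration 32 (the repository's own transforms may differ, and raw_frames is a hypothetical list of camera frames), is:

    import cv2
    import numpy as np
    import torch

    sample_duration, sample_size = 32, 112

    # 'raw_frames' is a hypothetical list of camera frames of shape (H, W, 3)
    processed = []
    for frame in raw_frames[-sample_duration:]:
        frame = cv2.resize(frame, (sample_size, sample_size))
        processed.append(frame.astype(np.float32))

    clip = torch.from_numpy(np.stack(processed))   # (T, H, W, C)
    clip = clip.permute(3, 0, 1, 2).unsqueeze(0)    # (1, C, T, H, W)
    # outputs_clf = classifier(clip)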

EgoGesture Dataset is not available

Hi,

Currently I am learning your code and trying to get your result.

However, I could not download the EgoGesture dataset (the author's email is wrong).

Could you please provide another link to download the dataset, or any other help?

Thank you!

ResNet Detector Model

Hi Ahmet,

Could you quickly send the .PTH file for the ResNet10 detector model you talk about in your paper? This would help a lot with replicating what you did! Thanks!

Explanation of Testing

Hey Ahmet,
thanks for this amazing work.

I was going to test your pretrained model, but there are a lot of ambiguities! Can you please explain how, or give instructions on running and testing your model?

AssertionError

I'm new to pytorch, and I ran into a lot of errors while debugging the program. Most of them have been resolved, but I'm stuck with an AssertionError

caffe-repository

Hi, congratulations on your wonderful work, but I wonder if you have any plans to release a Caffe repository?
Thanks!

pre-trained model

Hi, would you like to share your pre-trained model that can be finetuned for both detection and classification?
Thanks.

any readme?

Hi:
Thanks for your shared code.
But could you please write a readme?

Could you please give me some advice for finetuning from Jester with more classes?

Hi, ahmetgunduz:
I tried to finetune a model (classifier) trained on Jester on my own gesture dataset, but the performance is awful. Could you please give me some advice? And could you please share some details of your EgoGesture finetuning experiments (lr? freeze some layers? freeze bn?)
My own gesture dataset has 88 classes, almost the union of the Jester and EgoGesture gesture classes, and only a few samples per class (train: 42 samples, val: 6 samples). The dataset is small, has distortion (wide-angle camera), uses a second-person perspective, and is similar to Jester.

In my experiment, the architecture of the trained model is resnet34_0.5channels (4 block layers: [3, 4, 6, 3]). Here are the results:
a. only the fc layer trained, all conv and bn frozen, lr 0.001, dropout 0.7; performance: train 0.573, val 0.463
b. layer4 block and fc trained, conv1 and layer1~3 (including bn) frozen, lr 0.01, dropout 0.7; performance: train 0.743, val 0.447
c. entire model, large lr 0.01 due to my bug, no dropout, but the model got the best performance: train 0.909, val 0.582
d. trained from scratch, lr 0.01, dropout 0.7; performance: train 0.712, val 0.329

I also found that the model pretrained on Jester may predict some swiping cases in my own dataset as sliding, due to the distortion.

It seems the model has not benefited much from finetuning, because of the larger number of classes than Jester and the distortion. Could you give me some advice?

Thanks.

Torch size error with Jester Dataset

Hi.
I got this error message with the Jester dataset:

[screenshot of the error]

I don't know the meaning of the second number in torch.Size([64, 3, 7, 7, 7]) or torch.Size([64, 1, 7, 7, 7]).

and this is my run_offline.sh file.

#!/bin/bash
python main.py \
        --root_path ~/ \
        --video_path /home/eden/20BN-jester/20bn-jester-v1/videos \
        --annotation_path ~/Real-time-GesRec/annotation_Jester/jester.json\
        --result_path ~/Real-time-GesRec/results \
        --resume_path ~/Real-time-GesRec/jester_resnext_101_RGB_32.pth \
        --dataset jester \
        --sample_duration 8 \
    --learning_rate 0.01 \
    --model resnext \
        --model_depth 101 \
        --resnet_shortcut A \
        --batch_size 16 \
        --n_classes 27 \
        --n_finetune_classes 27 \
        --n_threads 16 \
        --checkpoint 1 \
        --modality Depth \
        --train_crop random \
        --n_val_samples 3 \
        --test_subset test \
     --n_epochs 100 \
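For reference, in a Conv3d weight of shape (out_channels, in_channels, kT, kH, kW) the second number is the number of input channels, so [64, 3, 7, 7, 7] corresponds to a 3-channel (RGB) first layer and [64, 1, 7, 7, 7] to a 1-channel (Depth) one; a likely cause of the mismatch is therefore combining the RGB checkpoint with --modality Depth in the script above. A quick way to confirm the shape convention:

    import torch.nn as nn

    # Conv3d weight layout: (out_channels, in_channels, kT, kH, kW)
    rgb_conv1 = nn.Conv3d(3, 64, kernel_size=7)    # 3-channel (RGB) input
    depth_conv1 = nn.Conv3d(1, 64, kernel_size=7)  # 1-channel (Depth) input
    print(rgb_conv1.weight.shape)    # torch.Size([64, 3, 7, 7, 7])
    print(depth_conv1.weight.shape)  # torch.Size([64, 1, 7, 7, 7])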

nvGesture training

Hello Ahmet, I am trying to run your code on Nvidia dataset. On running the main.py, the train.log looks like this.
epoch loss acc precision recall lr

1 0 0 0 0 0.1

2 0 0 0 0 0.1

3 0 0 0 0 0.1

which I don't think is right. Can you please tell me what I am doing wrong?
Other than setting the paths and reducing the epoch value from 100 to 50, I haven't changed anything.
GesRec.pdf
These are the parameters and a part of train.log after running main.py.

Testing on real-time RGB video using jester Pretrained model

I am trying to understand your code. I have understood that for the other datasets there are two models, a detector and a classifier. I only have the Jester dataset available, and for that there is only one model.

Can you please tell me how we can do real-time detection on RGB camera video without the detector?
And can we recognize gestures with the Jester model?
Or which model from the other datasets can be used for the same purpose?

run online_test.py

Hi Mr Ahmet,
thanks for sharing your perfect project.
I was going to test your pretrained model in online mode, but I encounter an error when loading the model.
Please help me.
Namespace(annotation_path='/home/sattarian/Documents/projects/hand-guesture/annotation_EgoGesture/egogestureall.json', arch='resnetl-10', batch_size=1, begin_epoch=1, checkpoint=1, clf_queue_size=16, clf_strategy='median', clf_threshold_final=0.15, clf_threshold_pre=0.6, crop_position_in_test='c', dampening=0.9, dataset='egogesture', det_counter=2.0, det_queue_size=4, det_strategy='median', ft_begin_index=0, initial_scale=1.0, learning_rate=0.1, lr_patience=10, lr_steps=[10, 20, 30, 40, 100], manual_seed=1, mean=[114.7748, 107.7354, 99.475], mean_dataset='activitynet', modality='Depth', modality_clf='Depth', modality_det='Depth', model='resnetl', model_clf='resnext', model_depth=10, model_depth_clf=101, model_depth_det=10, model_det='resnetl', momentum=0.9, n_classes=2, n_classes_clf=83, n_classes_det=2, n_epochs=200, n_finetune_classes=2, n_finetune_classes_clf=83, n_finetune_classes_det=2, n_scales=5, n_threads=16, n_val_samples=1, nesterov=False, no_cuda=False, no_hflip=False, no_mean_norm=False, no_softmax_in_test=False, no_train=False, no_val=False, norm_value=1, optimizer='sgd', pretrain_path='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', pretrain_path_clf='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnext_101_Depth_32.pth', pretrain_path_det='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', resnet_shortcut='A', resnet_shortcut_clf='B', resnet_shortcut_det='A', resnext_cardinality=32, resnext_cardinality_clf=32, resnext_cardinality_det=32, result_path='/home/sattarian/Documents/projects/hand-guesture/results', resume_path='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', resume_path_clf='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnext_101_Depth_32.pth', resume_path_det='/home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth', root_path='/home/sattarian/Documents/projects/hand-guesture/', sample_duration=8, sample_duration_clf=32, sample_duration_det=8, sample_size=112, scale_in_test=1.0, scale_step=0.84089641525, scales=[1.0, 0.84089641525, 0.7071067811803005, 0.5946035574934808, 0.4999999999911653], std=[38.7568578, 37.88248729, 40.02898126], std_norm=False, store_name='model', stride_len=1, test=True, test_subset='test', train_crop='random', video_path='/home/sattarian/Documents/projects/hand-guesture/video_kinetics_jpg', weight_decay=0.001, whole_path='video_kinetics_jpg', wide_resnet_k=2, wide_resnet_k_clf=2, wide_resnet_k_det=2)
loading pretrained model /home/sattarian/Documents/projects/hand-guesture/egogesture_resnetl_10_Depth_8.pth
Traceback (most recent call last):
File "online_test.py", line 137, in
detector,classifier = load_models(opt)
File "online_test.py", line 75, in load_models
detector, parameters = generate_model(opt)
File "/home/sattarian/Documents/projects/hand-guesture/model.py", line 68, in generate_model
model.load_state_dict(pretrain['state_dict'])
File "/home/sattarian/anaconda3/envs/deep-learning/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.conv1.weight: copying a param with shape torch.Size([16, 1, 7, 7, 7]) from checkpoint, the shape in current model is torch.Size([16, 3, 7, 7, 7]).
