slr's Introduction

SLR

Isolated and continuous sign language recognition using CNN+LSTM, 3D CNN, GCN, and Encoder-Decoder models

Requirements

Isolated Sign Language Recognition

CNN+LSTM

  1. four layers of Conv2d + one layer of LSTM

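    | Dataset      | Classes | Samples | Best Test Acc | Best Test Loss |
    |--------------|---------|---------|---------------|----------------|
    | CSL_Isolated | 100     | 25,000  | 82.08%        | 0.734426       |
    | CSL_Isolated | 500     | 125,000 | 71.71%        | 1.332122       |
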
  2. ResNet + one layer of LSTM

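    | Dataset      | Classes | Samples | Best Test Acc | Best Test Loss |
    |--------------|---------|---------|---------------|----------------|
    | CSL_Isolated | 100     | 25,000  | 93.54%        | 0.245582       |
    | CSL_Isolated | 500     | 125,000 | 83.17%        | 0.748759       |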

3D CNN

  1. three layers of Conv3d

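    | Dataset      | Classes | Samples | Best Test Acc | Best Test Loss |
    |--------------|---------|---------|---------------|----------------|
    | CSL_Isolated | 100     | 25,000  | 58.86%        | 1.560049       |
    | CSL_Isolated | 500     | 125,000 | 45.07%        | 2.255563       |
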
  2. 3D ResNet

    | Method    | Dataset      | Classes | Samples | Best Test Acc | Best Test Loss |
    |-----------|--------------|---------|---------|---------------|----------------|
    | ResNet18  | CSL_Isolated | 100     | 25,000  | 93.30%        | 0.246169       |
    | ResNet18  | CSL_Isolated | 500     | 125,000 | 79.42%        | 0.800490       |
    | ResNet34  | CSL_Isolated | 100     | 25,000  | 94.78%        | 0.207592       |
    | ResNet34  | CSL_Isolated | 500     | 125,000 | 81.61%        | 0.750424       |
    | ResNet50  | CSL_Isolated | 100     | 25,000  | 94.36%        | 0.232631       |
    | ResNet50  | CSL_Isolated | 500     | 125,000 | 83.15%        | 0.803212       |
    | ResNet101 | CSL_Isolated | 100     | 25,000  | 95.26%        | 0.205430       |
    | ResNet101 | CSL_Isolated | 500     | 125,000 | 83.18%        | 0.751727       |

  3. ResNet (2+1)D

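    | Dataset      | Classes | Samples | Best Test Acc | Best Test Loss |
    |--------------|---------|---------|---------------|----------------|
    | CSL_Isolated | 100     | 25,000  | 98.68%        | 0.043099       |
    | CSL_Isolated | 500     | 125,000 | 94.85%        | 0.234880       |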

GCN

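| Dataset      | Classes | Samples | Best Test Acc | Best Test Loss |
|--------------|---------|---------|---------------|----------------|
| CSL_Skeleton | 100     | 25,000  | 79.20%        | 0.737053       |
| CSL_Skeleton | 500     | 125,000 | 66.64%        | 1.165872       |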

Skeleton+LSTM

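| Dataset      | Classes | Samples | Best Test Acc | Best Test Loss |
|--------------|---------|---------|---------------|----------------|
| CSL_Skeleton | 100     | 25,000  | 84.30%        | 0.488253       |
| CSL_Skeleton | 500     | 125,000 | 70.62%        | 1.078730       |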

Continuous Sign Language Recognition

Encoder-Decoder

The encoder is ResNet18+LSTM, and the decoder is an LSTM.

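| Dataset             | Sentences | Samples | Best Test WER | Best Test Loss |
|---------------------|-----------|---------|---------------|----------------|
| CSL_Continuous      | 100       | 25,000  | 1.01%         | 0.034636       |
| CSL_Continuous_Char | 100       | 25,000  | 1.19%         | 0.049449       |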

References

slr's People

Contributors

0aqz0

slr's Issues

training continuous with ctc loss?

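Hello,

Have you tried training continuous SLR with CTC loss? My model is simple: pretrained EfficientNet + self-attention (3 layers) + linear + CTC loss. I trained it on the RWTH-PHOENIX-Weather-2014T dataset, but the loss does not go down at all; it stays at around 5.2, and at decoding time the model predicts only blank. I don't think I am misusing the CTC loss, because I previously trained machine translation with CTC loss without any problem. Have you trained on this dataset, or trained this model with CTC?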
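
The symptom described in this issue (decoding yields only blank) can be checked with a greedy CTC decoder. The following is a minimal sketch in plain Python, not code from this repository: the function names `ctc_greedy_decode` and `blank_ratio`, the use of index 0 for blank, and the list-of-lists input format are all assumptions made for illustration.

```python
def best_path(frame_scores):
    """Pick the highest-scoring label index for every frame."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]


def ctc_greedy_decode(frame_scores, blank=0):
    """Greedy CTC decoding: merge repeated labels, then drop blanks.

    frame_scores is a list of per-frame score lists (assumed layout: time x classes).
    """
    decoded = []
    previous = None
    for label in best_path(frame_scores):
        # A label is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def blank_ratio(frame_scores, blank=0):
    """Fraction of frames whose best label is blank (1.0 means only blank is predicted)."""
    path = best_path(frame_scores)
    return sum(1 for label in path if label == blank) / len(path)


if __name__ == "__main__":
    scores = [
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # label 1
        [0.1, 0.7, 0.2],    # label 1 (repeat, merged)
        [0.8, 0.1, 0.1],    # blank separates the two 1s
        [0.2, 0.7, 0.1],    # label 1 again
        [0.1, 0.2, 0.7],    # label 2
    ]
    print(ctc_greedy_decode(scores))
    print(blank_ratio(scores))
```

Tracking a value such as `blank_ratio` over training is one way to see whether the model ever leaves the all-blank solution; it does not by itself explain why the loss plateaus.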

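Some questions about acc and loss

Hello, when training on isolated words I found that acc gets close to 100 within the first few epochs, and the final loss is about 0.0008. What could be causing this?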

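Testing continuous sentences

Hello, I want to test my trained continuous-sentence model, but test.py seems to target the isolated-word model. How should I test the continuous-sentence model? Thanks.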

Isolated sign language dataset

When running CSL_Isolated_Conv3D.py, I always get IndexError: list index out of range. Why is that?

Testing question

Hello, how can I build a visual detection interface that performs real-time sign language recognition?

CSL Skeleton Dataset

In the Skeleton part of dataset.py, the statement selected_skeleton = torch.FloatTensor([selected_x, selected_y]) raises ValueError: expected sequence of length 38 at dim 1 (got 37). Is there a way to fix this?
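
This error message means that the two nested lists have different lengths (38 and 37 here), so they cannot form a rectangular tensor. The sketch below is an assumption-laden illustration in plain Python, not the repository's fix: the helper name `align_coordinates` and the choice to pad with `0.0` are hypothetical, and discarding the sample or repairing the skeleton file may be more appropriate for real data.

```python
def align_coordinates(xs, ys, fill=0.0):
    """Pad the shorter of two coordinate lists so both have the same length.

    Returns new lists; the inputs are left unchanged.
    """
    length = max(len(xs), len(ys))
    padded_xs = list(xs) + [fill] * (length - len(xs))
    padded_ys = list(ys) + [fill] * (length - len(ys))
    return padded_xs, padded_ys


if __name__ == "__main__":
    selected_x = [0.1, 0.2, 0.3]
    selected_y = [0.4, 0.5]          # one value missing
    xs, ys = align_coordinates(selected_x, selected_y)
    print(len(xs), len(ys))          # both lists now have equal length
```

After alignment, a call such as `torch.FloatTensor([xs, ys])` receives equally long rows. Logging which sample triggered the mismatch is still worthwhile, since a missing coordinate usually points to a malformed skeleton file.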

Cnnlstm

Hi,
It's me again. I tried to use your cnnlstm model which is Rescnnlstm, but the loss isn't going down. So I am wondering where I was wrong. I set the same parameter that you provided. How many epochs did you train for 92% accuracy? Thanks

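Is there an unprocessed video dataset?

Is the video data at "/home/haodong/Data/CSL_Isolated/color_video_125000" the unprocessed video? It is not in the download link. Does the author, or anyone passing by, have a download URL?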

Training question

Does the dataset need any processing before training?

Continuous sign language dataset

Hello, what is the difference between CSL_Continuous and CSL_Continuous_Char? I could not find a dictionary.txt file in the official dataset.

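Did you preprocess the dataset?

Hello, I downloaded the dataset from the official website and wanted to train on it with your code, but the paths used when reading the isolated sign dataset differ from the layout of the official download. Did you preprocess the dataset, for example by putting the different signers of each sign into a single folder?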

About Graph neighbor_link

code
When using OpenPose, why do you use the links (11, 5) and (8, 2)?
I only see the links (11, 1) and (8, 1) in OpenPose.

Also, in "ntu-rgb+d" I cannot find the links (22, 23) and (24, 25), but you use them in neighbor_link.

Project demo

Does the final demo need to be shown with an Azure Kinect camera?

Issue on corpus.txt

Hello

I am currently working on a continuous seq2seq code.
I am trying to work on our dataset.
However, I cannot check dictionary.txt and corpus.txt file.
Can you upload files here or explain the composition of each file?

Thank you.

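Isolated sign language dataset

Do you have the skeleton files for the 500 classes? I downloaded the dataset from the shared Baidu Netdisk link but could not find the skeleton txt files.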

Substitutions, insertions, deletions

When computing the final WER, is there a way to get the separate counts of substitutions, insertions, and deletions?
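
One way to obtain these counts is to carry them through the edit-distance table instead of storing only the total cost. The following is a minimal sketch in plain Python under that assumption; it is not the WER routine used in this repository, and the function name `wer_breakdown` is hypothetical. When several alignments share the minimal cost, the split between the three counts depends on the tie-breaking order (match or substitution first, then insertion, then deletion).

```python
def wer_breakdown(reference, hypothesis):
    """Return (substitutions, insertions, deletions, wer) for two token lists."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    n, m = len(reference), len(hypothesis)
    # table[i][j] = (cost, subs, ins, dels) for reference[:i] vs hypothesis[:j]
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, 0, i)  # every reference token deleted
    for j in range(1, m + 1):
        table[0][j] = (j, 0, j, 0)  # every hypothesis token inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, subs, ins, dels = table[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (cost, subs, ins, dels)          # match, no edit
            else:
                diagonal = (cost + 1, subs + 1, ins, dels)  # substitution
            cost, subs, ins, dels = table[i][j - 1]
            insertion = (cost + 1, subs, ins + 1, dels)
            cost, subs, ins, dels = table[i - 1][j]
            deletion = (cost + 1, subs, ins, dels + 1)
            # min() keeps the first candidate on ties.
            table[i][j] = min(diagonal, insertion, deletion, key=lambda t: t[0])
    cost, subs, ins, dels = table[n][m]
    return subs, ins, dels, cost / n


if __name__ == "__main__":
    ref = ["a", "b", "c", "d"]
    hyp = ["a", "x", "c"]
    print(wer_breakdown(ref, hyp))
```

The WER returned here is (S + I + D) divided by the reference length, so it can exceed 1.0 when the hypothesis contains many insertions.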

attention model

Excellent job!! As I am new to this area, could you please tell me what l and g, the inputs to linearattentionblock, are?
