Slovo - Russian Sign Language Dataset

We introduce Slovo, a large-scale video dataset for Russian Sign Language recognition. The dataset is about 16 GB and contains 20400 RGB videos covering 1000 sign language gestures from 194 signers. Each class has 20 samples. The dataset is split into training and test sets by subject (user_id): the training set includes 15300 videos and the test set includes 5100 videos. The total video recording time is ~9.2 hours. About 35% of the videos are recorded in HD and 65% in FullHD resolution. The average length of a video segment containing a gesture is 50 frames.
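As a quick sanity check, the split proportions and the mean clip duration can be derived from the figures quoted above. This is only a small arithmetic sketch; it assumes that the ~9.2 hours of recording covers exactly the 20400 videos, which the text does not state explicitly.

```python
# Figures quoted in the dataset description.
total_videos = 20400
train_videos = 15300
test_videos = 5100
total_hours = 9.2  # approximate value, as stated

# The two subsets should account for every video.
assert train_videos + test_videos == total_videos

train_fraction = train_videos / total_videos
test_fraction = test_videos / total_videos

# Assumption: the ~9.2 hours refers to these 20400 videos.
mean_seconds_per_video = total_hours * 3600 / total_videos

print(f"train: {train_fraction:.0%}, test: {test_fraction:.0%}")
print(f"mean duration: ~{mean_seconds_per_video:.2f} s per video")
```

Under that assumption the split is 75% / 25% and a video lasts roughly 1.6 seconds on average.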

For more information, see our paper on arXiv.

Downloads

Main download link

| Downloads | Size (GB) | Comment |
|-----------|-----------|---------|
| Slovo | ~16 | Trimmed HD+ videos by (start, end) annotations |
| Origin | ~105 | Original HD+ videos from mining stage |
| 360p | ~13 | Resized original videos by min_side = 360 |
| Landmarks | ~1.2 | Mediapipe hand landmark annotations for each frame of trimmed videos |

Also, you can download Slovo from Kaggle.

The annotation file, annotations.csv, is easy to use and contains the following columns:

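|   | attachment_id | user_id | width | height | length | text | train | begin | end |
|---|---------------|---------|-------|--------|--------|------|-------|-------|-----|
| 0 | de81cc1c-... | 1b... | 1440 | 1920 | 14 | привет | True | 30 | 45 |
| 1 | 3c0cec5a-... | 64... | 1440 | 1920 | 32 | утро | False | 43 | 66 |
| 2 | d17ca986-... | cf... | 1920 | 1080 | 44 | улица | False | 12 | 31 |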

where:

  • attachment_id - video file name
  • user_id - unique anonymized user ID
  • width - video width
  • height - video height
  • length - video length
  • text - gesture class in the Russian language
  • train - train or test boolean flag
  • begin - start of the gesture (for original dataset)
  • end - end of the gesture (for original dataset)
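The columns above are enough to rebuild the train/test split and to check that no signer appears in both subsets. The following is a minimal sketch using only the standard library. The tab delimiter is an assumption about the file, and the sample rows use made-up placeholder IDs (`video-0`, `user-a`, ...), not real dataset entries.

```python
import csv
import io


def load_annotations(fileobj, delimiter="\t"):
    """Read annotation rows and convert numeric and boolean columns.

    Assumption: the file is tab-separated; pass delimiter="," otherwise.
    """
    rows = []
    for row in csv.DictReader(fileobj, delimiter=delimiter):
        for key in ("width", "height", "length", "begin", "end"):
            row[key] = int(row[key])
        # The train flag is stored as the text "True" or "False".
        row["train"] = row["train"].strip().lower() == "true"
        rows.append(row)
    return rows


def split_by_flag(rows):
    """Return (train_rows, test_rows) according to the train column."""
    train = [r for r in rows if r["train"]]
    test = [r for r in rows if not r["train"]]
    return train, test


# Hypothetical sample with placeholder IDs, for illustration only.
SAMPLE = (
    "attachment_id\tuser_id\twidth\theight\tlength\ttext\ttrain\tbegin\tend\n"
    "video-0\tuser-a\t1440\t1920\t14\tпривет\tTrue\t30\t45\n"
    "video-1\tuser-b\t1440\t1920\t32\tутро\tFalse\t43\t66\n"
    "video-2\tuser-c\t1920\t1080\t44\tулица\tFalse\t12\t31\n"
)

rows = load_annotations(io.StringIO(SAMPLE))
train, test = split_by_flag(rows)

# A split by subject means the two sets share no user_id.
shared_users = {r["user_id"] for r in train} & {r["user_id"] for r in test}
print(len(train), len(test), shared_users)
```

To run this on the real file, replace the `io.StringIO(SAMPLE)` argument with a file opened via `open("annotations.csv", encoding="utf-8", newline="")`.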

For convenience, we have also prepared a compressed version of the dataset in which every video is resized so that its shorter side is 360 pixels (min_side = 360). Download link - slovo360p. In addition, we annotated the trimmed videos with MediaPipe and provide the hand keypoints in this annotation file.
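To make the effect of min_side = 360 concrete, the sketch below computes the output resolution for a given input while preserving the aspect ratio. The function name `resize_to_min_side` and the rounding to the nearest integer are assumptions for illustration; the tool actually used to produce the 360p version is not described here.

```python
def resize_to_min_side(width, height, min_side=360):
    """Scale (width, height) so that the shorter side equals min_side.

    Assumption: the other side is rounded to the nearest integer.
    """
    shorter = min(width, height)
    new_width = round(width * min_side / shorter)
    new_height = round(height * min_side / shorter)
    return new_width, new_height


# Resolutions taken from the annotation examples in this document.
portrait = resize_to_min_side(1440, 1920)
landscape = resize_to_min_side(1920, 1080)
print(portrait, landscape)
```

A 1440x1920 portrait video becomes 360x480, and a 1920x1080 landscape video becomes 640x360.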

Models

We provide several pre-trained models as baselines for Russian Sign Language recognition. We tested the models with 16, 32, and 48 input frames; the best configuration for each architecture and frame count is listed below. In a model name, the first number is the number of frames and the second is the frame interval.
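The naming convention can be illustrated with a short sketch that splits a model name into its parts and lists which frame indices such a configuration would read. Both helper names are hypothetical, and clamping to the last frame when the video is too short is an assumption; the sampling strategy actually used for training may differ.

```python
def parse_model_name(name):
    """Split e.g. 'MViTv2-small-32-2' into (architecture, frames, interval)."""
    arch, frames, interval = name.rsplit("-", 2)
    return arch, int(frames), int(interval)


def sample_frame_indices(video_length, num_frames, interval, start=0):
    """Pick num_frames indices spaced by interval, starting at start.

    Assumption: indices past the end are clamped to the last frame.
    """
    last = video_length - 1
    return [min(start + i * interval, last) for i in range(num_frames)]


arch, frames, interval = parse_model_name("MViTv2-small-32-2")
indices = sample_frame_indices(video_length=50, num_frames=frames, interval=interval)

# Number of source frames spanned by one input clip.
span = (frames - 1) * interval + 1
print(arch, frames, interval, span)
print(indices[:5], indices[-1])
```

With 32 frames at interval 2, one clip spans 63 source frames, which is longer than the 50-frame average gesture length quoted earlier, so some handling of short videos (clamping here) is needed.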

| Model Name | Model Size (MB) | Metric | ONNX | TorchScript |
|------------|-----------------|--------|------|-------------|
| MViTv2-small-16-4 | 140.51 | 58.35 | weights | weights |
| MViTv2-small-32-2 | 140.79 | 64.09 | weights | weights |
| MViTv2-small-48-2 | 141.05 | 62.18 | weights | weights |
| Swin-large-16-3 | 821.65 | 48.04 | weights | weights |
| Swin-large-32-2 | 821.74 | 54.84 | weights | weights |
| Swin-large-48-1 | 821.78 | 55.66 | weights | weights |
| ResNet-i3d-16-3 | 146.43 | 32.86 | weights | weights |
| ResNet-i3d-32-2 | 146.43 | 38.38 | weights | weights |
| ResNet-i3d-48-1 | 146.43 | 43.91 | weights | weights |

Demo

usage: demo.py [-h] -p CONFIG [--mp] [-v] [-l LENGTH]

optional arguments:
  -h, --help            show this help message and exit
  -p CONFIG, --config CONFIG
                        Path to config
  --mp                  Enable multiprocessing
  -v, --verbose         Enable logging
  -l LENGTH, --length LENGTH
                        Deque length for predictions


python demo.py -p <PATH_TO_CONFIG>
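The options above can be mirrored with a small `argparse` parser, and the `-l` option suggests that recent predictions are kept in a bounded deque. The sketch below shows both ideas. It is not the actual demo.py: the integer type and default of `--length`, the file name `config.yaml`, and the majority-vote helper are all assumptions made for illustration.

```python
import argparse
from collections import Counter, deque


def build_parser():
    """Build a parser whose options match the usage message above."""
    parser = argparse.ArgumentParser(prog="demo.py")
    parser.add_argument("-p", "--config", required=True, help="Path to config")
    parser.add_argument("--mp", action="store_true", help="Enable multiprocessing")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable logging")
    # Assumption: the type and default value are not given in the usage text.
    parser.add_argument("-l", "--length", type=int, default=4,
                        help="Deque length for predictions")
    return parser


def most_common_label(buffer):
    """Hypothetical smoothing step: the most frequent label in the buffer."""
    return Counter(buffer).most_common(1)[0][0]


# 'config.yaml' is a placeholder path, not a file shipped with the project.
args = build_parser().parse_args(["-p", "config.yaml", "-l", "3"])

# A deque with maxlen drops the oldest prediction once it is full.
predictions = deque(maxlen=args.length)
for label in ["привет", "утро", "утро", "улица"]:
    predictions.append(label)

print(list(predictions), most_common_label(predictions))
```

With a length of 3, only the three most recent labels remain in the buffer, so the first prediction is discarded.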


Authors and Credits

Citation

You can cite the paper using the following BibTeX entry:

@inproceedings{kapitanov2023slovo,
    title={Slovo: Russian Sign Language Dataset},
    author={Kapitanov, Alexander and Kvanchiani, Karina and Nagaev, Alexander and Petrova, Elizaveta},
    booktitle={International Conference on Computer Vision Systems},
    pages={63--73},
    year={2023},
    organization={Springer}
}

Links

License

This work is licensed under a variant of the Creative Commons Attribution-ShareAlike 4.0 International License.

Please see the specific license.
