RTFM: a well-known weakly supervised learning model for video anomaly detection (VAD).
- ShanghaiTech i3d features
- Revised a few inconvenient parts of the code
- i3d features
- Extracted once per 16 frames
- 10-crop augmentation refers to torchvision's TenCrop transform
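
To make the two notes above concrete, here is a minimal pure-Python sketch of how a video might be split into 16-frame clips and which ten views a TenCrop-style augmentation produces. The helper names (`clip_ranges`, `ten_crop_boxes`) are hypothetical and not part of this repository or of torchvision. Non-overlapping clips and dropping the incomplete last clip are assumptions, and in practice torchvision's TenCrop does the cropping itself.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def clip_ranges(num_frames: int, clip_len: int = 16) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of consecutive clips.

    Assumption: clips do not overlap and a trailing partial clip is dropped.
    """
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, clip_len)]


def ten_crop_boxes(width: int, height: int, size: int) -> List[Tuple[Box, bool]]:
    """Return ten (box, flipped) pairs for a square crop of side `size`.

    The first five entries are the four corners and the center of the
    original frame; the last five are the same boxes applied to the
    horizontally flipped frame.
    """
    if size > width or size > height:
        raise ValueError("crop size must not exceed the frame size")
    left = (width - size) // 2
    top = (height - size) // 2
    five = [
        (0, 0, size, size),                            # top-left
        (width - size, 0, width, size),                # top-right
        (0, height - size, size, height),              # bottom-left
        (width - size, height - size, width, height),  # bottom-right
        (left, top, left + size, top + size),          # center
    ]
    return [(box, False) for box in five] + [(box, True) for box in five]


if __name__ == "__main__":
    clips = clip_ranges(50)                 # a 50-frame video yields 3 clips
    views = ten_crop_boxes(340, 256, 224)   # illustrative frame and crop sizes
    print(len(clips), "clips x", len(views), "views per clip")
```

Under these assumptions, each video yields one feature vector per clip and per view, so the stored features for a video have shape (number of clips, 10, feature dimension).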
- 3D feature extraction: use x3d rather than i3d or c3d (all three are 3D convolutional models)
- Use pytorchvideo: pytorchvideo_x3d
- UCF_Crime x3d feature extraction (takes more than 24 hours)
- Train RTFM on the UCF_Crime x3d features
- Good performance!
- Tested with new motion data (never seen during training): ✅ = predicted correctly, ❌ = predicted incorrectly
- normal:{drinking ✅, googling ❌, normal ✅, toilet ✅}
- abnormal:{capture ❌, drawing ✅, writing ✅}
- Things to note: (1) The result can change even when the input data does not differ much. (2) The model works fairly well on webcam-domain data even though it was not trained on it.
- If webcam-domain data is available and trained on with weak supervision, there is hope :)
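
As a quick summary of the test results listed above, the per-clip outcomes can be tallied as follows. This is only a small sketch: the dictionary is transcribed from the ✅/❌ marks in these notes, and the `tally` helper is a hypothetical name, not part of the repository.

```python
# Per-clip outcomes transcribed from the notes: True = ✅, False = ❌.
results = {
    "normal": {"drinking": True, "googling": False, "normal": True, "toilet": True},
    "abnormal": {"capture": False, "drawing": True, "writing": True},
}


def tally(outcomes):
    """Return per-class (correct, total) counts and the overall (correct, total)."""
    per_class = {
        label: (sum(clips.values()), len(clips)) for label, clips in outcomes.items()
    }
    overall = (
        sum(correct for correct, _ in per_class.values()),
        sum(total for _, total in per_class.values()),
    )
    return per_class, overall


if __name__ == "__main__":
    per_class, overall = tally(results)
    print(per_class)  # {'normal': (3, 4), 'abnormal': (2, 3)}
    print(overall)    # (5, 7)
```

That is 5 of 7 clips classified correctly, which is too small a sample to draw firm conclusions from.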
- Webcam abnormal face data
- Webcam abnormal motion data
- RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
- See cuda_troubleshooting.txt
- No visdom error
- visdom is a web-based visualization tool
- pip install visdom
- (in a new terminal) python -m visdom.server
- View the results by opening the visdom server page in a web browser