kennymckormick / pyskl
A toolbox for skeleton-based action recognition.
License: Apache License 2.0
Dear kenny,
First of all, thank you for your excellent work! Recently, I have been trying to run the pyskl code. However, I have run into a problem with examples/extract_diving48_skeleton.
I would like to generate diving48_annos.pkl by following your notebook diving48_example.ipynb. But when I run the code, there is an error:
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/anaconda3/envs/cdpytorch/bin/python', '-u', 'tools/data/custom_2d_skeleton.py', '--local_rank=1', '--video-list', 'examples/extract_diving48_skeleton/diving48.list', '--out', 'examples/extract_diving48_skeleton/diving48_annos.pkl']' died with <Signals.SIGSEGV: 11>.
I have checked the GitHub issues and searched the Internet. Unfortunately, I haven't found a relevant solution, so I sent you an email, hoping to get your help! Thank you very much!
Thanks for the great work!
I tried to download the Kinetics skeleton data many times last night. The link is very unstable and seems to have expired now.
When I run tools/data/custom_2d_skeleton.py, I get the following problem:
File "tools/data/custom_2d_skeleton.py", line 176, in <module>
main()
File "tools/data/custom_2d_skeleton.py", line 123, in main
init_dist('pytorch', backend='nccl')
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/mmcv/runner/dist_utils.py", line 18, in init_dist
_init_dist_pytorch(backend, **kwargs)
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/mmcv/runner/dist_utils.py", line 29, in _init_dist_pytorch
rank = int(os.environ['RANK'])
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/os.py", line 675, in __getitem__
raise KeyError(key) from None
KeyError: 'RANK'
I can't find RANK in os.environ, so I changed 'LOCAL_RANK' to 'RANK', but then there is another problem:
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 224, in _env_rendezvous_handler
world_size = int(_get_env_or_raise("WORLD_SIZE"))
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 203, in _get_env_or_raise
raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set
If anyone has solved this problem, please give me some advice. Thank you very much!
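Both tracebacks above come from the `env://` initialisation path, which reads the process layout from environment variables. Launchers such as `torch.distributed.launch` (visible in the `dist_train.sh` logs elsewhere on this page) normally set them. As a minimal sketch, assuming the script is started directly as a single process on one machine, the variables can be given single-process defaults before `init_dist` is called. The helper name and the default port are illustrative assumptions, not part of pyskl.

```python
import os


def set_single_process_dist_env(port="29500"):
    """Fill in the variables read by env:// rendezvous, if they are missing.

    Hypothetical helper: the values describe one process on the local machine.
    Values that already exist (for example, set by a launcher) are kept.
    """
    defaults = {
        "RANK": "0",          # global index of this process
        "LOCAL_RANK": "0",    # index of this process on this machine
        "WORLD_SIZE": "1",    # total number of processes
        "MASTER_ADDR": "127.0.0.1",
        "MASTER_PORT": port,  # assumed to be a free port
    }
    for key, value in defaults.items():
        os.environ.setdefault(key, value)
    return {key: os.environ[key] for key in defaults}


if __name__ == "__main__":
    print(set_single_process_dist_env())
```

Starting the script through a launcher remains the more usual route; the sketch only shows which variables the error messages are asking for.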
Sir, the repository now provides FineGym99. Will FineGym288 be released as well?
In your released paper, I found that the results in Table 1 and Table 4 are different. Does the difference originate from the "spatial augmentations" (like random rotation or scaling) mentioned in the paper? Can you please tell me how to apply random rotation or scaling in the config file?
Which guide should I follow to train on my custom dataset with PoseC3D? Should I follow the one in mmaction2, or is there a different one for pyskl? Thanks.
When I run the command: python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4 --config configs/stgcn++/stgcn++_ntu120_xsub_hrnet/j.py --ckpt http://download.openmmlab.com/mmaction/pyskl/ckpt/stgcnpp/stgcnpp_ntu120_xsub_hrnet/j.pth
I get an error:
usage: demo_skeleton.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT] [--det-config DET_CONFIG] [--det-checkpoint DET_CHECKPOINT]
[--pose-config POSE_CONFIG] [--pose-checkpoint POSE_CHECKPOINT] [--det-score-thr DET_SCORE_THR] [--label-map LABEL_MAP]
[--device DEVICE] [--short-side SHORT_SIDE]
video out_filename
demo_skeleton.py: error: unrecognized arguments: --ckpt http://download.openmmlab.com/mmaction/pyskl/ckpt/stgcnpp/stgcnpp_ntu120_xsub_hrnet/j.pth
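The usage message lists `--checkpoint` but no `--ckpt`, which is why the parser rejects the command. The sketch below uses a cut-down parser that mirrors only part of that usage message (it is not the real `demo_skeleton.py` parser) to show which arguments `argparse` treats as unrecognized.

```python
import argparse


def build_parser():
    # Options copied from the usage message above; the rest are omitted.
    parser = argparse.ArgumentParser(prog="demo_skeleton.py")
    parser.add_argument("video")
    parser.add_argument("out_filename")
    parser.add_argument("--config")
    parser.add_argument("--checkpoint")
    return parser


def unknown_args(argv):
    """Return the arguments that the parser does not recognise."""
    _, unknown = build_parser().parse_known_args(argv)
    return unknown


if __name__ == "__main__":
    base = ["in.avi", "out.mp4", "--config", "j.py"]
    print(unknown_args(base + ["--ckpt", "j.pth"]))        # flag is rejected
    print(unknown_args(base + ["--checkpoint", "j.pth"]))  # flag is accepted
```

`argparse` only accepts abbreviations that are prefixes of a defined option, and `--ckpt` is not a prefix of `--checkpoint`.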
I have built the environment according to the documents, dist_train.sh can work.
Traceback (most recent call last):
File "vision_skeleton.py", line 13, in <module>
anno = annotations[index]
KeyError: 0
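`KeyError: 0` on `annotations[index]` means that `annotations` is a mapping without the key `0`, not a list. The script `vision_skeleton.py` is not shown in this post, so the following is only a defensive sketch. It assumes the loaded object is either a plain list of per-video entries or a dict that stores that list under an `'annotations'` key; the key name is an assumption about the file layout.

```python
def annotation_list(obj):
    """Return the list of per-video annotations from a loaded pickle object.

    Assumption: obj is either a list, or a dict holding the list under the
    'annotations' key. Anything else is reported instead of guessed at.
    """
    if isinstance(obj, list):
        return obj
    if isinstance(obj, dict) and "annotations" in obj:
        return obj["annotations"]
    raise TypeError(f"unexpected annotation container: {type(obj).__name__}")


if __name__ == "__main__":
    # Made-up entries standing in for real annotation records.
    wrapped = {"split": {"train": ["a"]}, "annotations": [{"frame_dir": "a"}]}
    print(annotation_list(wrapped)[0])
```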
Hi, Kenny! Thank you for sharing such excellent work! I have run into another error when extracting the skeletons of Diving48, and I am very confused.
I can run the demo successfully.
However, when I run custom_2d_skeleton.py, there is an error about init_detector and init_pose_model.
If I modify default_det_config and default_det_ckpt according to demo_skeleton.py
I really don't know how to solve the problem. If anyone has met the same problem, please help me. Thank you.
Hi, thank you for providing excellent code and also the dataset!
I'm trying to use the K400 dataset by following these instructions:
https://github.com/kennymckormick/pyskl/blob/main/tools/data/data_doc.md
However, when I try to extract the zip file (kpfiles.zip), the extractor warns that there are 2,652 pkl files that have the same name.
Is it OK to replace the files that have the same name? Is this expected behavior?
Thank you in advance!
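Whether overwriting is safe is not answered in this thread. Before deciding, it can help to see which names collide and in which folders of the archive they live. The sketch below uses only the standard `zipfile` module; the archive built in the example is a tiny in-memory stand-in with made-up names, not the real kpfiles.zip.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def duplicate_basenames(zip_source):
    """Map each file name that occurs more than once to its archive paths."""
    with zipfile.ZipFile(zip_source) as archive:
        # Skip directory entries, which end with a slash.
        paths = [n for n in archive.namelist() if not n.endswith("/")]
    counts = Counter(PurePosixPath(p).name for p in paths)
    return {
        name: sorted(p for p in paths if PurePosixPath(p).name == name)
        for name, count in counts.items()
        if count > 1
    }


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("part1/clip_a.pkl", b"")
        archive.writestr("part2/clip_a.pkl", b"")
        archive.writestr("part2/clip_b.pkl", b"")
    print(duplicate_basenames(buffer))
```

If the colliding files sit in different folders, extracting with the folder structure preserved avoids the overwrite prompt altogether.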
When I run the tools/data/custom_2d_skeleton.py file, I get the following error:
Traceback (most recent call last):
File "./tools/data/custom_2d_skeleton.py", line 172, in <module>
main()
File "./tools/data/custom_2d_skeleton.py", line 119, in main
init_dist('pytorch', backend='nccl')
File "/home/ubuntu/miniconda3/envs/aiguard1/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 40, in init_dist
_init_dist_pytorch(backend, **kwargs)
File "/home/ubuntu/miniconda3/envs/aiguard1/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 51, in _init_dist_pytorch
rank = int(os.environ['RANK'])
File "/home/ubuntu/miniconda3/envs/aiguard1/lib/python3.7/os.py", line 681, in __getitem__
raise KeyError(key) from None
KeyError: 'RANK'
Dependencies:
python 3.7
pytorch 1.11
cudatoolkit 11.3
mmcv-full 1.5.0
mmdet 2.24.0
- Using a single GPU
Can anyone give me some ideas on how to solve this? Thanks!
Sir, what is the command to train with a single GPU? My command is bash tools/dist_test.sh configs/posec3d/slowonly_r50_diving48/joint.py work_dirs/posec3d/slowonly_r50_diving48/joint/latest.pth 1 --out yaml/testwork.json --eval top_k_accuracy mean_class_accuracy, and the error is KeyError: 'val'.
/pyskl-main$ python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4
Traceback (most recent call last):
File "demo/demo_skeleton.py", line 16, in <module>
from pyskl.apis import inference_recognizer, init_recognizer
ModuleNotFoundError: No module named 'pyskl'
When I run demo_skeleton.py, it fails as shown above.
Thanks for the great repository!
I would appreciate it if webcam_demo code could be added soon.
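`ModuleNotFoundError` means the interpreter that runs the demo cannot find a `pyskl` package on its search path. Another post on this page installs the repository with `pip install -e .` from the repository root. A quick, hedged check of whether the active environment can see a module (the helper name is made up for illustration):

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the top-level module on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same interpreter that is used for the demo.
    print("pyskl importable:", is_importable("pyskl"))
```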
Hello there!!
Thank you very much for such great work!
Could you please let me know where I can find "GenPseudoHeadmaps.ipynb"?
Thank you very much in advance
Hi, I use 2 GPUs when training the model, but I find that by default I can only use GPUs 0 and 1. If I want to train on GPUs 2 and 3, where do I need to set the IDs of the GPUs to use?
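One general CUDA mechanism for this is the `CUDA_VISIBLE_DEVICES` environment variable, which limits the devices a process can see; inside the process the selected devices are renumbered starting from 0. The sketch below only builds the command and environment and does not start anything. It assumes `tools/dist_train.sh` takes the config and the GPU count as its first two arguments, as in the commands quoted on this page, and that the script does not override the variable itself. The function name and the config path are illustrative.

```python
import os


def launch_spec(config, gpu_ids, base_env=None):
    """Build (command, environment) for a run restricted to the given GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    command = ["bash", "tools/dist_train.sh", config, str(len(gpu_ids))]
    return command, env


if __name__ == "__main__":
    command, env = launch_spec("configs/some_model/joint.py", [2, 3])
    print(env["CUDA_VISIBLE_DEVICES"], command)
    # To actually launch: subprocess.run(command, env=env, check=True)
```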
I really appreciate the work presented, Congratulations!
However, I encounter a problem when I run dist_train.sh with a single GPU.
Running the following line:
bash tools/dist_train.sh configs/posec3d/slowonly_r50_ntu120_xset/joint.py 1 --validate --test-last --test-best
The error follows:
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 50075) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
======================================================
tools/train.py FAILED
------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-04-20_11:31:39
host : *******
rank : 0 (local_rank: 0)
exitcode : -9 (pid: 50075)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 50075
======================================================
Info about my machine:
sys.platform: linux
Python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3070
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.6.r11.6/compiler.31057947_0
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.11.0+cu113
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.12.0+cu113
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: n/a
MMCV CUDA Compiler: n/a
pyskl: 0.1.0+f2cefec
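The launcher log above reports a negative exit code. On POSIX systems a negative code means the child process was terminated by the signal with that number, so `-9` corresponds to SIGKILL, as the log itself says. A small sketch for decoding such codes follows; the function name is illustrative. One common source of SIGKILL is the kernel's out-of-memory killer, but that is only a possibility here and the log does not confirm it.

```python
import signal


def describe_exit_code(code):
    """Translate a process exit code into a readable description (POSIX)."""
    if code < 0:
        # Negative codes encode the number of the terminating signal.
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (-9, -11, 0):
        print(code, "->", describe_exit_code(code))
```

Checking the system log (for example with `dmesg`) around the failure time is one way to see whether the kernel killed the process.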
Sir, what are the top1_acc and top5_acc on Diving48? My training accuracy is top-1 90.1 and top-5 98, but my test accuracy is only top-1 54.3 and top-5 86.6.
I want to monitor how the training loss and validation loss change, for example with wandb or just simple plots. How can I do that? Does this framework provide such a feature?
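The environments reported on this page use mmcv 1.x, whose runner builds logger hooks from a `log_config` entry in the config file. Assuming pyskl passes its config to that runner unchanged (an assumption this thread does not confirm), a config fragment along these lines would add TensorBoard logging next to the text log. The interval value is illustrative.

```python
# Fragment for a config file. Assumes the config is consumed by the mmcv 1.x
# runner, which registers one logger hook per entry in `hooks`.
log_config = dict(
    interval=20,  # illustrative: write a log record every 20 iterations
    hooks=[
        dict(type="TextLoggerHook"),         # plain-text log, as in the logs above
        dict(type="TensorboardLoggerHook"),  # curves viewable with TensorBoard
    ],
)
```

mmcv 1.x also ships a `WandbLoggerHook`, which can be listed in the same way if wandb is installed.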
Thanks
I would like to reproduce the 93.7% Mean Top-1 score of PoseConv3D on NTU60-Xsub (SlowOnly-R50/joint) by using 8 GPUs and inheriting the same setup, but the best outcome is top1_acc 0.9338 and mean_class_accuracy 0.9337 at epoch 24, which leaves a 0.32% gap. Moreover, the best scores in your log file (https://download.openmmlab.com/mmaction/skeleton/posec3d/slowonly_r50_u48_240e_ntu60_xsub_keypoint/slowonly_r50_u48_240e_ntu60_xsub_keypoint.log) are top1_acc 0.9352 and mean_class_accuracy 0.9351 at epoch 23, which is also a 0.18% gap compared to 93.7%.
The reason may be that my test method is wrong. Are there any settings that need to be adjusted?
`(pyskl) dong@dong-HP-Z1-Entry-Tower-G6:~/Code/Pyskl/pyskl$ python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4
load checkpoint from local path: .cache/joint_f6bed715.pth
load checkpoint from http path: http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth
Performing Human Detection for each frame
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 72/72, 11.9 task/s, elapsed: 6s, ETA: 0sload checkpoint from http path: https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth
Performing Human Pose Estimation for each frame
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 72/72, 13.9 task/s, elapsed: 5s, ETA: 0sload checkpoint from http path: https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth
Moviepy - Building video demo/demo.mp4.
Moviepy - Writing video demo/demo.mp4
Moviepy - Done !
Moviepy - video ready demo/demo.mp4
`
It seems fine, but the generated video can't be played normally. It's just a black screen, stuck at 00:00.
Hello, Dr. Duan. I want to know the format of the "--video-list" file for a whole dataset. For example, UCF101 has many folders with multiple labels. According to the comments, is it necessary to put them all on one line? Could you give me a sample, please?
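As a sketch only: assuming the list file holds one video per line, with the video path followed by an integer label and separated by a space (an assumption that should be checked against the comments in `tools/data/custom_2d_skeleton.py`), a dataset stored as one folder per class could be turned into such lines as follows. The function name and the extension filter are illustrative.

```python
from pathlib import Path


def build_video_list(root, extensions=(".avi", ".mp4")):
    """Return 'path label' lines for videos stored in one folder per class."""
    root = Path(root)
    # Sort the class folders so that label indices are reproducible.
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    lines = []
    for label, class_dir in enumerate(class_dirs):
        for video in sorted(class_dir.iterdir()):
            if video.suffix.lower() in extensions:
                lines.append(f"{video.as_posix()} {label}")
    return lines


if __name__ == "__main__":
    import sys

    # Usage: python build_list.py /path/to/dataset_root > dataset.list
    print("\n".join(build_video_list(sys.argv[1])))
```

Paths that contain spaces would be ambiguous in this layout, so they are best avoided.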
When I run demo_skeleton.py with http://download.openmmlab.com/mmaction/pyskl/ckpt/posec3d/slowonly_r50_346_k400/joint.pth (slowonly_r50_346_k400/joint.py), I get some information like this:
When I read the paper, I saw "For all datasets except FineGYM, 2D poses are obtained by directly applying Top-Down pose estimators to RGB inputs". So I want to know how to extract the 2D poses of FineGym. Is it the same as for Diving48?
Regarding the 17 keypoints in the FineGym dataset, I want to know the specific correspondence, for example, which keypoint represents the left eye.
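The demo logs on this page load an HRNet checkpoint trained on COCO (`hrnet_w32_coco_256x192`), and COCO defines 17 body keypoints. Assuming the FineGym skeletons follow the same COCO ordering, which this thread does not confirm, the index-to-joint mapping would be:

```python
# Standard COCO keypoint order. Whether the FineGym annotations use exactly
# this order is an assumption.
COCO_KEYPOINTS = [
    "nose",            # 0
    "left_eye",        # 1
    "right_eye",       # 2
    "left_ear",        # 3
    "right_ear",       # 4
    "left_shoulder",   # 5
    "right_shoulder",  # 6
    "left_elbow",      # 7
    "right_elbow",     # 8
    "left_wrist",      # 9
    "right_wrist",     # 10
    "left_hip",        # 11
    "right_hip",       # 12
    "left_knee",       # 13
    "right_knee",      # 14
    "left_ankle",      # 15
    "right_ankle",     # 16
]

if __name__ == "__main__":
    print(COCO_KEYPOINTS.index("left_eye"))
```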
Is there any script for two-stream model fusion? Or should I use the result.pkl files and fuse them myself?
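A common way to fuse two streams by hand is late fusion: add the per-class scores of the two models, optionally with weights, and take the highest fused score. The sketch below is not a pyskl script. It assumes each result file, once loaded, gives one list of per-class scores per test sample, with the samples in the same order in both files. The weights and example scores are made up.

```python
def fuse_scores(scores_a, scores_b, weight_a=1.0, weight_b=1.0):
    """Weighted sum of two score lists, sample by sample and class by class."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both streams must cover the same samples")
    fused = []
    for row_a, row_b in zip(scores_a, scores_b):
        if len(row_a) != len(row_b):
            raise ValueError("both streams must score the same classes")
        fused.append([weight_a * a + weight_b * b for a, b in zip(row_a, row_b)])
    return fused


def top1(scores):
    """Index of the highest score for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


if __name__ == "__main__":
    joint = [[0.6, 0.4], [0.2, 0.8]]  # made-up scores from one stream
    bone = [[0.1, 0.9], [0.3, 0.7]]   # made-up scores from the other stream
    print(top1(joint), top1(fuse_scores(joint, bone)))
```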
Sir, when I test with a single GPU, the errors are KeyError: 'val' and KeyError: "PoseDataset: 'val'". My command is bash tools/dist_test.sh configs/posec3d/slowonly_r50_diving48/joint.py work_dirs/posec3d/slowonly_r50_diving48/joint/latest.pth 1 --out yaml/testwork.json --eval top_k_accuracy mean_class_accuracy. Thank you!
Hi, thanks for your awesome work! Is it possible for you to provide the versions of mmcv-full, mmdet, pytorch, and cuda needed to run the repo successfully, or point to a Dockerfile? Thank you!
Sir, my test command is bash tools/dist_test.sh configs/posec3d/slowonly_r50_diving48/joint.py work_dirs/posec3d/slowonly_r50_diving48/joint/latest.pth 1 --out yaml/testwork.json --eval top_k_accuracy mean_class_accuracy, and it produces two errors. The first is KeyError: 'val'. The second is KeyError: "PoseDataset: 'val'". I don't know how to solve this. Is my test command wrong?
I followed your suggestion:
@kennymckormick commented:
Hi, jeongeun12,
Currently PoseC3D does not support training with 3D joints directly. To use PoseC3D, you need to first project 3D keypoints to 2D keypoints (for example, find a projection view that preserves the majority of the inter-keypoint variance).
If you want to directly use 3D joints for training, I recommend you move to PYSKL, where we released an original SOTA GCN model named ST-GCN++.
I finally trained ST-GCN++ with 3D keypoint datasets, and I want to run a demo with it.
Is it OK to use the inference_recognizer model in the demo_posec3d code?
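The quoted reply mentions projecting 3D keypoints to 2D with a view that keeps most of the variance. As a deliberately simplified sketch of that idea, restricted to axis-aligned views (a general solution would search over arbitrary view directions, for example with PCA), one can drop the coordinate axis along which the keypoints vary least. This is an illustration, not the maintainer's method; the function name and sample points are made up.

```python
from statistics import pvariance


def project_to_2d(points):
    """Drop the coordinate axis with the smallest variance.

    points: list of (x, y, z) tuples.
    Returns (kept_axes, list of 2D tuples).
    """
    variances = [pvariance([p[axis] for p in points]) for axis in range(3)]
    dropped = min(range(3), key=variances.__getitem__)
    kept = tuple(axis for axis in range(3) if axis != dropped)
    return kept, [(p[kept[0]], p[kept[1]]) for p in points]


if __name__ == "__main__":
    # Made-up keypoints that barely move along z.
    keypoints = [(0.0, 0.0, 5.0), (1.0, 2.0, 5.0), (2.0, 4.0, 5.1)]
    print(project_to_2d(keypoints))
```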
The number of training epochs for PoseC3D differs between the config (24 epochs) and the README (240 epochs). Is that a typo?
Hi, Kenny. Thank you for sharing such excellent work.
Recently, I have been studying ResNet3dSlowOnly in pyskl.
I am confused about the parameter pretrained2d (bool): "Whether to load pretrained 2D model. Default: True", in class ResNet3d of models/cnns/resnet3d.py.
In the original code, self.pretrained2d = True, which I think means that you use a pretrained 2D model. However, I can't find the pretrained 2D model, and I don't know where it is loaded from (torchvision, or somewhere else? I can't find the relevant code that specifies it).
If I set pretrained2d = False, can I still train normally? Will the final result be much worse than when loading the pretrained 2D model?
Sincerely hope to get your reply!
Hello, I would like to ask how to run PoseC3D on my own computer.
Is there anything else to configure after installing the requirements?
Which file do I need to run?
Thanks for sharing the implementation. When running the demo, I get the following error message. Please advise.
(pyskl) D:\gitSources\pyskl>python demo/demo_skeleton.py demo/ntu_sample.avi demo/demo.mp4
load checkpoint from http path: https://download.openmmlab.com/mmaction/pyskl/ckpt/posec3d/slowonly_r50_ntu120_xsub/joint.pth
Traceback (most recent call last):
File "demo/demo_skeleton.py", line 309, in <module>
main()
File "demo/demo_skeleton.py", line 250, in main
det_results = detection_inference(args, frame_paths)
File "demo/demo_skeleton.py", line 157, in detection_inference
assert model.CLASSES[0] == 'person', ('We require you to use a detector '
AttributeError: 'NoneType' object has no attribute 'CLASSES'
(pyskl) D:\gitSources\pyskl>
Computer:
Y720
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz 2.80 GHz
16.0 GB (15.9 GB usable)
NVIDIA GeForce GTX 1060
Windows 10 Home
Installation:
conda create -n pyskl python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
pip install -r requirements.txt
pip install -e .
bash tools/dist_train.sh pyskl/configs/posec3d/slowonly_r50_diving48/joint.py
Sorry, I am a newcomer. Is this the command to train with a single GPU?
2022-07-12 09:06:11,803 - pyskl - INFO - workflow: [('train', 1)], max: 24 epochs
2022-07-12 09:06:11,804 - pyskl - INFO - Checkpoints will be saved to /content/pyskl/work_dirs/posec3d/slowonly_r50_diving48/joint by HardDiskBackend.
tcmalloc: large alloc 1421410304 bytes == 0x16513c000 @ 0x7f96e32b4615 0x592b76 0x4df71e 0x59394f 0x5957cf 0x595b69 0x4e7b1f 0x4ebeeb 0x44f8bc 0x4e9074 0x4ebe42 0x4ec608 0x4eb932 0x4ec55d 0x4e9074 0x4ebe42 0x4ec55d 0x4e9074 0x4ebe42 0x44f841 0x4ec608 0x4e9074 0x4ebe42 0x55e1fa 0x59afff 0x515655 0x549576 0x593fce 0x548ae9 0x51566f 0x593dd7
tcmalloc: large alloc 1421410304 bytes == 0x7f94e97fe000 @ 0x7f96e32b4615 0x592b76 0x4df71e 0x59394f 0x5957cf 0x595b69 0x4e7b1f 0x4ebeeb 0x44f8bc 0x4e9074 0x4ebe42 0x4ec608 0x4eb932 0x4ec55d 0x4e9074 0x4ebe42 0x4ec55d 0x4e9074 0x4ebe42 0x44f841 0x4ec608 0x4e9074 0x4ebe42 0x55e1fa 0x59afff 0x515655 0x549576 0x593fce 0x548ae9 0x51566f 0x593dd7
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 634) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/run.py", line 718, in run
)(*cmd_args)
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
====================================================
tools/train.py FAILED
Hi, it's really an amazing job! I really like it. But I didn't find the config about using SlowFast network (there is only SlowOnly network configuration). So could you offer a script about that? Thank you so much!
Hello,
Thank you for sharing this great repository. I want to train and test the model on other datasets, however when I use diving48_example.ipynb to extract pose data from Diving48 based on your instructions I encounter this error:
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 0 (pid: 234913) of binary: /usr/bin/python3.8
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/galinezh/.local/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/galinezh/.local/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/galinezh/.local/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/galinezh/.local/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/home/galinezh/.local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/galinezh/.local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/home/galinezh/pyskl/tools/data/custom_2d_skeleton.py FAILED
Failures:
[1]:
time : 2022-06-13_23:04:55
host : coe54000151lws.dyn.uncc.edu
rank : 1 (local_rank: 1)
exitcode : -11 (pid: 234914)
error_file: <N/A>
traceback : Signal 11 (SIGSEGV) received by PID 234914
[2]:
time : 2022-06-13_23:04:55
host : coe54000151lws.dyn.uncc.edu
rank : 2 (local_rank: 2)
exitcode : -11 (pid: 234915)
error_file: <N/A>
traceback : Signal 11 (SIGSEGV) received by PID 234915
[3]:
time : 2022-06-13_23:04:55
host : coe54000151lws.dyn.uncc.edu
rank : 3 (local_rank: 3)
exitcode : -11 (pid: 234916)
error_file: <N/A>
traceback : Signal 11 (SIGSEGV) received by PID 234916
Root Cause (first observed failure):
[0]:
time : 2022-06-13_23:04:55
host : coe54000151lws.dyn.uncc.edu
rank : 0 (local_rank: 0)
exitcode : -11 (pid: 234913)
error_file: <N/A>
traceback : Signal 11 (SIGSEGV) received by PID 234913
Can anybody help me solve this issue?
Thanks!
Does the data pipeline, for example the train_pipeline in "configs/stgcn++/stgcn++_ntu60_xview_3dkp/j.py", use the GPU? pyskl occupies at least twice as much GPU memory as mmaction2.
I tried to train CTRGCN with 2D skeletons and the joint modality, but the accuracy dropped by over 1%. I didn't change anything, except that I trained with only 4 GPUs. Would using a few fewer GPUs have such a big impact on accuracy? It's quite strange, so I wonder whether you tuned any hyperparameters during training. Thanks in advance.
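One thing that changes with the GPU count is the total batch size: if the per-GPU batch size in the config stays fixed, halving the number of GPUs halves the total batch. A widely used heuristic, the linear scaling rule, adjusts the learning rate in the same proportion. Whether this explains the drop reported here is unknown; the sketch only shows the arithmetic, and the reference values in the example are hypothetical rather than taken from any pyskl config.

```python
def linearly_scaled_lr(reference_lr, reference_gpus, actual_gpus):
    """Scale a learning rate in proportion to the number of GPUs.

    Assumes the per-GPU batch size is the same in both settings.
    """
    if reference_gpus <= 0 or actual_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return reference_lr * actual_gpus / reference_gpus


if __name__ == "__main__":
    # Hypothetical numbers: a config tuned for 8 GPUs, run on 4.
    print(linearly_scaled_lr(0.1, reference_gpus=8, actual_gpus=4))
```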
I am sorry, but I cannot find the configs of the RGB+Pose models, which should have two pathways. In configs/posec3d I can only find configs that take only pose as input, i.e., one pathway with 17 input channels. Could you help me with that? Thanks.
load checkpoint from http path: https://download.openmmlab.com/mmaction/pyskl/ckpt/posec3d/slowonly_r50_ntu120_xsub/joint.pth
load checkpoint from http path: http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_2x_coco/faster_rcnn_r50_fpn_2x_coco_bbox_mAP-0.384_20200504_210434-a5d8aa15.pth
Performing Human Detection for each frame
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 72/72, 20.9 task/s, elapsed: 3s, ETA: 0sload checkpoint from http path: https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w32_coco_256x192-c78dce93_20200708.pth
Performing Human Pose Estimation for each frame
[ ] 0/72, elapsed: 0s, ETA:
Traceback (most recent call last):
File "demo/demo_skeleton.py", line 314, in <module>
main()
File "demo/demo_skeleton.py", line 258, in main
pose_results = pose_inference(args, frame_paths, det_results)
File "demo/demo_skeleton.py", line 184, in pose_inference
pose = inference_top_down_pose_model(model, f, d, format='xyxy')[0]
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/mmcv/utils/misc.py", line 340, in new_func
output = old_func(*args, **kwargs)
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/mmpose/apis/inference.py", line 380, in inference_top_down_pose_model
poses, heatmap = _inference_single_pose_model(
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/mmpose/apis/inference.py", line 247, in _inference_single_pose_model
data = test_pipeline(data)
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/mmpose/datasets/pipelines/shared_transform.py", line 107, in __call__
data = t(data)
File "/home/xunlong/anaconda3/envs/open-mmlab/lib/python3.8/site-packages/mmpose/datasets/pipelines/top_down_transform.py", line 289, in __call__
c = results['center']
KeyError: 'center'
I can't run pose_inference, but frame_paths and det_results are not None. I don't know what is wrong.