Comments (34)

fmthoker commented on July 18, 2024

@khurramHashmi The PyTorch code is already in the repository; I just added a new data loader for the NTU dataset, for the training data files provided in the above-mentioned Google Drive link (link). I must say the reproduced results are not exactly the same as those of the TensorFlow implementation.

DragonLiu1995 commented on July 18, 2024

Hi zywbupt, thank you for your question! We actually put the label and the action sequence together in a pickle file, and the train/test split follows the original evaluation protocol proposed in the NTU-RGBD paper; you can refer to their rules for splitting the training and test data, either cross-subject or cross-view. If I remember correctly, each .skeleton file has a sequence number containing the letter 'P', and the 3 digits following 'P' (the subject ID) are what we refer to when splitting the data: subjects 1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25, 27, 28, 31, 34, 35, 38 are for training, and all other ones are for testing. Correct me if I'm wrong.
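
For reference, that subject-based split can be sketched directly from the filenames (a minimal sketch assuming the standard NTU naming scheme S###C###P###R###A###.skeleton; the repo's own preprocessing may differ):

```python
import os

# Subject (performer) IDs used for training under the NTU-RGBD
# cross-subject protocol, as listed above.
TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19,
                  25, 27, 28, 31, 34, 35, 38}

def is_cs_training_sample(path):
    """Return True if a sample belongs to the cross-subject training
    set, based on the P### (performer) field of its filename."""
    name = os.path.basename(path)
    subject_id = int(name[name.index('P') + 1 : name.index('P') + 4])
    return subject_id in TRAIN_SUBJECTS

print(is_cs_training_sample("S001C002P003R002A013.skeleton"))  # False
print(is_cs_training_sample("S001C002P001R002A013.skeleton"))  # True
```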

zywbupt commented on July 18, 2024

Thank you for your answer, but what is the format of the input sequence data? Is it only the x, y, z value of each joint, in depth or RGB coordinates? As for the generated train_data_joint.npy file, the format is (N, C, T, V, M).

fmthoker commented on July 18, 2024

@DragonLiu1995 Thanks for releasing the code.
I was checking how you pre-process the NTU dataset, and it's not clear what the shape of each input sample should be in the following code:
```python
# Normalize Bones
for i in range(len(train_data)):
    train_data[i]['input'] = normalize_bone(np.array(train_data[i]['input']))
for i in range(len(test_data)):
    test_data[i]['input'] = normalize_bone(np.array(test_data[i]['input']))
```

As I understand it, in the NTU dataset each video is represented by a [P x F x J x C] array, where P = 2 persons (with one being all zeros in the case of a single-person action), F = the number of frames, J = the number of joints, and C = the 3D joint coordinates. Can you explain the shape of each train_data[i]['input'] that goes into the normalization function?

DragonLiu1995 commented on July 18, 2024

> Thank you for your answer, but what is the format of the input sequence data? Is it only the x, y, z value of each joint, in depth or RGB coordinates? As for the generated train_data_joint.npy file, the format is (N, C, T, V, M).

The input sequence is the x, y, z coordinates of each joint: 25 joints, each with an (x, y, z) triple.

DragonLiu1995 commented on July 18, 2024

> @DragonLiu1995 Thanks for releasing the code.
> I was checking how you pre-process the NTU dataset, and it's not clear what the shape of each input sample should be in the following code:
>
> ```python
> # Normalize Bones
> for i in range(len(train_data)):
>     train_data[i]['input'] = normalize_bone(np.array(train_data[i]['input']))
> for i in range(len(test_data)):
>     test_data[i]['input'] = normalize_bone(np.array(test_data[i]['input']))
> ```
>
> As I understand it, in the NTU dataset each video is represented by a [P x F x J x C] array, where P = 2 persons (with one being all zeros in the case of a single-person action), F = the number of frames, J = the number of joints, and C = the 3D joint coordinates. Can you explain the shape of each train_data[i]['input'] that goes into the normalization function?

Each training sample is 1 person. If there are 2 people in the video, we separate them into 2 sample sequences.
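
In other words, each [P x F x J x C] video yields up to two [F, J*C] sequences. A minimal sketch of that splitting (my own illustration, not necessarily the repo's exact code):

```python
import numpy as np

def split_persons(video):
    """Split an NTU video array of shape [P, F, J, C] (P=2 persons,
    F frames, J=25 joints, C=3 coordinates) into one flattened
    sequence per non-empty person, each of shape [F, J*C] = [F, 75]."""
    samples = []
    for person in video:                 # person: [F, J, C]
        if not np.any(person):           # skip the all-zero person slot
            continue
        samples.append(person.reshape(len(person), -1))
    return samples

video = np.zeros((2, 50, 25, 3))
video[0] = np.random.randn(50, 25, 3)    # only one real performer
print([s.shape for s in split_persons(video)])  # [(50, 75)]
```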

fmthoker commented on July 18, 2024

@DragonLiu1995 Thanks for the clarification. However, I am getting NaN values in the normalization function when passing raw skeleton video sequences. It is also not clear where you compute the R matrix that is discussed in the paper.
Can you make the pre-processing part clearer? Is there something that I am missing?

sukun1045 commented on July 18, 2024

@fmthoker Hi, thanks for your interest again. The current preprocessing script is the processing part right before feeding data into the network. The data has already been cleaned by a set of operations, and the view-invariant transform has already been applied. I just uploaded a view-invariant transform example for the NTU dataset; you can check the details there. To save you time, I have also shared the processed data on Google Drive. The raw_train/test_data.pkl files are the clean data after removing noise, two-person situations, etc. The trans_train/test_data.pkl files are the data after applying the view-invariant transform.
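
For intuition, a view-invariant transform of this kind typically translates each sequence to a body-centered origin and rotates it so the body faces a canonical direction. A hedged sketch of one common variant (the joint indices and the choice of reference frame are my assumptions; the repo's script and the paper's R matrix may be constructed differently):

```python
import numpy as np

def view_invariant_transform(seq, origin_j=0, hip_pair=(16, 12)):
    """seq: [F, 25, 3] joint sequence. Translate so joint `origin_j`
    (e.g. the spine base) in the first frame is the origin, then rotate
    about the vertical (z) axis so the first frame's hip-to-hip vector
    lies along x."""
    seq = seq - seq[0, origin_j]             # translate to the origin
    v = seq[0, hip_pair[1]] - seq[0, hip_pair[0]]
    theta = np.arctan2(v[1], v[0])           # hip-axis angle in the x-y plane
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])          # rotation about z by -theta
    return seq @ R.T                         # apply R to every joint
```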

fmthoker commented on July 18, 2024

@sukun1045 Thank you so much for this, it is a lifesaver.

fmthoker commented on July 18, 2024

@DragonLiu1995 @sukun1045 Can you please let me know whether the without-training results for NTU in the paper, P&C Rand (Our) 56.4 / 39.6, were obtained with AEC or without? Without training, I am able to achieve a KNN accuracy of 0.3901 without AEC and 0.4398 with AEC for cross-view evaluation, and 0.3346 without AEC and 0.3718 with AEC for cross-subject evaluation.
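
For readers following along: the KNN accuracy quoted here classifies frozen encoder features with a nearest-neighbour classifier. A minimal scikit-learn sketch (the distance metric and k are my assumptions; the repo has its own evaluation code):

```python
from sklearn.neighbors import KNeighborsClassifier

def knn_acc(train_feats, train_labels, test_feats, test_labels, k=1):
    """Fit KNN on encoder features from the training split and report
    accuracy on the test split (cross-view or cross-subject)."""
    knn = KNeighborsClassifier(n_neighbors=k, metric='cosine')
    knn.fit(train_feats, train_labels)
    return knn.score(test_feats, test_labels)
```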

sukun1045 commented on July 18, 2024

@fmthoker Those results are obtained without training anything, and they depend on the initialization of the network and possibly on the framework (we have only tested it on TensorFlow). The reason to list P&C Rand is to show that even a random encoder already gives reasonable accuracy on such a large dataset, and that a good training process should improve on this accuracy instead of reaching a lower or equal accuracy, as happens in the LongT GAN case.

fmthoker commented on July 18, 2024

@sukun1045 Thanks for the quick response. Yes, I am aware of that; the numbers I mentioned above were also obtained without any training. However, I just wanted to know which initialization and framework (especially with or without AEC) you used to obtain the reported numbers, P&C Rand (Our) 56.4 / 39.6.

sukun1045 commented on July 18, 2024

@fmthoker We were using TensorFlow 1.14, and the code for the architecture would be similar to what is shown in UCLA_demo.ipynb, without AEC. For initialization, we used the default for GRU in TF 1.14 (random uniform initialization in [-0.05, 0.05]).
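
For anyone mirroring this in the PyTorch port, a minimal sketch of that initialization (sizes follow the hyperparameters mentioned below; this is my illustration, not the repo's code):

```python
import torch.nn as nn

# Encoder GRU with feature_size=75 and rnn_size=2048, initialized
# uniformly in [-0.05, 0.05] to match the TF 1.14 default noted above.
encoder = nn.GRU(input_size=75, hidden_size=2048, batch_first=True)
for p in encoder.parameters():
    nn.init.uniform_(p, -0.05, 0.05)
```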

fmthoker commented on July 18, 2024

@sukun1045 Thanks for the reply again. I also tried to reproduce the numbers using your TensorFlow code and the provided pre-processed data. However, the numbers don't match the reported results. Here are my results and configurations using TensorFlow (version 1.14) without any modifications: a batch size of 64, rnn_size = 2048, sequence length = 50, and feature_size = 75 with the fixed-state strategy; all the other hyperparameters are the same as in UCLA_demo.ipynb.
Cross-subject = 0.48 and cross-view = 0.60.
Incidentally, these numbers are close to what I get when I use the PyTorch implementation.
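
For context, the fixed-state strategy mentioned here pins the decoder's recurrent state to the encoder's final state at every decoding step, so the reconstruction loss drives all representation learning into the encoder. A hedged PyTorch sketch under that reading (the repo's exact architecture may differ; the zero start frame is an assumption):

```python
import torch
import torch.nn as nn

feat, hidden = 75, 2048                  # feature_size and rnn_size above
encoder = nn.GRU(feat, hidden, batch_first=True)
dec_cell = nn.GRUCell(feat, hidden)
readout = nn.Linear(hidden, feat)

def autoencode(x):                       # x: [B, T, feat]
    _, h = encoder(x)
    h_enc = h.squeeze(0)                 # fixed decoder state, [B, hidden]
    y = torch.zeros(x.size(0), feat)     # zero start frame (assumption)
    outs = []
    for _ in range(x.size(1)):
        h_step = dec_cell(y, h_enc)      # state input is always h_enc
        y = readout(h_step)              # predict the next frame
        outs.append(y)
    return torch.stack(outs, dim=1)      # reconstruction, [B, T, feat]

x = torch.randn(4, 50, feat)             # sequence length 50, as above
loss = nn.functional.mse_loss(autoencode(x), x)
```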

sukun1045 commented on July 18, 2024

@fmthoker Would you mind telling me whether the P&C Rand number matches the reported value? I am cleaning up our previous code and rerunning the experiment. It may take about 8 hours to get the final result, but to give you an initial point, the KNN score for the CS case should be about 56% (I just took screenshots of what I got from the initial random model and the initial training). I will notify you when I am ready to publish the cleaned version of the notebook.
(two screenshots of initial KNN scores omitted)

fmthoker commented on July 18, 2024

@sukun1045 Thanks for the quick response. Without any training, I am getting 0.4812 for cross-view and 0.3915 for cross-subject evaluation.
If possible, can you share the training script once you finish the experiment?

sukun1045 commented on July 18, 2024

@fmthoker Yes I will do that.

sukun1045 commented on July 18, 2024

@fmthoker You can check the NTU_demo notebook. It is a quick implementation for NTU cross-view with the fixed-state strategy, without AEC. It should reach around 75% after 15,000 iterations.

fmthoker commented on July 18, 2024

@sukun1045 Thanks for sharing the script. I was able to reproduce the results within some margin. I would like to point out that the problem was due to the bone-normalization part in ntu_preprocess.py. I also removed the bone normalization from the PyTorch code, and it seems to work better now as well.

sukun1045 commented on July 18, 2024

@fmthoker Oh, I see. That was an old normalization code that was tried before and accidentally added to the repo. Sorry about that.

fmthoker commented on July 18, 2024

@sukun1045 That's fine. However, I was wondering whether you also performed a linear-evaluation experiment, training only a fully connected layer on the extracted features and labels.

sukun1045 commented on July 18, 2024

@fmthoker Yes we tried, but it added extra parameters and didn't perform well.

fmthoker commented on July 18, 2024

@sukun1045 So fine-tuning only the last layer with labels was worse than just clustering the final features using KNN without any labels? Do you have the exact numbers?

sukun1045 commented on July 18, 2024

@fmthoker From our previous experiments, fixing the trained encoder and fine-tuning only the classifier gives about 61% accuracy for Cross-View, which is very similar to what you get if you fix a randomly initialized encoder and train the one classifier. If you train the encoder and classifier jointly in the supervised setting, the accuracy is about 80%. Hope this information is helpful.
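
For concreteness, the linear-evaluation protocol under discussion can be sketched as follows (illustrative names; `encoder` is assumed to be the frozen, trained GRU, and 60 is the number of NTU-60 classes):

```python
import torch
import torch.nn as nn

classifier = nn.Linear(2048, 60)         # the only trainable part
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)

def linear_eval_step(x, labels, encoder):
    """One training step: frozen encoder features -> linear classifier."""
    with torch.no_grad():                # keep the encoder frozen
        _, h = encoder(x)                # h: [1, B, 2048]
        feats = h.squeeze(0)
    logits = classifier(feats)
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```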

fmthoker commented on July 18, 2024

@sukun1045 Thanks for this information, it is really helpful. But it seems a little strange that fine-tuning a classifier on the trained features performs worse than simply clustering the same features without any labels. What do you think is the reason for this?

sukun1045 commented on July 18, 2024

@fmthoker I don't have a clear answer for that, but from my understanding, the self-supervised training task (regeneration) may not learn exactly the same features as those usually seen in supervised classification. I think the representations learned in either case can somehow separate the actions, but they do not behave in the same way.

fmthoker commented on July 18, 2024

@sukun1045 I do understand your point. But I am wondering how we can fairly compare new self-supervised methods with this method. Under linear evaluation your method does not seem to perform well, while under clustering-based evaluation it seems to learn good representations. The two protocols have different use cases.

sukun1045 commented on July 18, 2024

@fmthoker To me, it only makes sense to compare these two cases separately. Evaluation via KNN shows the effectiveness of the pure representation you retrieve from the self-supervised method. The result of fine-tuning one classifier demonstrates the flexibility of the learned representation for various downstream tasks.

fmthoker commented on July 18, 2024

@sukun1045 Thanks for the discussion and all the help. I will get back to you with more questions in the future.

fmthoker commented on July 18, 2024

@sukun1045 FYI: I reproduced the results using the PyTorch implementation on the NTU-60 dataset. Here are some important points/results:

  • P&C Random: 50.49 (Cross-view) and 36.02 (Cross-subject) with K=1

  • P&C Random: 47.35 (Cross-view) and 39.03 (Cross-subject) with K=9

  • P&C Fixed weight: 72.09 (Cross-view) and 46.59 (Cross-subject) with K=1

  • P&C Fixed weight: 60.33 (Cross-view) and 50.05 (Cross-subject) with K=9

It seems the number of neighbours impacts performance differently for cross-view and cross-subject training.
Also, with PyTorch the network seems to converge very quickly: for cross-view after only 15 epochs, and for cross-subject after only 5 epochs. Training for more epochs resulted in a decrease in performance.

One more thing: would it be possible for you to share your implementation of the LongT GAN paper [36]? I need to do a linear evaluation of that method too.

khurramHashmi commented on July 18, 2024

Hi @fmthoker,
You mentioned that you reproduced the results using PyTorch. Could you please share the repository?

Thanks!

fmthoker commented on July 18, 2024

@DragonLiu1995 @sukun1045 Can you please explain how you created the raw data files (raw_train/test_data.pkl) from the original .skeleton files for NTU-60? I am trying to do the same for NTU-120, but I get a divide-by-zero error in the view-invariant transformation script. Can you share how to convert the .skeleton files provided by the dataset authors into the above-mentioned format?

DragonLiu1995 commented on July 18, 2024

> Can you please explain how you created the raw data files (raw_train/test_data.pkl) from the original .skeleton files for NTU-60? I am trying to do the same for NTU-120, but I get a divide-by-zero error in the view-invariant transformation script. Can you share how to convert the .skeleton files provided by the dataset authors into the above-mentioned format?

Hi @fmthoker,
The steps to get raw_train/test_data.pkl are included in NTU60_preprocess.ipynb under the preprocess folder. There are 3 steps to get the final skeletons. The code for steps 1 and 2 is adapted from this repo; step 3 simply splits the data by the Cross-View/Cross-Subject scheme.
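
As a pointer for step 3: under the standard NTU protocol, the cross-view split keys on the camera ID in the sample name (cameras 2 and 3 for training, camera 1 for testing). A minimal sketch, assuming the S###C###P###R###A### naming:

```python
def is_cv_training_sample(name):
    """Cross-view split on an NTU sample name such as
    'S001C002P003R002A013': cameras 2 and 3 train, camera 1 tests."""
    camera_id = int(name[name.index('C') + 1 : name.index('C') + 4])
    return camera_id in (2, 3)
```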

fmthoker commented on July 18, 2024

@DragonLiu1995 Thank you for your help; I was able to run the code for NTU-120 now.
