Comments (6)
Thank you for your appreciation.
In my experience, the training loss is quite high, too. I would double-check whether the model is actually using the ImageNet-pretrained backbone: does it print something like "Encoder is pretrained from..." at the beginning of training?
Another thing to check might be a mismatch between the validation and training ground truth (for instance the depth_scale, which for KITTI is usually 256.0).
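For context on that depth_scale: the KITTI depth benchmark stores ground truth as 16-bit PNGs where the raw value divided by 256.0 gives depth in meters, and 0 marks pixels with no measurement. A minimal decoding sketch (function and array names are illustrative):

```python
import numpy as np

def decode_kitti_depth(depth_png: np.ndarray, depth_scale: float = 256.0):
    """Convert a raw KITTI 16-bit depth PNG array to meters.

    A stored value of 0 marks pixels without a ground-truth measurement.
    """
    depth = depth_png.astype(np.float32) / depth_scale
    valid = depth_png > 0  # mask of pixels that carry a measurement
    return depth, valid

# Synthetic example: raw value 5120 corresponds to 20.0 m, 25600 to 100.0 m.
raw = np.array([[0, 5120], [256, 25600]], dtype=np.uint16)
depth, valid = decode_kitti_depth(raw)
```

If either the train or the validation loader skipped this division (or applied a different scale), the losses and metrics would diverge exactly as described.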
Any additional information may be helpful in understanding where the problem lies.
Best.
from idisc.
Thanks for your reply; I have investigated the code as per your suggestions.
First, I verified that the program correctly loads the swin_large_22k model pretrained on ImageNet.
Since I already have the pretrained model locally, I modified the code that originally loaded the checkpoint from a URL so that it loads from a local file path instead.
```python
# Before Modification
if pretrained:
    print(f"\t-> Encoder is pretrained from: {pretrained}")
    pretrained_state = load_state_dict_from_url(pretrained, map_location="cpu")[
        "model"
    ]
    info = self.load_state_dict(deepcopy(pretrained_state), strict=False)
    print("Loading pretrained info:", info)
```

```python
# After Modification
if pretrained:
    from urllib.parse import urlparse

    def is_url(path):
        # Check whether `pretrained` is a URL or a local path
        result = urlparse(path)
        return all([result.scheme, result.netloc])

    print(f"\t-> Encoder is pretrained from: {pretrained}")
    if is_url(pretrained):
        pretrained_state = load_state_dict_from_url(pretrained, map_location="cpu")[
            "model"
        ]
    else:
        # Load the checkpoint from the local file system instead
        pretrained_state = torch.load(pretrained, map_location="cpu")["model"]
    info = self.load_state_dict(deepcopy(pretrained_state), strict=False)
    print("Loading pretrained info:", info)
```
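As a sanity check, the is_url helper used above can be exercised standalone; it treats a string as a URL only if it has both a scheme and a host, so absolute and relative local paths fall through to the torch.load branch (the URL below is an illustrative example, not a real release asset):

```python
from urllib.parse import urlparse

def is_url(path: str) -> bool:
    # A string counts as a URL only if it has both a scheme and a netloc.
    result = urlparse(path)
    return all([result.scheme, result.netloc])

# URLs are detected; absolute and relative local paths are not.
checks = {
    "https://example.com/checkpoints/swin_large.pth": True,
    "/home/sph/data/swin_transformer/swin_large_patch4_window7_224_22k.pth": False,
    "weights/swin_large.pth": False,
}
```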
Therefore, when I run the training program, four messages (one from each process) are printed: Encoder is pretrained from: /home/sph/data/swin_transformer/swin_large_patch4_window7_224_22k.pth
Secondly, I addressed the alignment issue you mentioned between the training set and the test set. I used the Eigen split on KITTI for both, and I couldn't find anything in the program that could cause a mismatch between them. Loading the training set and the test set uses the same code module (class KITTIDataset), and their depth_scale values are both set to 256.
Additionally, I evaluated both on the training set and on the test set, using the weights you provided and the weights I trained myself (with test.py from the repository).
Training Set: 600 images randomly selected from the Eigen-split training set.
Test Set: all 652 valid images from the Eigen-split test set.
| | The model weights you provide | The model weights I trained |
|---|---|---|
| Training Set | Test/SILog 0.3846; d05 0.9829; d1 0.9970; d2 0.9995; d3 0.9999; rmse 1.1298; rmse_log 0.0408; abs_rel 0.0259; sq_rel 0.0404; log10 0.0111; silog 3.7325 | Test/SILog 0.3662; d05 0.9858; d1 0.9973; d2 0.9995; d3 0.9999; rmse 0.9945; rmse_log 0.0371; abs_rel 0.0220; sq_rel 0.0308; log10 0.0095; silog 3.6079 |
| Test Set | Test/SILog 0.7633; d05 0.8968; d1 0.9771; d2 0.9973; d3 0.9993; rmse 2.0665; rmse_log 0.0772; abs_rel 0.0504; sq_rel 0.1455; log10 0.0218; silog 7.0735 | Test/SILog 1.2327; d05 0.7809; d1 0.9256; d2 0.9846; d3 0.9958; rmse 3.1770; rmse_log 0.1249; abs_rel 0.0810; sq_rel 0.3827; log10 0.0351; silog 11.2095 |
Both models perform similarly on the training set (mine even slightly better). However, there is a significant gap between them on the test set, which suggests overfitting. Yet during training, the evaluation metrics never showed the typical pattern of improving first and then deteriorating.
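The train-to-test degradation can be quantified directly from the RMSE and SILog numbers in the table above (a quick back-of-the-envelope check, nothing more):

```python
# (train, test) pairs for RMSE and SILog, taken from the evaluation table.
provided = {"rmse": (1.1298, 2.0665), "silog": (3.7325, 7.0735)}
mine     = {"rmse": (0.9945, 3.1770), "silog": (3.6079, 11.2095)}

def gap(metrics):
    # Absolute train-to-test degradation for each metric.
    return {k: round(test - train, 4) for k, (train, test) in metrics.items()}

provided_gap = gap(provided)  # moderate degradation with the released weights
mine_gap = gap(mine)          # roughly twice as large, i.e., much weaker generalization
```

The released checkpoint degrades by about 0.94 m RMSE from train to test, while mine degrades by about 2.18 m, which is what makes the result look like overfitting despite the flat metric curves.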
I look forward to hearing your further suggestions. Thank you once again for your reply. Best wishes to you!
You could try the provided checkpoint on your data/code and check whether the results match the ones I reported.
If they match, then the problem is the training; if not, the problem might be the data.
Best.
Yes, I did exactly that; the table I provided above summarizes that experiment.
To evaluate the effectiveness of my training, I tested on the training set with both the checkpoint you provided and the one I trained.
To check my data, I also tested on the test set with both checkpoints.
However, the results were peculiar. The checkpoint you provided performed well on both the training and test sets, whereas the checkpoint I trained outperformed yours on the training set but performed poorly on the test set.
The fact that your checkpoint performs well on both my training and test sets suggests there is probably not a problem with my dataset.
The fact that my checkpoint performs very well on the training set indicates that my training process at least fits the data effectively.
The strange thing is that my checkpoint beats yours on the training set yet performs very poorly on the test set. This looks like overfitting, but the evaluation metric trends during training never suggested that overfitting occurred.
Honestly, I do not know. You are not seeing overfitting, but the model does not generalize either, since the training metrics are good while the validation ones are not. Moreover, KITTI validation and training are pretty similar, so I wonder why there is such a drop.
In addition, I was able to reproduce the results with Swin-Tiny: I checked validation after the first 1k steps and it matched my original training.
Either the training set is different from the one I used (I used the "new" Eigen split, namely the one after 2019), or the configs (i.e., augmentations, training schedule/lr, etc.) differ in some way.
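One way to rule out a split mismatch is to diff the file lists directly. A minimal sketch, assuming KITTI-style split files whose first whitespace-separated column is the image path (in practice you would read the lines from the repository's kitti_eigen_train.txt and from whatever list your loader actually consumed):

```python
def diff_splits(lines_a, lines_b):
    """Compare two KITTI split files by their first column (the image path)."""
    paths_a = {ln.split()[0] for ln in lines_a if ln.strip()}
    paths_b = {ln.split()[0] for ln in lines_b if ln.strip()}
    # Entries present in only one of the two splits.
    return paths_a - paths_b, paths_b - paths_a

# Synthetic example lines; real files would come from open(...).readlines().
repo_split  = ["seq_a/img_0001.png gt_0001.png 721.5",
               "seq_a/img_0002.png gt_0002.png 721.5"]
local_split = ["seq_a/img_0001.png gt_0001.png 721.5",
               "seq_b/img_0009.png gt_0009.png 721.5"]
only_repo, only_local = diff_splits(repo_split, local_split)
```

If both difference sets come back empty for train and test, the split itself can be excluded and the configs become the prime suspect.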
Thank you for your assistance! This situation is indeed perplexing. I believe we can rule out differences in the dataset and configuration, since I used the kitti_eigen_test.txt and kitti_eigen_train.txt files provided in the repository, verified that the file paths in the loaded dataset match the split exactly, and haven't made any modifications to the configuration file.
I would like to build on your work, so I will continue to debug the issue.
Once again, thank you for your help, and I wish you a pleasant day!