Comments (14)
Thanks for your question!
Iterating on the dataset is indeed a good idea. Although we did not try this with PGDF, our previous work [1] did something similar based on the same intuition; see Fig. 9 in [1]. In [1], the experiment was run at a low noise ratio (20%), and after the first pass over the original dataset the noise ratio dropped to a very low value (<1%). As a result, further iterations brought very little performance gain. But I think it may work in heavy-noise scenarios, and our team's future research may explore this.
Reference:
[1] Zhu, Chuang, et al. "Hard sample aware noise robust learning for histopathology image classification." IEEE Transactions on Medical Imaging 41.4 (2021): 881-894.
from pgdf.
Hi ;)
Thanks for your well-documented reply!
I think switching to a different model architecture at each iteration (ResNet, ViT, MobileNet, CLIP, etc.) could also help. Each model has its own "perception of vision": the more the architectures differ, the more their perceptions differ.
Another suggestion is to replace the SGD optimizer with Adam or AdaBelief (faster, and often better, convergence).
Would you be interested in working on this with me?
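The optimizer swap above can be sketched in a few lines, assuming a generic PyTorch setup (the backbone and hyperparameter values are placeholders, not PGDF's actual settings):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # placeholder for the actual backbone

# Drop-in replacement for SGD; Adam adapts per-parameter step sizes,
# which often speeds up convergence in the early (warmup) epochs.
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
```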
Moreover, I think early stopping could improve warmup convergence and your results. As I see it, you have fixed a variable that sets the number of warmup steps. It could be better to carry the best checkpoint on the validation set into the second training stage. That way you would have a "soft" parameter instead of a hard one.
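A sketch of that early-stopping warmup, where `train_one_epoch` and `evaluate` are hypothetical hooks standing in for the actual PGDF training code:

```python
import copy

def warmup_with_early_stopping(model, train_one_epoch, evaluate,
                               max_epochs=30, patience=3):
    """Run warmup until validation accuracy stops improving for
    `patience` epochs; return the best checkpoint and its accuracy."""
    best_acc, best_state, bad_epochs = -1.0, None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        acc = evaluate(model)
        if acc > best_acc:
            # New best checkpoint: snapshot it and reset the counter.
            best_acc, best_state, bad_epochs = acc, copy.deepcopy(model), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_state, best_acc
```

The second training stage would then start from `best_state` rather than from whatever weights the fixed warmup-step count happened to end on.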
Another thing:
I think using the prediction entropy as a confidence measure (relative distance from a threshold) could be a better way to decide whether a sample's label is noisy: a filter based on the model's actual ability to make good predictions. It should also remove some hard hyperparameters 😉 and give better results.
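One way to sketch such an entropy-based filter; the function names and the 0.5 threshold are illustrative, not taken from the PGDF code:

```python
import numpy as np

def entropy_confidence(probs, eps=1e-12):
    """Confidence in [0, 1]: 1 minus the normalized Shannon entropy of
    the softmax output. 1.0 = fully confident, 0.0 = uniform prediction."""
    probs = np.clip(probs, eps, 1.0)
    num_classes = probs.shape[-1]
    entropy = -(probs * np.log(probs)).sum(axis=-1)
    return 1.0 - entropy / np.log(num_classes)

def split_clean_noisy(probs, labels, conf_threshold=0.5):
    """Flag a sample as 'clean' when the model both predicts its given
    label and is confident (low entropy) about that prediction."""
    conf = entropy_confidence(probs)
    agrees = probs.argmax(axis=-1) == labels
    return agrees & (conf > conf_threshold)
```

A near-uniform prediction gets confidence near 0 regardless of which class wins, so it is never counted as clean even if the argmax happens to match the label.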
Hi,
Your ideas are very interesting and impressive! Thanks a lot for your reply and invitation. However, I will graduate and start working at a company next month, so I may not have enough time to work on this in the future. Thanks again for your kindness, and I wish you success in your research 😊.
Another tip:
as I see it, your competitors used larger models, so doing the same might improve results.
I have a question: when you talk about CIFAR-10 sym-90, do you mean that only 10% of the samples are correctly labeled and 90% carry random labels from the 10 classes? If so, I imagine your method could label any image-classification dataset without any labeled data!
So maybe trying to handle the problem with no labeled data at all could be a good thing to test. If you are confident about this, the only remaining problem would be matching the model's outputs to the true classes using the validation set. If that does not work, some self-supervised learning, as in one of the recent papers, might help.
I also think that extending your future work to text classification and tabular data would make some noise in the domain.
The answer is yes. But when the noise ratio is at a high level, the performance becomes unstable, which is a common issue in many LNL algorithms. Work [1] mentioned that pretraining the model weights with contrastive learning can bring significant performance gains. You could also try that.
Reference:
[1] Zheltonozhskii, Evgenii, et al. "Contrast to divide: Self-supervised pre-training for learning with noisy labels." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2022.
I would try another thing:
instead of using the same model twice in each iteration, use two different architectures,
and during the warmup step wait until the two models have converged independently.
Another thing (sorry for disturbing):
it may be better to quantify the overall confidence over all the classes rather than just using the best prediction for prob_his1 and prob_his2 in
prob1 = m * prob1_gmm + (1 - m) * prob_his1
prob2 = m * prob2_gmm + (1 - m) * prob_his2
where prob_gmm is a probability of membership.
Along the lines of NegEntropy (or similar), I think it would be better not to use prob_his1 directly but the probability P from this kind of formula: log(p_i) + sum_{j != i} log(1 - p_j) = log(P).
This is like a binary cross-entropy.
Or maybe use the NegEntropy directly and convert it to a scalar P, i.e. from sum(p_i log(p_i)) to P.
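The log(P) formula above can be sketched as follows, assuming `probs` are per-class softmax outputs (the function name is illustrative):

```python
import numpy as np

def label_likelihood(probs, labels, eps=1e-12):
    """P = p_y * prod_{j != y} (1 - p_j), computed in log space:
    log P = log(p_y) + sum_{j != y} log(1 - p_j).
    This scores 'class y is right AND every other class is wrong',
    like a per-class binary cross-entropy, instead of using p_y alone."""
    probs = np.clip(probs, eps, 1.0 - eps)
    n = np.arange(probs.shape[0])
    log_p = np.log(probs[n, labels])
    # Sum log(1 - p_j) over all classes, then remove the labeled class.
    log_not = np.log(1.0 - probs).sum(axis=-1) - np.log(1.0 - probs[n, labels])
    return np.exp(log_p + log_not)
```

A sharply peaked prediction on the given label scores close to p_y itself, while a prediction that spreads mass over other classes is penalized even if the argmax still matches.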
For args.md, maybe try:
- a linear scheduler that converges linearly from 0 to 1;
- using a metric such as validation accuracy to set m (e.g. 96% accuracy → m = 0.96);
- clustering the two-dimensional points (prob_gmm, prob_his) with a GMM and predicting the probability of membership in each cluster (this is close to one of your main ideas).
With that, you would no longer have the args.md hyperparameter to fix.
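The first option can be sketched as below; the start/end values and the usage line are assumptions, not PGDF's actual schedule:

```python
def linear_m(epoch, total_epochs, m_start=0.0, m_end=1.0):
    """Linearly ramp the mixing coefficient m from m_start to m_end
    over training, replacing the fixed args.md hyperparameter."""
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return m_start + t * (m_end - m_start)

# Hypothetical usage inside the training loop:
#   m = linear_m(epoch, total_epochs)
#   prob1 = m * prob1_gmm + (1 - m) * prob_his1
```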
On the lines
pred1 = prob1 > 0.5
pred2 = prob2 > 0.5
I think determining the threshold that maximizes F1 score, precision, or recall (your choice) on the validation (or training) set could improve on the fixed value of 0.5. A better threshold should improve convergence and reduce the number of epochs.
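A sketch of such a threshold sweep against a trusted validation split, maximizing F1 (names and the grid are hypothetical):

```python
import numpy as np

def best_threshold(probs, is_clean, grid=None):
    """Pick the threshold on clean-probabilities that maximizes F1
    against known clean/noisy flags, instead of fixing 0.5."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = probs > t
        tp = np.sum(pred & is_clean)
        fp = np.sum(pred & ~is_clean)
        fn = np.sum(~pred & is_clean)
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

The same loop works for precision or recall by swapping the scoring line.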
It might be better not to use this hyperparameter-based schedule:
lr = args.learning_rate
if epoch >= args.lr_switch_epoch:
    lr /= 10
but rather an LR scheduler such as lr_scheduler.LinearLR. It can deliver the same results without a hand-tuned hyperparameter, or even better ones.
See this link: https://pytorch.org/docs/stable/optim.html
Maybe searching the state of the art in optimizers and LR schedulers for classification would also improve your results (faster, better convergence).
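A sketch of that replacement using torch.optim.lr_scheduler.LinearLR; the model, base LR, and epoch count are placeholders, not PGDF's actual values:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # placeholder for the actual backbone
optimizer = optim.SGD(model.parameters(), lr=0.02, momentum=0.9)

# Instead of "divide lr by 10 at args.lr_switch_epoch", decay the LR
# smoothly from 1.0x to 0.1x of the base rate over the whole run.
num_epochs = 300
scheduler = optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=num_epochs)

for epoch in range(num_epochs):
    # ... one epoch of training, with optimizer.step() per batch ...
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
```

After `total_iters` epochs the learning rate sits at `base_lr * end_factor`, so the end point matches the original /10 drop without choosing a switch epoch.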
In the end, the fewer hyperparameters you have, the more stable your results will be.
If you want some help in the coming months, I can put together a state-of-the-art survey in the meantime to help you write the code :)
It would be a pleasure to participate!
Thanks again for your helpful advice! 👍 I wish your research goes well!