
Comments (8)

KichangKim commented on August 27, 2024

Hi. I originally implemented DeepDanbooru using Microsoft's CNTK. TensorFlow behaves differently from CNTK in some respects.

So I tested some parameter changes for TensorFlow:

  1. Increased the learning rate: x 1000 ~ 5000
  2. Changed the learning algorithm: Adam -> SGD

I do not have experience with PyTorch, but I think you should check PyTorch's default behaviour for internal network layers such as conv, pooling, initializer, loss and so on. They all have different default parameters depending on the library.

Also, I noticed that you said "one-hot labels", but DeepDanbooru needs multi-label inputs, not multi-class. DeepDanbooru's input vector should have multiple entries set to 1 (if the image has multiple tags).

from deepdanbooru.

tellurion-kanata commented on August 27, 2024

Hi, thanks for your timely reply.
I think there is a small misunderstanding; sorry for my ambiguous description. I label each image with a paired vector with 7000 channels (what I called one-hot), setting 1 for each existing tag and 0 for the others. I think we are doing the same thing on this point, according to your post on reddit two years ago.
Example:
1girl, red_hair, black_hair,...1000 other tags...,blah
1,1,0,...,0
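
The labelling in the example above can be sketched in a few lines of plain Python. This is only an illustration, not DeepDanbooru's code: the function name `encode_multi_hot` and the four-tag vocabulary are made up for the example, and a real vocabulary would have thousands of entries.

```python
def encode_multi_hot(image_tags, vocabulary):
    """Return a 0/1 vector with one slot per vocabulary tag.

    Hypothetical helper: several slots can be 1 at the same time,
    which is what makes the target multi-label rather than multi-class.
    """
    index = {tag: i for i, tag in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for tag in image_tags:
        i = index.get(tag)
        if i is not None:  # tags outside the vocabulary are ignored
            vector[i] = 1
    return vector


# Toy vocabulary, assumed for illustration only.
vocabulary = ["1girl", "red_hair", "black_hair", "smile"]
target = encode_multi_hot(["1girl", "red_hair"], vocabulary)
print(target)  # [1, 1, 0, 0]
```

A target like this pairs with a per-tag sigmoid output and a binary loss, not with a softmax over all tags.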

So you didn't make any specific improvements to the network or training strategy? For example, assigning larger weights to low-frequency tags, adopting loss functions that limit the influence of the negative samples, such as focal loss, or resampling the dataset to make the distribution more balanced?
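
For reference, the focal loss mentioned in the question can be sketched for the multi-label case as a per-tag binary focal loss. This is a plain-Python illustration under assumptions, not something the thread says DeepDanbooru uses: the function name is hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are the commonly cited ones from the focal loss paper.

```python
import math


def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over the tags of one image.

    probs   -- predicted probabilities (after sigmoid), one per tag
    targets -- 0/1 labels, one per tag
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)            # avoid log(0)
        p_t = p if y == 1 else 1.0 - p             # probability of the true label
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified tags,
        # which are mostly the abundant easy negatives.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


easy_negative = binary_focal_loss([0.05], [0])
hard_negative = binary_focal_loss([0.95], [0])
print(easy_negative < hard_negative)  # True
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.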

I will check the difference of default settings, thank you!

KichangKim commented on August 27, 2024

1girl, red_hair, black_hair,...1000 other tags...,blah 1,1,0,...,0

Oh, it looks fine.

Additionally, I filtered the training dataset, using only images that have 20 or more general tags. Images with fewer than 20 general tags were simply ignored.
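
The filtering step described above could look roughly like this. It is a minimal sketch: the record layout (a dict with a `general_tags` list) and the function name are assumptions made for the example, not the project's actual data format.

```python
MIN_GENERAL_TAGS = 20  # threshold stated in the comment above


def filter_by_general_tag_count(records, min_tags=MIN_GENERAL_TAGS):
    """Keep only records that have at least `min_tags` general tags."""
    return [r for r in records if len(r["general_tags"]) >= min_tags]


# Hypothetical metadata records with placeholder tag names.
records = [
    {"id": 1, "general_tags": [f"tag_{i}" for i in range(25)]},
    {"id": 2, "general_tags": [f"tag_{i}" for i in range(19)]},
    {"id": 3, "general_tags": [f"tag_{i}" for i in range(20)]},
]
kept = filter_by_general_tag_count(records)
print([r["id"] for r in kept])  # [1, 3]
```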

tellurion-kanata commented on August 27, 2024

Thank you for your kind help!
I will do the same filtering on my dataset before my next try.

koke2c95 commented on August 27, 2024

@ydk-tellurion you may want to check this out:

Code and dataset preparation release: ShuhongChen/bizarre-pose-estimator

A resnet50 trained on this subset gives better classification than RF5's.

It proposes a cleaner danbooru multi-label task specific to anime characters, with a processing guide for danbooru2019:
uninformative target tags: cleaned up by rules (positive tags, under-tagged, non-contextual relevance, etc.)
severe class imbalance: a weighting trick to reduce class imbalance/long-tailed issues
on training: a data augmentation strategy that avoids certain lossy transforms
related image tasks can be solved using this backbone's features.
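
The comment does not say which weighting scheme the linked project uses. One common option, shown here only as an assumption, is a per-tag positive weight equal to the ratio of negative to positive examples, the convention that PyTorch's `BCEWithLogitsLoss` documents for its `pos_weight` argument. The helper name and the toy label matrix are made up for the example.

```python
def positive_weights(label_matrix):
    """Per-tag weight = (number of negatives) / (number of positives).

    label_matrix -- list of multi-hot rows, one row per image.
    Tags with no positive example get weight 1.0 to avoid dividing by zero.
    """
    num_images = len(label_matrix)
    num_tags = len(label_matrix[0])
    weights = []
    for j in range(num_tags):
        positives = sum(row[j] for row in label_matrix)
        negatives = num_images - positives
        weights.append(negatives / positives if positives else 1.0)
    return weights


# Toy data: tag 0 is frequent, tag 1 is rare, tag 2 never appears.
labels = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
weights = positive_weights(labels)
print(weights[1])  # 3.0
```

Rare tags end up with weights above 1 and frequent tags with weights below 1, so errors on positive examples of low-frequency tags count for more in the loss.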

tellurion-kanata commented on August 27, 2024

@YHJ2c95 Hi, thanks for the information!
I will test it in the next few days.

koke2c95 commented on August 27, 2024

@KichangKim @ydk-tellurion

Hi,

After surveying danbooru's tags, I think multi-label classification is not a good fit.

The tags themselves carry semantics, but they are meant for humans; as a dataset, it is just a bucket/collection of images.

Concepts that cannot be described, or that are not present in the tags, have a serious effect: they lead to poorly trained models that are useful for few downstream tasks,
or that even learn nothing at all.
Perhaps adding some pseudo-labels from unsupervised clustering could give a huge improvement.

There are tags for non-original characters that are biased, i.e. character traits
There are very few valid tags (so why limit dan to 2? There are many "images" and two tags are enough to search)
There is no information about the components of the tag, for example
The tag set is not clearly delineated, and some tags repeat the same meaning
New tags are difficult to synchronise with earlier images

web image classification

I think the danbooru tagging task is web image classification, which is very different from imagenet/CAPTCHA.
First, imagenet/CAPTCHA images are very close to objects, whereas web images are not.
Second, if you literally remove the tagging, then the whole dataset is just a bucket/collection, with sets of different themes at different intensities.

Extend this further to groups, which can be subsetted and merged.
Then apply contrastive learning, but instead of contrasting views of a single image, turn it into a group-to-group contrast.

The inspiration is a game (given a bunch of images, guess the tag) and a paper.

KichangKim commented on August 27, 2024

Yes, simple multi-label classification has many limitations for semantic recognition. That is why I removed "copyright tags" from the training data.
