kognate commented on June 29, 2024

The problem of the Watson logo being scored as a Dalmatian can be corrected in a custom classifier by adding the image to the negative class.

So, let's say I want to classify document types such as driver's licenses, passports, and tax ID cards, and I have 200 images of each type. In addition to these three classes, I include 100 images of logos and other things that belong to none of the three classes I want to find. When testing my classifier, if I find images that are classified incorrectly, I can add them to the negative class to improve the results.
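A minimal Python sketch of the bookkeeping described above, assuming the training data is tracked as lists of image files per class. It does not call the Visual Recognition service: `TrainingSet`, its methods, and the file names are hypothetical, and the updated set would still have to be zipped and used to train a classifier.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrainingSet:
    """Hypothetical bookkeeping for a custom classifier's training data:
    one list of image files per positive class plus one negative list."""
    positives: dict[str, list[str]]
    negatives: list[str] = field(default_factory=list)

    def add_false_positives(self, misclassified: list[str]) -> None:
        """Add images that were wrongly matched to a positive class to the
        negative examples, skipping any that are already there."""
        for image in misclassified:
            if image not in self.negatives:
                self.negatives.append(image)

    def summary(self) -> dict[str, int]:
        """Number of images per class, including the negative class."""
        counts = {name: len(images) for name, images in self.positives.items()}
        counts["negative"] = len(self.negatives)
        return counts


# Class sizes mirror the example above; the file names are made up.
training = TrainingSet(
    positives={
        "drivers_license": [f"dl_{i}.jpg" for i in range(200)],
        "passport": [f"passport_{i}.jpg" for i in range(200)],
        "tax_id_card": [f"tax_id_{i}.jpg" for i in range(200)],
    },
    negatives=[f"other_{i}.jpg" for i in range(100)],
)

# Suppose testing showed these two images being matched to a document class.
training.add_false_positives(["watson_logo.jpg", "company_logo.jpg"])
print(training.summary())
# {'drivers_license': 200, 'passport': 200, 'tax_id_card': 200, 'negative': 102}
```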

The reason the Watson logo gets classified at all is that it must have some feature that is also found in the Dalmatian class. If I had to guess, it's the dots in the image. I would also say that a score of 0.522 is pretty low, and raising the threshold will help weed out poorly classified images.
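Raising the threshold can be illustrated with a small client-side filter over a class-to-score mapping. This is a hedged sketch: the function name and the score dictionary are assumptions, not the service's response format, and the two threshold values are only examples.

```python
from __future__ import annotations


def matches_above_threshold(scores: dict[str, float], threshold: float) -> dict[str, float]:
    """Keep only the classes whose score is at or above the threshold."""
    return {label: score for label, score in scores.items() if score >= threshold}


# 0.522 is the Dalmatian score from this thread; both thresholds are examples.
logo_scores = {"dalmatian": 0.522}
print(matches_above_threshold(logo_scores, 0.5))  # {'dalmatian': 0.522}
print(matches_above_threshold(logo_scores, 0.6))  # {}
```

With the higher threshold the logo simply produces no match, which is the behaviour you want for images that belong to none of the classes.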

from visual-recognition-nodejs.

kognate commented on June 29, 2024

The Watson image used was from https://i.ytimg.com/vi/8o44asJt8ZA/maxresdefault.jpg

jflevi commented on June 29, 2024

Yes... Any image that is not a dog gives similar results.
Thanks
JF

nfriedly commented on June 29, 2024

Hi @jflevi, thanks for the feedback.

This is due to the relatively small training image set that the demo uses. In particular, the Non-Dogs are all 4-legged critters like cats and tigers and such, so when you include that in the training data, the service tries to match whatever random image you give it to either a particular dog breed or else cats and such.
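To see why a closed training set pushes every input toward some known class, here is a toy nearest-centroid classifier in Python. It is a generic illustration, not the algorithm the service uses; the class centres are made-up 2-D points rather than real image features.

```python
import math

# Made-up 2-D "feature vectors" for each class centre; a real service works
# with much richer image features.
CENTROIDS = {
    "dalmatian": (1.0, 1.0),
    "beagle": (2.0, 0.0),
    "non_dog": (0.0, 2.0),  # stands in for the cats/tigers in the demo set
}


def nearest_class(features):
    """Return (label, distance) for the closest class centre.

    This is a closed-set classifier: it always answers with one of the
    known classes, no matter how far the input is from all of them."""
    label = min(CENTROIDS, key=lambda name: math.dist(features, CENTROIDS[name]))
    return label, math.dist(features, CENTROIDS[label])


# A point far from every class (think: a company logo) still gets a label,
# even though its distance is huge compared with the spacing of the centres.
label, distance = nearest_class((10.0, 10.0))
print(label, round(distance, 1))  # dalmatian 12.7
```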

If you want to recognize arbitrary images, you'll need to either create a much larger training set (with the Non-Dogs part full of random things like company logos) or just use the default classifier that the "Try" page of the demo uses.

That said, we do appreciate the feedback and we’re continually working on improving the service, so we’ll take this into account for future updates.

jflevi commented on June 29, 2024

Nathan,

Thanks for your response, but I don't think adding more images will help in my use case, which is the following (adapted to dogs):

I have a set of images with unknown content. I want to identify which ones are dogs and which are not.

The real use case I'm trying to address with this service is this: the bank has a set of stored customer documents (70 million documents), and they want to classify them and identify the ID documents (passports or ID cards); other documents don't need to be classified. The default classifier is not able to recognize any ID cards.

Would you have any recommendations to address my requirements?

Thanks

Cordialement - Kind regards

Jean-Francois LEVI
Client Technical Advisor - Société Générale

From: Nathan Friedly
To: watson-developer-cloud/visual-recognition-nodejs
Cc: Jean-Francois Levi/France/IBM@IBMFR, Mention
Date: 02/06/2016 22:05
Subject: Re: [watson-developer-cloud/visual-recognition-nodejs] Strange behavior of Demo Classifier (#104)

nfriedly commented on June 29, 2024

That actually sounds like a very fitting use case: choose a random selection of images out of the 7 million and split them into two groups, ID cards and other. Per the documentation, you'll want at least 150-200 images in each group, and you should see some benefit all the way up to 5000 images total. (You'll likely have to shrink the images to fit within the 100 MB-per-zip limit, but 320px and larger is good.)
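A minimal Python sketch of the preparation steps described above. The numeric limits come from this comment; the helper names, the file names, capping the sample at 5000 images total, and reading "100mb" as 100 * 1024 * 1024 bytes are assumptions. Actual resizing and zipping would need an image library and `zipfile`, which this sketch leaves out.

```python
import random

MIN_PER_GROUP = 150    # "at least 150-200 images in each group"
USEFUL_TOTAL = 5000    # "some benefit all the way up to 5000 images total"
SHORT_SIDE = 320       # "320px and larger is good"
ZIP_LIMIT_BYTES = 100 * 1024 * 1024  # "100mb-per-zip"; binary MB is an assumption


def sample_training_set(id_cards, others, per_group, seed=0):
    """Draw an equal-size random sample from each labelled group."""
    if per_group < MIN_PER_GROUP:
        raise ValueError(f"use at least {MIN_PER_GROUP} images per group")
    # Assumption: cap at the 5000-image total mentioned in the thread.
    per_group = min(per_group, USEFUL_TOTAL // 2)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return {"id_cards": rng.sample(id_cards, per_group),
            "other": rng.sample(others, per_group)}


def downscaled_size(width, height, short_side=SHORT_SIDE):
    """Target size that shrinks (never enlarges) an image so its shorter side
    is `short_side` pixels, keeping the aspect ratio."""
    scale = min(1.0, short_side / min(width, height))
    return round(width * scale), round(height * scale)


def fits_in_one_zip(file_sizes):
    """Rough check using uncompressed file sizes: already-compressed JPEGs
    shrink little when zipped, so this errs on the cautious side."""
    return sum(file_sizes) <= ZIP_LIMIT_BYTES


id_cards = [f"id_{i}.jpg" for i in range(1000)]   # hypothetical file names
others = [f"doc_{i}.jpg" for i in range(50000)]
training = sample_training_set(id_cards, others, per_group=200)
print({group: len(images) for group, images in training.items()})
# {'id_cards': 200, 'other': 200}
print(downscaled_size(2480, 3508))     # (320, 453), e.g. an A4 page scanned at 300 dpi
print(fits_in_one_zip([60_000] * 400))  # True: 400 images of ~60 KB each
```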

Beyond that, you can also require human verification for images with a score below a given threshold, say 0.75, and then add those images to the training set and create a new classifier so that it further improves over time. (Note: each custom classifier is immutable, so you can't add new training images to an existing one, but you can replace it with a new one.)
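The review loop above can be sketched in Python as follows. `ClassifierVersion`, `route`, and `retrain` are hypothetical names; `frozen=True` only mirrors the point that a trained classifier can't be changed, and actually training each new version against the service is outside this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

REVIEW_THRESHOLD = 0.75  # example threshold from the thread


@dataclass(frozen=True)
class ClassifierVersion:
    """Hypothetical record of one trained classifier. frozen=True mirrors the
    fact that a custom classifier cannot be changed once it is trained."""
    version: int
    training_images: frozenset[tuple[str, str]]  # (image file, label) pairs


def route(predictions: list[tuple[str, float]], threshold: float = REVIEW_THRESHOLD):
    """Split (image, score) pairs into auto-accepted results and images that
    need human verification."""
    accepted = [image for image, score in predictions if score >= threshold]
    needs_review = [image for image, score in predictions if score < threshold]
    return accepted, needs_review


def retrain(current: ClassifierVersion, verified: dict[str, str]) -> ClassifierVersion:
    """Return a NEW version whose training set adds the human-verified
    labels; `current` is left untouched."""
    return ClassifierVersion(
        version=current.version + 1,
        training_images=current.training_images | frozenset(verified.items()),
    )


v1 = ClassifierVersion(1, frozenset({("id_1.jpg", "id_card"), ("doc_1.jpg", "other")}))
accepted, needs_review = route([("scan_a.jpg", 0.91), ("scan_b.jpg", 0.62)])
# A human looks at scan_b.jpg and labels it; it then joins the next version.
v2 = retrain(v1, {"scan_b.jpg": "other"})
print(accepted, needs_review)  # ['scan_a.jpg'] ['scan_b.jpg']
print(v1.version, len(v1.training_images), v2.version, len(v2.training_images))  # 1 2 2 3
```

Treating each classifier as a value that is replaced rather than modified keeps the previous version usable until its successor has been trained and checked.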

jflevi commented on June 29, 2024

Nathan,

Thanks, this is exactly what I'm currently testing, and it works fine... except that images which are not ID cards (like logos) are sometimes classified as ID cards. So I tested with the Dogs example and found the same problem. To me, there is a bug somewhere. Try the following with Dogs:

Select only 3 dog breeds and try to classify the Watson logo or any other image. The result is no match, which is fine; this is the expected result.

If you select all of them, the Watson logo is wrongly classified.

That's my point, and I don't understand why it behaves like this.

Thanks a lot for your feedback.

JF

From: Nathan Friedly
To: watson-developer-cloud/visual-recognition-nodejs
Cc: Jean-Francois Levi/France/IBM@IBMFR, Mention
Date: 03/06/2016 14:44
Subject: Re: [watson-developer-cloud/visual-recognition-nodejs] Strange behavior of Demo Classifier (#104)

matt-ny commented on June 29, 2024

Jean-Francois wrote:

To me, there is a bug somewhere. Try the following with Dogs:
Select only 3 dog breeds and try to classify the Watson logo or any other image. The result is no match, which is fine; this is the expected result.
If you select all of them, the Watson logo is wrongly classified.
That's my point, and I don't understand why it behaves like this.

I understand that this is counter-intuitive. In your example, adding more training data (all dogs instead of just 3 breeds) leads to a misclassification. One of the deep problems with some machine learning techniques (including the ones we use) is that we cannot explain "why" a mistake (or a right answer) was given. You might find this paper interesting and surprising: https://arxiv.org/abs/1312.6199

We do know that counter-intuitive results like this are possible, especially when the test images (the Watson logo) come from a different distribution than the training images (dogs). We are actively working on ways to identify whether a classifier is "appropriate" for a particular test set, given what it was trained on. Our department has some initial results, but nothing deployed yet.
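The thread does not describe IBM Research's approach. As a hedged illustration of the general idea of checking whether a test image resembles the training data, here is a generic nearest-neighbour distance heuristic in Python; the 2-D features, the helper names, and the "radius" rule are all assumptions, and real image features would come from some embedding not shown here.

```python
import math


def nn_distance(x, points):
    """Distance from x to its nearest neighbour in points."""
    return min(math.dist(x, p) for p in points)


def in_distribution_radius(train):
    """Largest leave-one-out nearest-neighbour distance within the training
    features: a crude estimate of how spread out the training data is.
    Needs at least two training points."""
    return max(nn_distance(p, train[:i] + train[i + 1:]) for i, p in enumerate(train))


def looks_out_of_distribution(x, train):
    """Flag inputs that are farther from every training point than the
    training points typically are from each other."""
    return nn_distance(x, train) > in_distribution_radius(train)


# Made-up 2-D features for the training images (e.g. dog photos).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(looks_out_of_distribution((0.5, 0.5), train))    # False
print(looks_out_of_distribution((10.0, 10.0), train))  # True, e.g. a logo
```

An input flagged this way could be reported as "no match" instead of being forced into the nearest class.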

@kognate's advice above, about using large training sets (hundreds or thousands of examples per class) and augmenting your negative set with the logo images, is the best practice we can recommend at this time.

Matt
Sr Software Engineer
IBM Research - Visual Recognition
