Hi, I tried the demo with 2012 Debate - Barack Obama (EN) (body of text). I then copy

Different results on Demo and via API calls about personality-insights-nodejs HOT 13 CLOSED

garyhow01 commented on June 23, 2024

Different results on Demo and via API calls

from personality-insights-nodejs.

Comments (13)

germanattanasio commented on June 23, 2024 1

@garyhow01 Which version of the API were you using? The demo uses v2 (we are in the process of updating it to v3).

neil-boyette-ibm commented on June 23, 2024 1

Hi Gary,

Here are the curl commands we used with the txt file above.

V2:

curl -X POST -H "Content-Type: text/plain" -H "Content-Language: en" -H "Accept-Language: en" -H "Accept: application/json" -H "Authorization: Basic <id and pwd>" -H "Cache-Control: no-cache" -d '@BarackObama2012Debate.txt' "https://gateway.watsonplatform.net/personality-insights/api/v2/profile"

V3:

curl -X POST -H "Content-Type: text/plain" -H "Content-Language: en" -H "Accept-Language: en" -H "Accept: application/json" -H "Authorization: Basic <id and pwd>" -H "Cache-Control: no-cache" -d '@BarackObama2012Debate.txt' "https://gateway.watsonplatform.net/personality-insights/api/v3/profile?version=2016-08-26"

Both of these produce the same results (namely 6722 words). A few things to check: make sure the content type is set correctly, and that you didn't change any of the formatting in the text you were sending.

Another note I'd like to make is that the v2 and v3 APIs will produce the exact same output; the difference between them is in the format of the API (mostly the response), not in how the profile is generated (they both go through the same pipeline).
I'll also point out that the demo is actually hooked up to a live instance of the service, so it will also perform the exact same logic as a standalone API call. The main chance for differences is if the calls are not the same, i.e. differences in formatting, content types etc.
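
For anyone making the call from code rather than curl, here is a minimal Python sketch of the same v3 request that sends the text bytes unmodified. The URL and headers are copied from the curl commands above; the function name `build_profile_request` and the `username`/`password` parameters are hypothetical, and the request is only constructed here, not sent.

```python
import base64
import urllib.request

# URL taken from the V3 curl command in this thread.
V3_URL = (
    "https://gateway.watsonplatform.net/personality-insights/api/v3/profile"
    "?version=2016-08-26"
)


def build_profile_request(text, username, password, url=V3_URL):
    """Build (but do not send) a POST request carrying `text` exactly as given.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Encode the text as-is, so line breaks inside it are preserved.
    body = text.encode("utf-8")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    headers = {
        "Content-Type": "text/plain",
        "Content-Language": "en",
        "Accept-Language": "en",
        "Accept": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_profile_request("line one\nline two", "user", "pass")
    # The newline is still present in the request body.
    print(request.data)
```

Comparing the body bytes of two clients this way is one way to check whether "the calls are the same" before comparing scores.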

neil-boyette-ibm commented on June 23, 2024 1

Sorry, one clarification as I tried out the demo and found that there was indeed a difference being shown versus the curl.

In short, the demo was showing 7020 words and curl only 6722. The cause is the way curl was sending the text file. The text in the demo has hard line breaks in the middle of sentences for formatting. The demo preserves these when it sends the request, but curl removes the newlines when it sends the file. As a result, in many cases the last word on a line is concatenated with the first word on the next line, and the service then sees about 300 fewer words in the text being processed.
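
To make the effect concrete, here is a small Python sketch that mimics the newline stripping curl performs with `-d @file` and counts words before and after. The whitespace split is only a stand-in for the service's tokenizer (an assumption), and the sample sentence and function names are illustrative.

```python
def strip_newlines_like_curl_d(text):
    """Drop carriage returns and newlines, as `curl -d @file` does to file data."""
    return text.replace("\r", "").replace("\n", "")


def count_words(text):
    """Naive whitespace word count; a stand-in, not the service's real tokenizer."""
    return len(text.split())


# Illustrative sample with hard line breaks in the middle of a sentence.
sample = "Thank you to the University\nof Denver for your\nhospitality."

if __name__ == "__main__":
    print(count_words(sample))                              # 10
    print(count_words(strip_newlines_like_curl_d(sample)))  # 8
    print(strip_newlines_like_curl_d(sample))
```

Each removed line break glues two words together ("Universityof", "yourhospitality."), which is the same kind of loss as the 7020 versus 6722 gap. With curl itself, `--data-binary @file` posts the file without stripping newlines.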

Attached is a copy of the text that has the mid-sentence line breaks removed (it is formatted like proper text). When you run this in the API, you'll get the same 7K words as in the demo.

BarackObama2012Debate.txt
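
If you want to prepare your own text the same way, the following Python sketch joins mid-sentence line breaks with a space while keeping blank lines as paragraph breaks. The function name `unwrap_hard_breaks` is hypothetical, and this is only an approximation of how the attached file was produced.

```python
import re


def unwrap_hard_breaks(text):
    """Replace single line breaks with spaces; keep blank lines as paragraph breaks."""
    # Normalise Windows and old Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (possibly containing spaces) separates paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, every run of whitespace becomes a single space.
    cleaned = [" ".join(p.split()) for p in paragraphs if p.strip()]
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    wrapped = "the University\nof Denver\n\nThank you\r\nall."
    print(unwrap_hard_breaks(wrapped))
```

This is meant for space-delimited languages. The paragraph breaks it keeps are still newlines, so a client that strips newlines will drop those too.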

neil-boyette-ibm commented on June 23, 2024 1

Great.

Yes, the model is constantly being improved and when we see a good enough increase in accuracy we will roll it out. These improvements are done to the PI back-end and will be available through both the v2 and v3 APIs.

neil-boyette-ibm commented on June 23, 2024 1

In general the service does not directly consider spaces, line breaks, paragraphs, etc.; we are only looking at the text. The only time this type of formatting comes into play is when it affects the actual words as detected by our word tokenizer (as in the case above, where the lack of a space was causing the service to interpret two words as one).
The exception to this is Japanese, as that language is not space-delimited and thus the word tokenization is a lot more complex than in Western languages. There, a single line break versus a double line break is significant (a single line break is used for formatting and a double one indicates a new "sentence").

garyhow01 commented on June 23, 2024

I was using v3.

FMGordillo commented on June 23, 2024

Also guys, don't forget to upload D3 as well! It's on V4 now
Thank you for your effort

vibhasinghal commented on June 23, 2024

@garyhow01 - can you please also provide the text you copied? What surprises me is that your analysis report says 11633 words were analyzed, while the text only has ~7000 words.

garyhow01 commented on June 23, 2024

BarackObama2012Debate.txt

@vibhasinghal here you go. It's also available in the PI demo itself. I am still pondering why a change from v2 to v3 could cause a big change in "Openness" score for Obama.

vibhasinghal commented on June 23, 2024

@garyhow01 - I tried this file using v2 and v3 and got the same results: words analyzed = 6722, and Openness comes to 92.69.

I used curl to make the request. How did you make the call? Can you redo it and let me know if you get the same result?

garyhow01 commented on June 23, 2024

@vibhasinghal I see why. I had copied the text directly from the demo into my application's rich text editor, which padded the text with HTML tags. I have tested again and the result is quite close, but the number of words counted is still different from the 7020 the demo showed. Did you format the text before making the request? E.g. the first sentence:
"...University<br> of Denver for your hospitality."
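
For text that comes out of a rich text editor, one option is to strip the markup before sending it as `text/plain`. The sketch below uses Python's standard `html.parser`; the helper name `html_to_plain_text` and the choice to treat `<br>`, `<p>` and `<div>` as word separators are assumptions for illustration, not something the service requires.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content and turn block-level or break tags into spaces."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: these tags separate words, so emit a space for them.
        if tag in ("br", "p", "div"):
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain_text(markup):
    """Return the text of `markup` with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(html_to_plain_text("...University<br> of Denver for your hospitality."))
```

Collapsing whitespace also removes the hard line breaks, so the cleaned text no longer depends on how the client treats newlines.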

garyhow01 commented on June 23, 2024

Thanks, I tried it and got the expected result. This may not be the most appropriate place to ask, so please let me know if I should post this question elsewhere:
Regarding the statement "..that the v2 and v3 APIs will produce the exact same output..": does the model improve over time and become more accurate, either by accumulating more survey results/validation studies or through a self-learning mechanism?

garyhow01 commented on June 23, 2024

Hi Neil/all, I would like to ask: does the model consider the number of line breaks, the number of spaces, or special characters? E.g. more line breaks or paragraphing = lower trait_X score.
