Comments (9)
Fixed - many thanks!
from googlelanguager.
Did you try sending in the column of text as is? It is vectorised, so it should cope with it, and a tryCatch() in the function should handle errors gracefully. If not, let me know. Please try this code and report back what it does:
results <- gl_nlp(df_filtered$review_text)
Thanks Mark for the quick response.
I tried it, but upon hitting an error it exits rather than proceeding gracefully. I did manage to get it working, though, by using only rows with at least 20 words in them and converting all text to UTF-8, although it was a fiddly process. Let me know if I should close this comment or if you'd like to know more.
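For reference, the workaround described above could be sketched roughly like this, assuming a data frame `df` with a `review_text` column (the names are placeholders, matching the call suggested earlier):

```r
# Keep only rows with at least 20 words, convert the text to UTF-8,
# then send the whole column to gl_nlp() in one vectorised call.
# `df` and `review_text` are illustrative names.
library(googleLanguageR)

word_count <- lengths(strsplit(df$review_text, "\\s+"))
df_filtered <- df[word_count >= 20, , drop = FALSE]
df_filtered$review_text <- iconv(df_filtered$review_text, to = "UTF-8")

results <- gl_nlp(df_filtered$review_text)
```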
2019-02-27 20:18:33 -- annotateText: 14 characters
Auto-refreshing stale OAuth token.
Request failed [400]. Retrying in 1 seconds...
Request failed [400]. Retrying in 2.5 seconds...
2019-02-27 20:18:41> Request Status Code: 400
Scopes: https://www.googleapis.com/auth/cloud-language https://www.googleapis.com/auth/cloud-platform
Method: service_json
Error: API returned: Invalid text content: too few tokens (words) to process.
OK, good to know, thanks. I will keep the issue open to make the fails more graceful.
Many thanks, Mark. Let me know if you need me to test anything in the future.
Would just like to add that I am having the same issue and that, unless I have my tryCatch() loop coded incorrectly, I am also getting the same sort of failure:
This code:
# Use just instances with at least 25 words of text (arbitrary cutoff)
filelist <- lapply(filelist, function(x) subset(x, WordCount > 24))

#### Push the data up to Google and get the results back ####
# Create the storage list
output <- rep(list(NA), length(ids))
names(output) <- as.numeric(ids)

# Run the data through
tryCatch({
  for (i in 1:length(ids)) {
    output[[i]] <- gl_nlp(as.character(filelist[[i]]$Content))
  }
})
ultimately produces this error:
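(For comparison, a tryCatch() needs an explicit `error` handler to recover, and wrapping each call individually, rather than the whole loop, lets the remaining iterations continue. A sketch using the same object names as above:)

```r
# Wrap each gl_nlp() call in its own tryCatch(); the `error` handler
# records the failure and the loop moves on to the next element.
for (i in seq_along(ids)) {
  output[[i]] <- tryCatch(
    gl_nlp(as.character(filelist[[i]]$Content)),
    error = function(e) {
      message("Element ", i, " failed: ", conditionMessage(e))
      NA
    }
  )
}
```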
The above scenarios should be better now in version 0.2.0.9000 on GitHub (install via remotes::install_github("ropensci/googleLanguageR")).
For example, the calls below will carry on if there are 400 errors in the first responses:
library(googleLanguageR)
gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA))
2019-07-02 22:08:00 -- annotateText: 43 characters
2019-07-02 22:08:01> Request Status Code: 400
2019-07-02 22:08:01 -- Error processing string: 'the rain in spain falls mainly on the plain' API returned: Invalid text content: too few tokens (words) to process.
2019-07-02 22:08:01 -- annotateText: 3 characters
2019-07-02 22:08:02> Request Status Code: 400
2019-07-02 22:08:02 -- Error processing string: 'err' API returned: Invalid text content: too few tokens (words) to process.
This gives a response like the one below:
$sentences
$sentences[[1]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$sentences[[2]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$sentences[[3]]
[1] "#error - zero length string"
$sentences[[4]]
[1] "#error - zero length string"
$tokens
$tokens[[1]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$tokens[[2]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$tokens[[3]]
[1] "#error - zero length string"
$tokens[[4]]
[1] "#error - zero length string"
$entities
$entities[[1]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$entities[[2]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$entities[[3]]
[1] "#error - zero length string"
$entities[[4]]
[1] "#error - zero length string"
$language
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
[2] "#error - API returned: Invalid text content: too few tokens (words) to process."
[3] "#error - zero length string"
[4] "#error - zero length string"
$text
[1] "the rain in spain falls mainly on the plain"
[2] "err"
[3] ""
[4] NA
$documentSentiment
# A tibble: 4 x 2
magnitude score
<dbl> <dbl>
1 NA NA
2 NA NA
3 NA NA
4 NA NA
$classifyText
# A tibble: 4 x 2
name confidence
<chr> <int>
1 NA NA
2 NA NA
3 NA NA
4 NA NA
Note you do not need to loop through indexes etc. to pass multiple texts to the API; send in the vector and it will do one API call per text element. It will skip API calls for empty strings or NA vector elements.
One thing I have just realised is that the "too few tokens (words) to process." error only occurs if you include classifyText in the request, e.g. if you use the annotateText default that includes all methods. You can get entity analysis for any number of characters if you specify only that method, e.g.
gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA), nlp_type = "analyzeEntities")
See https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/classifyText