
Comments (9)

thisisnickb commented on August 11, 2024

Fixed - many thanks!

from googleLanguageR.

MarkEdmondson1234 commented on August 11, 2024

Did you try sending in the column of text as is? The function is vectorised, so it should cope with it, and a tryCatch() inside the function should handle errors gracefully. If it doesn't, let me know. Please try this code and report back what it does:

results <- gl_nlp(df_filtered$review_text)

engti commented on August 11, 2024

Thanks Mark for the quick response.

I tried it, but upon hitting an error it exits rather than proceeding gracefully. I did manage to get it working, though, by using only rows with at least 20 words in them and converting all the text to UTF-8, although it was a fiddly process. Let me know if I should close this issue, or if you'd like to know more.

2019-02-27 20:18:33 -- annotateText: 14 characters
Auto-refreshing stale OAuth token.
Request failed [400]. Retrying in 1 seconds...
Request failed [400]. Retrying in 2.5 seconds...
2019-02-27 20:18:41> Request Status Code: 400
Scopes: https://www.googleapis.com/auth/cloud-language https://www.googleapis.com/auth/cloud-platform
Method: service_json
Error: API returned: Invalid text content: too few tokens (words) to process.
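For reference, the filtering-and-conversion workaround described above might look like the sketch below in base R. The column name review_text and the 20-word cutoff are taken from the thread; the helper names are hypothetical.

```r
# Count whitespace-separated words in each element of a character vector.
word_count <- function(x) lengths(strsplit(trimws(x), "\\s+"))

# Keep only rows whose text has at least `min_words` words, and convert
# the text to UTF-8 before it is sent to the API.
clean_for_nlp <- function(df, col = "review_text", min_words = 20) {
  df <- df[word_count(df[[col]]) >= min_words, , drop = FALSE]
  df[[col]] <- iconv(df[[col]], to = "UTF-8")
  df
}

# then: results <- gl_nlp(clean_for_nlp(df_filtered)$review_text)
```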

MarkEdmondson1234 commented on August 11, 2024

OK, good to know, thanks. I will keep the issue open until the failures are handled more gracefully.

engti commented on August 11, 2024

Many thanks, Mark. Let me know if you need me to test anything in the future.

thisisnickb commented on August 11, 2024

Would just like to add that I am having the same issue, and that, unless I have my tryCatch() loop coded incorrectly, I'm also getting the same sort of failure:

This code:

#Use just instances with more than 25 words of text (arbitrary cutoff)
filelist<-lapply(filelist, function(x) subset(x, WordCount>24))

####Push the data up to Google and get the results back####
#Create the storage dataframe
output<-rep(list(NA), length(ids))
names(output)<-as.numeric(ids)

#Run the data through
tryCatch(
  {
    for(i in 1:length(ids)){
      output[[i]]<-gl_nlp(as.character(filelist[[i]]$Content))
    }
  }
)
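As an aside, the tryCatch() above has no error = handler, so it will not actually trap the error; and because it wraps the whole loop, a single failure would abort every remaining iteration anyway. Moving tryCatch() inside the loop with a handler lets processing continue past failing documents. A self-contained sketch of the pattern (flaky_call is a stand-in for the gl_nlp() call):

```r
# tryCatch() inside the loop, with an error handler: a failing item is
# recorded as NA and the loop carries on to the next one.
flaky_call <- function(x) {            # stand-in for gl_nlp(...)
  if (nchar(x) < 3) stop("too few tokens (words) to process.")
  toupper(x)
}

texts  <- c("hello world", "er", "carry on regardless")
output <- vector("list", length(texts))

for (i in seq_along(texts)) {
  output[[i]] <- tryCatch(
    flaky_call(texts[i]),
    error = function(e) NA             # record the failure, keep going
  )
}
```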

ultimately produces this error:

[error screenshot]

MarkEdmondson1234 commented on August 11, 2024

The above scenarios should be handled better in version 0.2.0.9000, now on GitHub (install via remotes::install_github("ropensci/googleLanguageR")).

Commit 47c0666

For example, the calls below will carry on even if some of the responses return 400 errors:

library(googleLanguageR)
gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA))
2019-07-02 22:08:00 -- annotateText: 43 characters
2019-07-02 22:08:01> Request Status Code: 400
2019-07-02 22:08:01 -- Error processing string: 'the rain in spain falls mainly on the plain' API returned: Invalid text content: too few tokens (words) to process.
2019-07-02 22:08:01 -- annotateText: 3 characters
2019-07-02 22:08:02> Request Status Code: 400
2019-07-02 22:08:02 -- Error processing string: 'err' API returned: Invalid text content: too few tokens (words) to process.

This gives a response like the one below:

$sentences
$sentences[[1]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$sentences[[2]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$sentences[[3]]
[1] "#error - zero length string"

$sentences[[4]]
[1] "#error - zero length string"


$tokens
$tokens[[1]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$tokens[[2]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$tokens[[3]]
[1] "#error - zero length string"

$tokens[[4]]
[1] "#error - zero length string"


$entities
$entities[[1]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$entities[[2]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$entities[[3]]
[1] "#error - zero length string"

$entities[[4]]
[1] "#error - zero length string"


$language
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."
[2] "#error -  API returned: Invalid text content: too few tokens (words) to process."
[3] "#error - zero length string"                                                     
[4] "#error - zero length string"                                                     

$text
[1] "the rain in spain falls mainly on the plain"
[2] "err"                                        
[3] ""                                           
[4] NA                                           

$documentSentiment
# A tibble: 4 x 2
  magnitude score
      <dbl> <dbl>
1        NA    NA
2        NA    NA
3        NA    NA
4        NA    NA

$classifyText
# A tibble: 4 x 2
  name  confidence
  <chr>      <int>
1 NA            NA
2 NA            NA
3 NA            NA
4 NA            NA

Note that you do not need to loop through indexes to pass multiple texts to the API; send in the vector and it will make one API call per text element, skipping the API call for empty strings or NA elements.

MarkEdmondson1234 commented on August 11, 2024

One thing I have just realised is that the "too few tokens (words) to process" error only occurs if you include classifyText in the request, e.g. if you use the annotateText default, which includes all methods. You can get entity analysis for text of any length if you specify only that, e.g.

gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA), nlp_type = "analyzeEntities")

See https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/classifyText
