Comments (9)
Fixed - many thanks!
from googlelanguager.
Did you try sending in the column of text as is? It is vectorised, so it should cope with it, and a tryCatch() in the function should handle errors gracefully. If not, let me know. Please try this code and report back what it does:
results <- gl_nlp(df_filtered$review_text)
Thanks Mark for the quick response.
I tried it, but upon hitting an error it exits rather than proceeding gracefully. I did manage to get it working, though, by using only rows with at least 20 words in them and converting all text to UTF-8, although it was a fiddly process. Let me know if I should close this comment or if you'd like to know more.
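For reference, the workaround described above could be sketched roughly like this, assuming a data frame `df` with a `review_text` column (the names are placeholders, matching the call suggested earlier):

```r
# Keep only rows with at least 20 words, convert the text to UTF-8,
# then send the whole column to gl_nlp() in one vectorised call.
# `df` and `review_text` are illustrative names.
library(googleLanguageR)

word_count <- lengths(strsplit(df$review_text, "\\s+"))
df_filtered <- df[word_count >= 20, , drop = FALSE]
df_filtered$review_text <- iconv(df_filtered$review_text, to = "UTF-8")

results <- gl_nlp(df_filtered$review_text)
```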
2019-02-27 20:18:33 -- annotateText: 14 characters
Auto-refreshing stale OAuth token.
Request failed [400]. Retrying in 1 seconds...
Request failed [400]. Retrying in 2.5 seconds...
2019-02-27 20:18:41> Request Status Code: 400
Scopes: https://www.googleapis.com/auth/cloud-language https://www.googleapis.com/auth/cloud-platform
Method: service_json
Error: API returned: Invalid text content: too few tokens (words) to process.
OK, good to know, thanks. I will keep the issue open to make the fails more graceful.
Many thanks, Mark. Let me know if you need me to test anything in the future.
Would just like to add that I am having the same issue and that, unless I have my tryCatch() loop coded incorrectly, I am also getting the same sort of failure:
This code:
# Use just instances with at least 25 words of text (arbitrary cutoff)
filelist <- lapply(filelist, function(x) subset(x, WordCount > 24))

#### Push the data up to Google and get the results back ####
# Create the storage list
output <- rep(list(NA), length(ids))
names(output) <- as.numeric(ids)

# Run the data through
tryCatch({
  for (i in 1:length(ids)) {
    output[[i]] <- gl_nlp(as.character(filelist[[i]]$Content))
  }
})
ultimately produces this error:
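(For comparison, a tryCatch() needs an explicit `error` handler to recover, and wrapping each call individually, rather than the whole loop, lets the remaining iterations continue. A sketch using the same object names as above:)

```r
# Wrap each gl_nlp() call in its own tryCatch(); the `error` handler
# records the failure and the loop moves on to the next element.
for (i in seq_along(ids)) {
  output[[i]] <- tryCatch(
    gl_nlp(as.character(filelist[[i]]$Content)),
    error = function(e) {
      message("Element ", i, " failed: ", conditionMessage(e))
      NA
    }
  )
}
```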
The above scenarios should be better now in version 0.2.0.9000 on GitHub (install via remotes::install_github("ropensci/googleLanguageR")).
For example, the calls below will carry on if there are 400 errors in the first responses:
library(googleLanguageR)
gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA))
2019-07-02 22:08:00 -- annotateText: 43 characters
2019-07-02 22:08:01> Request Status Code: 400
2019-07-02 22:08:01 -- Error processing string: 'the rain in spain falls mainly on the plain' API returned: Invalid text content: too few tokens (words) to process.
2019-07-02 22:08:01 -- annotateText: 3 characters
2019-07-02 22:08:02> Request Status Code: 400
2019-07-02 22:08:02 -- Error processing string: 'err' API returned: Invalid text content: too few tokens (words) to process.
This gives a response like the one below:
$sentences
$sentences[[1]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$sentences[[2]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$sentences[[3]]
[1] "#error - zero length string"
$sentences[[4]]
[1] "#error - zero length string"
$tokens
$tokens[[1]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$tokens[[2]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$tokens[[3]]
[1] "#error - zero length string"
$tokens[[4]]
[1] "#error - zero length string"
$entities
$entities[[1]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$entities[[2]]
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
$entities[[3]]
[1] "#error - zero length string"
$entities[[4]]
[1] "#error - zero length string"
$language
[1] "#error - API returned: Invalid text content: too few tokens (words) to process."
[2] "#error - API returned: Invalid text content: too few tokens (words) to process."
[3] "#error - zero length string"
[4] "#error - zero length string"
$text
[1] "the rain in spain falls mainly on the plain"
[2] "err"
[3] ""
[4] NA
$documentSentiment
# A tibble: 4 x 2
magnitude score
<dbl> <dbl>
1 NA NA
2 NA NA
3 NA NA
4 NA NA
$classifyText
# A tibble: 4 x 2
name confidence
<chr> <int>
1 NA NA
2 NA NA
3 NA NA
4 NA NA
Note you do not need to loop through indexes etc. to pass multiple texts to the API; send in the vector and it will do one API call per text element. It will skip API calls for empty strings or NA vector elements.
One thing I have just realised is that the "too few tokens (words) to process." error only occurs if you include classifyText in the request, e.g. if you use the annotateText default that includes all methods. You can get entity analysis for any number of characters if you specify only that method, e.g.
gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA), nlp_type = "analyzeEntities")
See https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/classifyText