Maybe it overlaps other packages a bit, but would love the peer review <a href="ht

Submitted <a class="issue-link js-issue-link" data-error-text="Failed to load title" d

rOpenSci? about googlelanguager HOT 2 CLOSED

ropensci commented on August 11, 2024

rOpenSci?

from googlelanguager.

Comments (2)

MarkEdmondson1234 commented on August 11, 2024

Submitted ropensci/software-review#127

from googlelanguager.

MarkEdmondson1234 commented on August 11, 2024

Review notes

the docs should be explicit that you can only specify a single string, if that's the requirement (they say "character vector");
the function should validate its input before logging a message about the query;

Done, just needed to move validation up in the code.

I question the value of the logging in this case, or at least the default log level--it seems like it is just echoing my command;

When looping over a lot of NLP entries I found this useful to keep track of how far it got in the logs.

I'd encourage supporting a vectorized input, even if it means you have to make N requests internally (though I'd hope that the API has a way of doing at least some of the queries more intelligently)

https://github.com/MarkEdmondson1234/googleLanguageR/issues/9

It would really help if the docs described the contents of the object returned and how to interpret them, beyond saying "a list".

https://github.com/MarkEdmondson1234/googleLanguageR/issues/10

Improve gl_language_detect I/O

https://github.com/MarkEdmondson1234/googleLanguageR/issues/11

And I think I favour erroring and forcing the user to send a shorter request rather than automatically truncating because of the spectre of rate limiting--I'd rather have the say on what requests are going through rather than having the package truncate my text potentially mid-word and burning through some of my daily character/request limit. Alternatively, if you get a vector input and can chunk the requests cleanly by vector elements (first 2 elements in one request, next 5 in the next, etc.) and then return a single object, perhaps it would be nicer--that way, you're concealing the constraints of the API from the R user, who shouldn't have to think about URL character lengths. Regardless, all functions should handle input length issues consistently.

gl_speech_recognise

https://github.com/MarkEdmondson1234/googleLanguageR/issues/12

You may also want to consider using the httptest package to help you have cleaner API fixtures that don't include your auth token. httptest can also help you to get to 100% line coverage easily: you can assert that the right requests are made with the various API versions and "gs://" URLs (things that are among your untested lines currently) without needing to actually make all of those requests or maintain mocks for their responses. See the package vignette for some examples.
Even where you do have line coverage, the tests are thin, particularly in showing what the API response object looks like. This is particularly important because, as I mentioned above, the man pages also don't give much guidance on what the API responses look like. As a user, I'd like to have some kind of reference for what to expect that the functions return, and with respect to tests, some kind of assurance of the data contract. While the man pages ideally should describe the responses in more detail, you can also use tests to give more detailed examples that show the shape of a return object, possibly more naturally than you can in the @examples section of the docs.

You have non-ASCII text in test files, which if I recall correctly from a previous CRAN rejection of mine, will cause problems when they run your tests on Solaris. You can resolve this by moving the non-ASCII text to a separate file that gets sourced or otherwise read in and place that after a skip_on_cran. Just placing it in a test-file after a skip_on_cran isn't sufficient because the test file will fail to source. Cf. Crunch-io/rcrunch@7f74971

https://github.com/MarkEdmondson1234/googleLanguageR/issues/13

You have non-ASCII text in test files, which if I recall correctly from a previous CRAN rejection of mine, will cause problems when they run your tests on Solaris. You can resolve this by moving the non-ASCII text to a separate file that gets sourced or otherwise read in and place that after a skip_on_cran. Just placing it in a test-file after a skip_on_cran isn't sufficient because the test file will fail to source. Cf. Crunch-io/rcrunch@7f74971

https://github.com/MarkEdmondson1234/googleLanguageR/issues/14

API Usage

These might point more to the capabilities of googleAuthR than this package.

Sys.sleep(getOption("googleLanguageR.rate_limit")) seems inefficient. You're adding 500ms to every request unnecessarily. Presumably the Google API will tell you if you've exceeded your limit by responding with a 429 response code, so instead, your API client could handle a 429 by sleeping and retrying, and otherwise proceed normally. That way, you only pay the penalty if you exceed the limit, not on every request. Let the server decide your rate limiting. There are examples out there of other packages that do this (twitteR and crunch are two examples I'm aware of).
It also seems that the operative logic in check_rate isn't exercised in tests.
I wonder if you could simplify some query param logic, as in the "translate" functions, if gar_api_generator handled it (rather than concatenating the query strings to the URL yourself, could delegate to httr).

Style/minutiae

Function imports: using importFrom rather than :: would make for more readable code IMO, and according to http://r-pkgs.had.co.nz/namespace.html#imports, it is marginally faster.
Consider defining true <- unbox(TRUE) to avoid repetition--that appears a lot. Consider also that auto_unbox=TRUE perhaps should be the default in your API client's JSON serializer and use I() to prevent unboxing on attributes that must always be arrays.
getOption takes a 'default' arg so https://github.com/MarkEdmondson1234/googleLanguageR/blob/master/R/utilities.R#L28-L29 can be condensed
Consider adding "docstring" comments or similar for internal functions that explain why they exist and what they do. Some have useful comments like that but others like myMessage would benefit from them.
I noticed what appeared to be a couple of superfluous sprintf/paste0 with single strings/no interpolation, e.g. https://github.com/MarkEdmondson1234/googleLanguageR/blob/master/R/translate.R#L24
Is it really necessary to assert_that on your inputs on internal functions? While the cost of doing so is negligible when you're inside functions that make HTTP requests (network latency (and Sys.sleep(getOption("googleLanguageR.rate_limit"))) will dominate the running time), but it seems like overkill since you control the functions that are passing inputs to them.

from googlelanguager.

rOpenSci? about googlelanguager HOT 2 CLOSED

Comments (2)

Review notes

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent