rexyai / restrserve
R web API framework for building high-performance microservices and app backends
Home Page: https://restrserve.org
See s-u/Rserve#99
By default Rserve uses port 6311 (QAP), so we can't start multiple applications: port 6311 will already be taken by the first one started.
We should check whether the port is busy and, if so, generate a random port and check that one as well.
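The check described above could look roughly like the following base R sketch. The helper names `is_port_free()` and `find_free_port()` are hypothetical and not part of Rserve or RestRserve; the sketch assumes R >= 4.1.0, where `serverSocket()` is available, and it cannot rule out another process grabbing the port between the check and the actual start.

```r
# Hypothetical helper: TRUE if a listening socket can be opened on `port`.
# serverSocket() is part of base R since 4.1.0.
is_port_free <- function(port) {
  tryCatch({
    sock <- suppressWarnings(serverSocket(port))
    close(sock)
    TRUE
  }, error = function(e) FALSE)
}

# Hypothetical helper: try the preferred port first, then random candidates.
find_free_port <- function(preferred = 6311L, candidates = 49152:65535,
                           max_tries = 20L) {
  if (is_port_free(preferred)) {
    return(as.integer(preferred))
  }
  for (port in sample(candidates, max_tries)) {
    if (is_port_free(port)) {
      return(as.integer(port))
    }
  }
  stop("no free port found after ", max_tries, " attempts")
}

port <- find_free_port()
```

The returned value could then be passed to whatever starts the QAP listener; this remains a best-effort check rather than a guarantee.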
Thanks for this useful package! I ran into an error when running the code in the README. Apparently, the new() method for Logger does not take a file argument:
library(RestRserve)
logger = Logger$new(level = TRACE, file = "")
#> Error in .subset2(public_bind_env, "initialize")(...): unused argument (file = "")
# ...
Created on 2019-11-11 by the reprex package (v0.3.0)
Think about serving static files API
lgr is now updated on CRAN and relatively stable.
We could also re-export the lgr object to encourage developers to use it in their apps.
Maybe it is worth completely removing the openapi* functions and sticking to plain YAML files. I find that I never use these functions anymore.
Use the jsonlite package instead of our own to_json implementation.
Reasons:
jsonlite is a well-tested package
Check what ideas we can borrow:
I am trying to train a LightGBM model and then serve it using RestRserve. However, after I make a request to the microservice, the process hangs and I receive no response. I also don't receive any error or warning messages.
Running top reveals that the forked process has been created and some memory has been allocated (In the actual workflow where I encountered the issue CPU usage by the forked process initially increases then drops to 0. In the example below CPU usage is negligible, so I couldn't track it).
Here is a minimal reproducible example:
library(lightgbm)
library(RestRserve)
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test
bst <- lightgbm(
  data = train$data,
  label = train$label,
  num_leaves = 4,
  learning_rate = 1,
  nrounds = 2,
  objective = "binary"
)
dummy_api_function <- function(request, response) {
  result <- predict(bst, test$data)[1]
  response$body <- jsonlite::toJSON(result)
  response$content_type <- "application/json"
  response$headers <- character(0)
  response$status_code <- 200L
  forward()
}
RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_post(path = "/api/dummy_api", FUN = dummy_api_function)
RestRserveApp$run(8001)
And an example of a curl request that leads to the process hanging:
curl --header "Content-Type: application/json" --request POST --data '{"foo":"bar"}' localhost:8001/api/dummy_api
(The same example, but with result <- predict(bst, test$data)[1] replaced by result <- 1, produces a result, so the issue must be with the predict call.)
It is hard to tell whether the issue relates more to RestRserve or to LightGBM, but any help tracking the cause is appreciated.
Environment info:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=bg_BG.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=bg_BG.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=bg_BG.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=bg_BG.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Matrix_1.2-12 RestRserve_0.1.0.13 lightgbm_2.2.4 R6_2.4.0
loaded via a namespace (and not attached):
[1] compiler_3.4.4 magrittr_1.5 tools_3.4.4 yaml_2.2.0 Rserve_1.7-3 grid_3.4.4 data.table_1.12.0 jsonlite_1.6
[9] lattice_0.20-35
$ curl -I http://localhost:5001/api/hello
HTTP/1.1 404 Code 404
Content-type: application/json
Content-length: 30
$ curl -X GET -I http://localhost:5001/api/hello
HTTP/1.1 200 OK
Content-type: text/plain
Content-length: 10
I have been trying to get request headers with the following code:
ApiReadHeader <- function(request, response) {
response$body <- request$headers[["TenantID"]]
response$content_type <- "application/json"
response$headers <- character(0)
response$status_code <- 200L
forward()
}
RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_get(path = "/api/readheader", FUN = ApiReadHeader)
RestRserveApp$run(8001)
And the following curl request:
curl -X GET "http://localhost:8001/api/readheader" -H "TenantID: 5a8d27c232f69a38f467517d"
I expected the server to respond with 5a8d27c232f69a38f467517d, however I got
Error in user code: subscript out of bounds
Call: request$headers[["TenantID"]]
Tracebeck:
FUN(request, response)
What am I doing wrong?
And what is the best way to debug RestRserveApplication?
I had trouble doing that, so I tried returning the request's headers, queries, and bodies to be able to have a look at them and their structure.
I also tried response$body <- str(request$headers), however it returns an R error again.
Help is appreciated.
Thanks.
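One way to avoid the "subscript out of bounds" error while investigating is to look fields up defensively and to capture the output of str() as text (str() prints its result and returns NULL, so it cannot be assigned to a response body directly). The sketch below is base R only; `get_field()` and `dump_structure()` are hypothetical helpers, and it assumes the headers arrive as a named list or named character vector. Whether RestRserve normalizes header names (for example to lower case) is not confirmed by the report above, so the lookup ignores case.

```r
# Hypothetical helper: case-insensitive lookup that returns `default`
# instead of raising "subscript out of bounds" when the field is missing.
get_field <- function(x, name, default = NULL) {
  idx <- match(tolower(name), tolower(names(x)))
  if (is.na(idx)) default else x[[idx]]
}

# Hypothetical helper: capture the printed structure of an object as one
# string, suitable for returning in a response body while debugging.
dump_structure <- function(x) {
  paste(utils::capture.output(utils::str(x)), collapse = "\n")
}

# Stand-in for request$headers; the real structure may differ.
headers <- list(tenantid = "5a8d27c232f69a38f467517d", accept = "*/*")

tenant <- get_field(headers, "TenantID", default = "")
debug_text <- dump_structure(headers)
```

Inside a handler, `debug_text` could be returned as a text/plain body to inspect what the server actually received.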
It would be nice to add automatic pkgdown site generation to the Travis config.
After CRAN publication, build docs for both the dev version and the stable CRAN version.
process_request does not catch some exceptions, for example a raise from ContentHandlers$get_decode. There may be others elsewhere.
As a result, Rserve returns its own HTTP status (500) and body (500 Evaluation error).
Wouldn't it be nice to have:
backend <- BackendRserve$new()
backend$start(app, http_port = 1234, background = TRUE)
backend$stop()
?
Starting from R 3.6.0 we can use structured errors (errorCondition). Middleware and handlers could then easily raise errors, which would be caught in a single place, RestRserveApplication. This would greatly simplify and streamline the logic and would allow us to remove the forward() and interrupt() calls.
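A minimal sketch of the idea, using only base R (errorCondition() requires R >= 3.6.0). The names `http_error()`, `handler()` and `process()` are hypothetical, and the request is modeled as a plain list rather than a RestRserve object; this illustrates the pattern, not the package's implementation.

```r
# Hypothetical constructor: a structured error carrying an HTTP status code.
http_error <- function(status_code, message) {
  errorCondition(message, status_code = status_code, class = "http_error")
}

# Hypothetical handler: signals a structured error instead of calling
# forward() or interrupt().
handler <- function(request) {
  if (is.null(request$query$name)) {
    stop(http_error(400L, "query parameter 'name' is required"))
  }
  list(status_code = 200L, body = paste("Hello,", request$query$name))
}

# Single place where every error is caught and mapped to a response.
process <- function(request, handler) {
  tryCatch(
    handler(request),
    http_error = function(e) {
      list(status_code = e$status_code, body = conditionMessage(e))
    },
    error = function(e) {
      list(status_code = 500L, body = "Internal Server Error")
    }
  )
}

ok <- process(list(query = list(name = "R")), handler)
bad <- process(list(query = list()), handler)
```

Because the `http_error` handler is listed before the generic `error` handler, structured errors keep their status code while unexpected failures fall back to 500.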
Would be nice to have automatic swagger UI
In theory we should be able to support different backends: Rserve, httpuv, etc. To be prepared for that, the Request and Response classes should be backend independent. This means:
to_rserve and from_rserve methods independent of Request and Response
Request from Rserve API should also be extracted from the Request class
Add an openapi_file param, taking an OpenAPI definition YAML file, to the add_route method.
rename:
What else?
Check what ideas we can borrow:
get_decode raises an exception for a multipart body, although the body is parsed and mapped by from_rserve. Also, a multipart body is not processed by the Request constructor (it is neither parsed nor mapped).
Now returned 404.
This topic is for sharing feedback about the current dev version: documentation, API, etc.
This topic is NOT for questions. Use Stack Overflow instead and mark your question with the restrserve tag.
Needs IPC communication between Rserve child processes. Related to s-u/Rserve#105
We can source a script and parse the comments before each function definition.
The regex to parse them is "^#['\\*]\\s*(@(\\w+)\\s+)?(.*)$".
Parse OpenAPI specs, method and path (template).
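As a rough illustration of how the regex above splits a comment line into a tag and a value, here is a base R sketch. The function `parse_doc_comments()` and the `@get` tag in the sample input are hypothetical; the actual tag vocabulary is not specified in the text above.

```r
# The regex quoted above: an optional "@tag " followed by free text.
pattern <- "^#['\\*]\\s*(@(\\w+)\\s+)?(.*)$"

# Hypothetical helper: keep only doc-comment lines and split each one into
# tag (empty string when absent) and value.
parse_doc_comments <- function(lines) {
  lines <- lines[grepl(pattern, lines)]
  m <- regmatches(lines, regexec(pattern, lines))
  data.frame(
    tag = vapply(m, function(x) x[3L], character(1L)),
    value = vapply(m, function(x) x[4L], character(1L)),
    stringsAsFactors = FALSE
  )
}

# Sample script text; the "@get" tag is made up for this example.
src <- c(
  "#' @get /hello",
  "#' Say hello",
  "hello <- function() 'hi'"
)
doc <- parse_doc_comments(src)
```

Lines without an `@tag` end up with an empty tag, so free-text description lines can be told apart from directives.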
Application to dump body:
.http.request = function(path, query, body, headers) {
  saveRDS(body, "body.rds")
  list("OK!", "text/plain", character(0), 200L)
}
Rserve::run.Rserve(http.port = 8001)
Send request:
echo "Test" > /tmp/test
curl http://127.0.0.1:8001 -F 'param=value' -F 'file=@/tmp/test' -F 'var=text'
After the dump, the body looks like this:
> rawToChar(body)
[1] "--------------------------1cd7cb588b327247\r\nContent-Disposition: form-data; name=\"param\"\r\n\r\nvalue\r\n--------------------------1cd7cb588b327247\r\nContent-Disposition: form-data; name=\"file\"; filename=\"test\"\r\nContent-Type: application/octet-stream\r\n\r\nTEXT\n\r\n--------------------------1cd7cb588b327247--\r\n"
> cat(rawToChar(body))
--------------------------1cd7cb588b327247
Content-Disposition: form-data; name="param"
value
--------------------------1cd7cb588b327247
Content-Disposition: form-data; name="file"; filename="test"
Content-Type: application/octet-stream
TEXT
--------------------------1cd7cb588b327247--
There is an example R implementation in the FastRWeb package.
There is a C++ implementation in the cpp-httplib library.
On input: raw vector.
On output: list with the following structure.
list(
  "param" = list(value = "value"),
  "file" = list(filename = "test", offset = int, length = int, content_type = "application/octet-stream"),
  "var" = list(value = "text")
)
Line endings are \r\n. Each boundary string is followed by the Content-Disposition header and an optional Content-Type header. The value or file content is separated from the headers by an additional empty line.
Related RFC: https://tools.ietf.org/html/rfc7578
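The layout described above can be parsed along the following lines. This is a simplified base R sketch, not the FastRWeb or cpp-httplib implementation: `parse_multipart()` and `first_group()` are hypothetical names, the boundary is assumed to be supplied by the caller (normally it comes from the request's Content-Type header), and rawToChar() is used for searching, so bodies containing NUL bytes would need a search over the raw vector instead. Offsets are 1-based indices into the raw vector.

```r
# Hypothetical helper: first capture group of `pattern` in `x`, or NA.
first_group <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x, ignore.case = TRUE))[[1L]]
  if (length(m) >= 2L) m[2L] else NA_character_
}

parse_multipart <- function(body, boundary) {
  stopifnot(is.raw(body), is.character(boundary), length(boundary) == 1L)
  delim <- paste0("--", boundary)
  # Assumption: no embedded NUL bytes, otherwise rawToChar() fails.
  txt <- rawToChar(body)
  pos <- gregexpr(delim, txt, fixed = TRUE, useBytes = TRUE)[[1L]]
  parts <- list()
  if (pos[1L] == -1L || length(pos) < 2L) return(parts)
  for (i in seq_len(length(pos) - 1L)) {
    start <- pos[i] + nchar(delim, type = "bytes") + 2L  # skip delimiter + CRLF
    end <- pos[i + 1L] - 3L                              # drop CRLF before next delimiter
    if (end < start) next
    chunk <- rawToChar(body[start:end])
    sep <- regexpr("\r\n\r\n", chunk, fixed = TRUE, useBytes = TRUE)
    if (sep == -1L) next
    header <- rawToChar(body[start:(start + sep - 2L)])
    content_start <- start + sep + 3L                    # first byte after the empty line
    content_len <- end - content_start + 1L
    name <- first_group("[; ]name=\"([^\"]*)\"", header)
    if (is.na(name)) next
    filename <- first_group("filename=\"([^\"]*)\"", header)
    if (is.na(filename)) {
      # Plain form field: return the value itself.
      value <- rawToChar(body[seq.int(content_start, length.out = content_len)])
      parts[[name]] <- list(value = value)
    } else {
      # File field: return only its location inside the raw body.
      parts[[name]] <- list(
        filename = filename,
        offset = content_start,
        length = content_len,
        content_type = first_group("Content-Type:\\s*([^\r\n]+)", header)
      )
    }
  }
  parts
}

# Body rebuilt from the dump shown earlier.
boundary <- "------------------------1cd7cb588b327247"
body <- charToRaw(paste0(
  "--", boundary, "\r\n",
  "Content-Disposition: form-data; name=\"param\"\r\n\r\nvalue\r\n",
  "--", boundary, "\r\n",
  "Content-Disposition: form-data; name=\"file\"; filename=\"test\"\r\n",
  "Content-Type: application/octet-stream\r\n\r\nTEXT\n\r\n",
  "--", boundary, "--\r\n"
))
res <- parse_multipart(body, boundary)
```

Returning an offset and a length for files, as proposed above, avoids copying the upload; the content can be sliced out of the raw vector only when it is needed.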
I think we should provide demos for the most common scenarios.
Suggestions are welcome.
lm/glm model
xgboost model
caret model
Add a 'url' method or property to the RestRserveRequest class. Useful for logging and debugging.
I'm trying to run the example but I get these errors:
Error: object 'Application' not found
Error: object 'BackendRserve' not found
Then I saw that in the library there is no Application and BackendRserve function.
I have installed it both with the GitHub link and with remotes::install_github("rexyai/RestRserve"), but I get the same errors.
Hi all,
This is more of an informational question. Is it possible, and does it make sense, to use this package inside a managed cluster, e.g. with Kubernetes? I ask because RestRserve appears to handle requests in parallel already, while the point of K8s is to scale to meet demand.
Thank you for writing such a nice utility for serving R concurrently. I wanted to check whether there is any plan to push it to CRAN.
We should add some tests about:
Classes (constructor and methods):
RestRserveRequest (added in 323db68)
RestRserveResponse (added in 323db68)
HTTPErrorFactory (added in 323db68)
RestRserveApplication
RestRserveMiddleware (added in 323db68)
RestRserveRouter (added in 10d619a)
Functions:
raise (added in 323db68)
try_capture_stack (added in #43)
deparse_vector (added in #43)
get_traceback (added in #43)
URLenc (added in #43)
dict_ functions (removed in #43)
parse_query (added in d3ac6ef)
parse_headers (added in d3ac6ef)
extract_docstrings_yaml (added in ab19b6a)
guess_mime (added in #43)
is_string (added in #43)
is_path (added in #43)
Running server:
For the moment the OpenAPI spec openapi.yaml is generated on the fly. It would be better to generate it once and serve it from a static file.
The swagger package contains an obsolete swagger-ui version.
RestRserve and remove swagger dependency
write_swagger_ui_index_html: move index_html to file in inst/
RestRserveApplication$add_swagger_ui to use internal swagger files
http and tcp forwarding
We need to figure out a solution for adding simple CORS; see the discussion in falconry/falcon#1220
I'm not sure this behavior is correct.
curl 'https://localhost:5000/api/rating?clientid=123456&step
Error in query[[key]] : subscript out of bounds
Hi
It's quite interesting to find this package. I have been using the built-in HTTP server of Rserve at work and it is definitely more performant than Plumber, RApache and OpenCPU.
Rserve forks processes effectively, but it doesn't check whether the machine has enough resources. It can therefore return a 500 error when it fails to fork a process due to insufficient memory.
Initially I looked into whether NGINX can queue requests, but only the paid edition has that functionality. HAProxy, on the other hand, can limit the number of requests that are handled at once. The API became more effective once it was proxied by HAProxy in a linked Docker environment.
I hope more deployment examples will be introduced in the long run.
Regards
Jaehyeon
I am using the dev branch to build a new version of our application. I am using supervisord to launch Rserve. Here is a portion of the supervisor.conf file:
[program:rserve]
command=/usr/bin/R CMD Rserve --slave --RS-conf /app/conf/Rserve.conf
priority=1
autostart=true
autorestart=false
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
And my Rserve.conf is:
source /app/restapi.R
source /usr/local/lib/R/site-library/RestRserve/http_request.R
http.port 8001
encoding utf8
port 6311
daemon disable
pid.file Rserve.pid
This is generating an error because /usr/local/lib/R/site-library/RestRserve/http_request.R
doesn't exist on this branch.
How should we now call or launch RestRserve/Rserve with the new version?
Thanks!
Hi there,
I'm using your library to get predictions based on a trained model.
Here is the code sequence when I start the program:
.RData files and other binary files
After this, I have one process using about 350MB of memory (which is normal for this application).
Then, I do a request and a second process spawns using the same amount of memory (350MB).
After this, if I put some load on the API (more than 2 requests at the same time) a new process is spawned using the same amount of memory per process.
I understand that's the way RestRserve or Rserve handle concurrent requests (by forking) but I can't understand why each process has that memory usage. Since it's all shared read-only data, shouldn't all processes use the same memory space instead of copying the data?
I also don't understand why all the spawned processes are kept running even when there aren't any requests to handle.
And my third question is: what are the advantages of the recommended way of deploying the API (the one mentioned in the documentation) versus just running Rscript api.R?
Sorry for the long text and probably some of the questions are basic ones but my knowledge in R is not really extensive.
Thank you!
For example, take the JSON {"a":"3","b":"4"} as input and return a JSON with the sum as the result: {"result":"7"}.
I have tried a lot, but I didn't manage to do it. There is no POST example online. The most difficult part is how to receive the JSON as input and extract the variables to work with.
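The JSON handling itself can be sketched independently of the server. The example below uses jsonlite; `sum_from_json()` is a hypothetical helper, and how the request body reaches a handler (raw vector or character string) depends on the RestRserve version, so the function accepts both. Wiring it into a POST handler is left out because the handler API differs between versions.

```r
library(jsonlite)

# Hypothetical helper: parse {"a": ..., "b": ...} and return the sum as JSON.
sum_from_json <- function(body) {
  # Assumption: the body may arrive as a raw vector or as a string.
  if (is.raw(body)) {
    body <- rawToChar(body)
  }
  input <- fromJSON(body)
  result <- as.numeric(input$a) + as.numeric(input$b)
  # auto_unbox = TRUE emits a scalar instead of a one-element array.
  toJSON(list(result = as.character(result)), auto_unbox = TRUE)
}

out <- sum_from_json('{"a":"3","b":"4"}')
```

In a handler, the returned JSON string would become the response body with content type application/json.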
I've developed an API that serves 2 xgboost models and does some calculations with data.table.
The problem is that the API basically hangs when running the line xgb_preds = predict(model, data.matrix(d)). There is no visible error or traceback.
I've tried running the app a few times and it always hangs at the predict call to the xgboost models. I figured this out by printing the objects until nothing got printed.
Here is the code. I've changed some of the names and removed the last part for privacy reasons. BTW, thanks for the great package!
source('foo.R')
source('baz.R')
library(RestRserve)
library(caret)
library(data.table)
library(ggplot2)
library(ISOweek)
library(zoo)
library(xgboost)
library(jsonlite)
library(Hmisc)
xgb_models <- list()
features <- list()
for(var in c("var1", "var2")){
xgb_models[[var]] <- xgb.load(sprintf('model_%s', var))
features[[var]] <- readRDS(sprintf('features_%s', var))
}
# read files
df1 = read_df1(path1)
df2 = readRDS(path2)
df3 = readRDS(path3)
df4 = readRDS("pricing/data/df4.rds")
df4[, allowed_miles:=ifelse(allowed_miles > 3100, 3500, allowed_miles)]
df4[, month:=month(pkup_dt)]
str(input)
# http://localhost:8001/endpoint?p1=M7991&p2=M1521&p3=2018-01-05&p4=2&p5=2017-12-15
foo <- function(request, response){
  input <- list(p1=request$query[['p1']],
                p2=request$query[['p2']],
                p3=as.Date(request$query[['p3']], format="%Y-%m-%d"),
                p4=as.numeric(request$query[['p4']]),
                p5=as.Date(request$query[['p5']], format="%Y-%m-%d"))
  print(str(input))
  # Estimate competitor rates
  data = list()
  variables = c("var1", "var2")
  for(var in variables){
    extra_vars = c("var1_miles", "var1_days")
    ignore_vars = c("var2_days")
    flow_cols = c("grp1", "grp2")
    ftr = features[[var]]
    dt_in = parse_inputs(input)
    dt_in = read_df2(dt_in, extra_vars, df2)
    dt_in = merge(dt_in, df1, by=flow_cols)
    bycols = c(flow_cols, 'ym_size_name', "wday", extra_vars)
    dt_in = merge(dt_in, ftr[, setdiff(names(ftr), ignore_vars), with=F], by=bycols)
    d = dt_in[, get_feature_names(extra_vars), with=FALSE]
    print(data.matrix(d)) # this gets printed
    model = xgb_models[[var]]
    xgb_preds = predict(model, data.matrix(d)) #### the API hangs here!!!
    print("foo") # this doesn't get printed
    print(xgb_preds)
    d[, est_rate:=xgb_preds]
    print(d)
    data[[var]] <- d
    print(data[[var]])
  }
  #some more code that never gets executed.
  response$body = toJSON(result)
  response$content_type = "application/json"
  response$headers = character(0)
  response$status_code = 200L
  forward()
}
# create application
app = RestRserve::RestRserveApplication$new()
# register endpoints and corresponding R handlers
app$add_get(path = "/endpoint", FUN = foo)
app$run(http_port = "8001")
At first glance it doesn't look that easy. Relevant SO thread https://stackoverflow.com/questions/15282471/get-stack-trace-on-trycatched-error-in-r/40899766#40899766
add_get()
add_post()
using Rserve::run.Rserve()
Seems they also can be implemented - https://github.com/s-u/Rserve/blob/05ff32d3c4512954a99162d392d0465d432d591e/src/http.c#L659