rexyai / restrserve

R web API framework for building high-performance microservices and app backends

Home Page: https://restrserve.org

R 80.37% Dockerfile 0.21% HTML 0.71% Shell 0.18% C++ 18.53%

http-server openapi r rest-api swagger-ui

restrserve's Issues

Allow URI templates in paths

RFC6570 and test suite https://github.com/uri-templates/uritemplate-test/tree/8014d2561706bfec7fd27a7465ff3f957381427c

Error handling

Implement an interface for registering custom error handlers
Implement a simple standard error handler class/object

Start application with random Rserve port

By default, Rserve uses port 6311 (QAP), so we can't start multiple applications: port 6311 will already be taken by the first one.
We should check whether the port is free and, if it is not, generate a random port and check that one.
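
A minimal sketch of the port check described above, in base R. It assumes R >= 4.0 (for serverSocket()); the helper name find_free_port and the candidate range 49152-65535 are hypothetical and not part of Rserve or RestRserve. The check is also racy: another process may take the port between the check and the actual start.

```r
# Hypothetical helper: try random candidate ports until one can be bound.
find_free_port <- function(candidates = sample(49152:65535, 20)) {
  for (port in candidates) {
    # serverSocket() raises an error when the port cannot be opened
    sock <- tryCatch(serverSocket(port), error = function(e) NULL)
    if (!is.null(sock)) {
      close(sock)  # release the port so the application can bind it
      return(port)
    }
  }
  stop("no free port found among the candidates")
}

port <- find_free_port()
print(port)
```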

README example code gives error

Thanks for this useful package! I ran into an error from running the code in the README. Apparently, the method new() for Logger does not take a file argument:

library(RestRserve)
logger = Logger$new(level = TRACE, file = "")
#> Error in .subset2(public_bind_env, "initialize")(...): unused argument (file = "")
# ...

Created on 2019-11-11 by the reprex package (v0.3.0)

Serving static files

Think about serving static files API

Middleware

Middleware seems to be a powerful and mature concept.

Could be used to fix #6

Use lgr as logger

lgr is now updated on CRAN and relatively stable.
We could also re-export the lgr object to encourage developers to use it in their apps.

remove openapi* functions?

Maybe it is worth removing the openapi* functions completely and sticking to plain YAML files. I find that I never use these functions anymore.

Use jsonlite for logging and serialization

Use the jsonlite package instead of our own to_json implementation.

Reasons:

stability (jsonlite is a well-tested package)
performance (it uses a C library as its backend)

RestRserveResponse methods

Check what ideas we can borrow:

No response from API when serving LightGBM model

I am trying to train a LightGBM, and then serve it using RestRServe. However after making a request to the microservice the process hangs and I receive no response. I also don't receive any error or warning messages.

Running top reveals that the forked process has been created and some memory has been allocated (In the actual workflow where I encountered the issue CPU usage by the forked process initially increases then drops to 0. In the example below CPU usage is negligible, so I couldn't track it).

Here is a minimal reproducible example:

library(lightgbm)
library(RestRserve)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test
bst <- lightgbm(
  data = train$data,
  label = train$label,
  num_leaves = 4,
  learning_rate = 1,
  nrounds = 2,
  objective = "binary"
)


dummy_api_function  <- function(request, response) {
  result <- predict(bst, test$data)[1]
  response$body <- jsonlite::toJSON(result)
  response$content_type <- "application/json"
  response$headers <- character(0)
  response$status_code <- 200L
  forward()
}

RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_post(path = "/api/dummy_api", FUN = dummy_api_function)
RestRserveApp$run(8001)

And an example of a curl request that leads to the process hanging:

curl --header "Content-Type: application/json" --request POST --data '{"foo":"bar"}' localhost:8001/api/dummy_api

(The same example but replacing result <- predict(bst, test$data)[1] with result <- 1 produces a result, so the issue must be with the predict call).

It is hard to tell whether the issue relates more to RestRserve or to LightGBM, but any help tracking the cause is appreciated.

Environment info:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=bg_BG.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=bg_BG.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=bg_BG.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=bg_BG.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-12       RestRserve_0.1.0.13 lightgbm_2.2.4      R6_2.4.0           

loaded via a namespace (and not attached):
[1] compiler_3.4.4    magrittr_1.5      tools_3.4.4       yaml_2.2.0        Rserve_1.7-3      grid_3.4.4        data.table_1.12.0 jsonlite_1.6     
[9] lattice_0.20-35

curl --head always returns 404

$ curl -I http://localhost:5001/api/hello
HTTP/1.1 404 Code 404
Content-type: application/json
Content-length: 30
$ curl -X GET -I http://localhost:5001/api/hello
HTTP/1.1 200 OK
Content-type: text/plain
Content-length: 10

Reading request headers

I have been trying to get request headers with the following code:

ApiReadHeader <- function(request, response) {
     response$body <-  request$headers[["TenantID"]]
     response$content_type <- "application/json"
     response$headers <- character(0)
     response$status_code <- 200L
     forward()
}
RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_get(path = "/api/readheader", FUN = ApiReadHeader)
RestRserveApp$run(8001)

And the following curl request :
curl -X GET "http://localhost:8001/api/readheader" -H "TenantID: 5a8d27c232f69a38f467517d"

I expected the server to respond with 5a8d27c232f69a38f467517d, however I got

Error in user code: subscript out of bounds
Call: request$headers[["TenantID"]]
Tracebeck:
  FUN(request, response)

What am I doing wrong?
And what is the best way to debug RestRserveApplication?

I had trouble doing that, so I tried returning the request's headers, queries, and bodies to be able to look at them and their structure.

I also tried response$body <- str(request$headers), but it returns an R error again.

Help is appreciated.
Thanks.
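
Two base R facts are relevant here: [[ with a name that is absent from a named atomic vector raises "subscript out of bounds", and str() prints its output and returns NULL, so it cannot be assigned to a response body directly. The sketch below shows a defensive, case-insensitive lookup and a way to capture str() output as text. The helper names get_header and describe are hypothetical, and the idea that the header name arrives in a different case than "TenantID" is an assumption, not something confirmed in this thread.

```r
# Hypothetical helper: case-insensitive header lookup with a default.
get_header <- function(headers, name, default = NULL) {
  idx <- match(tolower(name), tolower(names(headers)))
  if (is.na(idx)) default else headers[[idx]]
}

# Hypothetical helper: capture the printed output of str() as one string.
describe <- function(x) {
  paste(utils::capture.output(utils::str(x)), collapse = "\n")
}

# Example data; the real structure of request$headers may differ.
headers <- c("tenantid" = "5a8d27c232f69a38f467517d", "accept" = "*/*")

tenant  <- get_header(headers, "TenantID")
missing <- get_header(headers, "X-Missing", default = "")
dump    <- describe(headers)  # text that could be returned as a debug body
```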

pkgdown auto-updated site

It would be nice to add automatic pkgdown site generation to the Travis config.
After the CRAN publication, build docs for both the dev version and the stable CRAN release.

Test coverage

static files
- files
- directories
openapi

Raise exceptions in process_request

process_request does not catch some exceptions, for example the one raised from ContentHandlers$get_decode. There may be others elsewhere.
As a result, Rserve returns its own HTTP status (500) and body (500 Evaluation error).

Stop background process from R

Wouldn't it be nice to have:

backend <- BackendRserve$new()
backend$start(app, http_port = 1234, background = TRUE)
backend$stop()

structured errors

Starting from R 3.6.0 we can use structured errors (errorCondition). Middleware and handlers could then simply raise errors, which would be caught in a single place: RestRserveApplication. This would greatly simplify and streamline the logic and make it possible to remove the forward() and interrupt() calls.
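
A minimal base R sketch of the idea, assuming R >= 3.6.0. The names http_error, handler and dispatch, and the "http_error" condition class, are hypothetical and only illustrate how a handler can signal a structured error that is caught in one place.

```r
# Hypothetical constructor for a structured HTTP error condition.
http_error <- function(status, message) {
  errorCondition(message, status = status, class = "http_error")
}

# A handler simply raises the error; no forward()/interrupt() bookkeeping.
handler <- function(x) {
  if (!is.numeric(x)) stop(http_error(400L, "x must be numeric"))
  x * 2
}

# Single place where all structured errors are turned into responses.
dispatch <- function(x) {
  tryCatch(
    list(status = 200L, body = handler(x)),
    http_error = function(e) {
      list(status = e$status, body = conditionMessage(e))
    }
  )
}

ok  <- dispatch(2)
bad <- dispatch("a")
```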

swagger-ui

Would be nice to have automatic swagger UI

Pluggable backends

In theory we should be able to support different backends: Rserve, httpuv, etc. To be prepared for that, the Request and Response classes should be backend-independent, which means:

we need to make the to_rserve and from_rserve methods independent of Request and Response
helpers that construct a Request from the Rserve API should also be extracted from the Request class

[Feature request] Add OpenAPI definition from file

Add an openapi_file parameter to the add_route method that points to a YAML file with the OpenAPI definition.

naming

rename:

RestRserveRequest -> Request
RestRserveResponse -> Response
RestRserveApplication -> Application

What else?

RestRserveRequest methods

Check what ideas we can borrow:

ContentHandlers$get_decode does not handle multipart body

get_decode raises an exception for a multipart body, although the body is parsed and mapped by from_rserve. Also, a multipart body is not processed by the Request constructor (not parsed and mapped).

Return 405 when route not found for given HTTP method

Currently 404 is returned.
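
A minimal sketch of the requested behavior in base R: 404 when the path is unknown, 405 (with the allowed methods, which HTTP requires to be sent in an Allow header) when the path exists but not for this method. The route table layout and the name resolve_status are hypothetical and do not reflect how RestRserve stores routes.

```r
# Hypothetical route table: path -> character vector of allowed methods.
routes <- list("/api/hello" = c("GET", "HEAD"))

resolve_status <- function(routes, path, method) {
  allowed <- routes[[path]]
  if (is.null(allowed)) {
    # No handler is registered for this path at all
    return(list(status = 404L, allow = NULL))
  }
  if (!(method %in% allowed)) {
    # Path exists, but not for this method
    return(list(status = 405L, allow = paste(allowed, collapse = ", ")))
  }
  list(status = 200L, allow = NULL)
}

found       <- resolve_status(routes, "/api/hello", "GET")
not_allowed <- resolve_status(routes, "/api/hello", "POST")
not_found   <- resolve_status(routes, "/api/other", "GET")
```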

Please share your feedback

This topic is for sharing feedback about the current dev version: documentation, API, etc.
This topic is NOT for questions; use Stack Overflow instead and mark your question with the restrserve tag.

Add DB connection pool helper

Needs IPC communication between Rserve child processes. Related to s-u/Rserve#105

Import scripts with roxygen like comments

We can source a script and parse comments before function definition.
Regex to parse is "^#['\\*]\\s*(@(\\w+)\\s+)?(.*)$".
Parse OpenAPI specs, method and path (template).
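
A minimal sketch of applying the regex above to the lines of a script, in base R. The function name parse_tags and the @get tag in the example are hypothetical; only the regular expression comes from this issue. perl = TRUE is used here as an assumption to keep \\s and \\w behavior predictable.

```r
# Hypothetical helper: extract (tag, value) pairs from roxygen-like comments.
parse_tags <- function(lines) {
  pattern <- "^#['\\*]\\s*(@(\\w+)\\s+)?(.*)$"
  matches <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  matches <- matches[lengths(matches) > 0]  # drop lines that are not comments
  data.frame(
    tag   = vapply(matches, function(m) m[3], character(1)),  # "" when no @tag
    value = vapply(matches, function(m) m[4], character(1)),
    stringsAsFactors = FALSE
  )
}

script <- c(
  "#' @get /api/hello",
  "#' Say hello to the caller",
  "hello <- function(request, response) NULL"
)
tags <- parse_tags(script)
print(tags)
```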

multipart data parser

Application to dump body:

.http.request = function(path, query, body, headers) {
  saveRDS(body, "body.rds")
  list("OK!", "text/plain", character(0), 200L)
}
Rserve::run.Rserve(http.port = 8001)

Send request:

echo "Test" > /tmp/test
curl http://127.0.0.1:8001 -F 'param=value' -F 'file=@/tmp/test' -F 'var=text'

After the dump, the body looks like this:

> rawToChar(body)
[1] "--------------------------1cd7cb588b327247\r\nContent-Disposition: form-data; name=\"param\"\r\n\r\nvalue\r\n--------------------------1cd7cb588b327247\r\nContent-Disposition: form-data; name=\"file\"; filename=\"test\"\r\nContent-Type: application/octet-stream\r\n\r\nTEXT\n\r\n--------------------------1cd7cb588b327247--\r\n"
> cat(rawToChar(body))
--------------------------1cd7cb588b327247
Content-Disposition: form-data; name="param"

value
--------------------------1cd7cb588b327247
Content-Disposition: form-data; name="file"; filename="test"
Content-Type: application/octet-stream

TEXT

--------------------------1cd7cb588b327247--

There is an example of an R implementation in the FastRWeb package.
There is a C++ implementation in the cpp-httplib library.

On input: raw vector.
On output: list with the following structure.

list(
  "param" = list(value = "value"),
  "file" = list(filename = "test", offset = int, length = int, content_type = "application/octet-stream"),
  "var" = list(value = "text")
)

Line endings are \r\n. Each boundary string is followed by the Content-Disposition header and an optional Content-Type header. The value or file content is separated from the headers by an additional empty line.
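
A simplified base R sketch of the layout described above. It is not the proposed parser: it assumes ASCII text parts without embedded NUL bytes, takes the boundary as an argument, and returns the content itself rather than offset and length into the raw vector. The name parse_multipart_text is hypothetical.

```r
# Hypothetical, text-only multipart parser (illustration of the format).
parse_multipart_text <- function(body, boundary) {
  text <- rawToChar(body)
  chunks <- strsplit(text, paste0("--", boundary), fixed = TRUE)[[1]]
  # Real parts start with CRLF; this drops the preamble and the closing "--"
  chunks <- chunks[startsWith(chunks, "\r\n")]

  # Return the first capture group of `pattern` in `x`, or NA
  capture <- function(pattern, x) {
    m <- regmatches(x, regexec(pattern, x))[[1]]
    if (length(m) >= 2) m[2] else NA_character_
  }

  result <- list()
  for (chunk in chunks) {
    part <- substr(chunk, 3, nchar(chunk) - 2)  # strip leading/trailing CRLF
    split_at <- as.integer(regexpr("\r\n\r\n", part, fixed = TRUE))
    if (split_at < 0) next                      # no header/content separator
    headers <- substr(part, 1, split_at - 1)
    content <- substr(part, split_at + 4, nchar(part))
    name <- capture('[; ]name="([^"]*)"', headers)
    if (is.na(name)) next
    filename <- capture('filename="([^"]*)"', headers)
    result[[name]] <- if (is.na(filename)) {
      list(value = content)
    } else {
      list(
        filename = filename,
        content_type = capture("Content-Type: ([^\r\n]*)", headers),
        content = content
      )
    }
  }
  result
}

# Example body with a made-up boundary "XyZ"
body <- charToRaw(paste0(
  "--XyZ\r\n",
  "Content-Disposition: form-data; name=\"param\"\r\n\r\n",
  "value\r\n",
  "--XyZ\r\n",
  "Content-Disposition: form-data; name=\"file\"; filename=\"test\"\r\n",
  "Content-Type: application/octet-stream\r\n\r\n",
  "TEXT\n\r\n",
  "--XyZ--\r\n"
))
parts <- parse_multipart_text(body, "XyZ")
```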

Add more demo applications

I think we should provide demos for the most common scenarios.
Suggestions are welcome.

[Feature request] Add get_url to RestRserveRequest class

Add 'url' method or property in RestRserveRequest class. Useful for logging and debugging.

Error when running example code.

I'm trying to run the example but I get these errors:

Error: object 'Application' not found
Error: object 'BackendRserve' not found

Then I saw that the library contains no Application or BackendRserve objects.
I have installed it both from the GitHub link and with remotes::install_github("rexyai/RestRserve"), but I get the same errors.

Running under Kubernetes

Hi all,

This is more an informational question. Is it possible/does it make sense to use this package inside a managed cluster, eg with Kubernetes? Since it appears that RestRServe can handle requests in parallel, and the point of K8s is to be able to scale to meet demand.

Any plan for pushing to CRAN soon?

Thank you for writing such a nice utility for serving R concurrently. I wanted to check whether there is any plan for pushing it to CRAN.

Add tests

We should add some tests about:

Classes (constructor and methods):

RestRserveRequest (added in 323db68)
RestRserveResponse (added in 323db68)
HTTPErrorFactory (added in 323db68)
RestRserveApplication
RestRserveMiddleware (added in 323db68)
RestRserveRouter (added in 10d619a)

Functions:

Running server:

Static files (added in 1763f86)
Static directories (added in 1763f86)
Middleware (added in 1763f86)
OpenAPI (added in b81cfc0)
Swagger UI (added in b81cfc0)
Basic Auth (added in 1763f86)
OAuth (added in 1763f86)

Generate openapi.yaml to deployment dir

At the moment the OpenAPI spec openapi.yaml is generated on the fly. It would be worth generating it once and serving it as a static file.

Improve swagger-ui integration

swagger package contains obsolete swagger-ui version.

- move swagger files to RestRserve and remove swagger dependency
- refactor write_swagger_ui_index_html: move index_html to file in inst/
- refactor RestRserveApplication$add_swagger_ui to use internal swagger files

Docker and HAproxy example

http and tcp forwarding

CORS

We need to figure out a simple way to add CORS support; see the discussion in falconry/falcon#1220.

Error on a request with a URL parameter that has no value

I'm not sure this behavior is correct.

curl 'https://localhost:5000/api/rating?clientid=123456&step'
Error in query[[key]] : subscript out of bounds
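
A minimal base R sketch of a query-string parser that tolerates keys without a value by mapping them to an empty string. The name parse_query is hypothetical, and this is not how RestRserve parses queries. Because the result is a list, looking up an absent key with [[ returns NULL instead of raising "subscript out of bounds".

```r
# Hypothetical helper: parse "a=1&b" into list(a = "1", b = "").
parse_query <- function(query_string) {
  pairs <- strsplit(query_string, "&", fixed = TRUE)[[1]]
  pairs <- pairs[nzchar(pairs)]
  # Split each pair at the first "=" only; "step" stays a single element
  split_pairs <- regmatches(pairs, regexpr("=", pairs, fixed = TRUE), invert = TRUE)
  keys <- vapply(split_pairs, function(p) p[1], character(1))
  values <- vapply(split_pairs, function(p) if (length(p) > 1) p[2] else "", character(1))
  decode <- function(x) vapply(x, utils::URLdecode, character(1), USE.NAMES = FALSE)
  stats::setNames(as.list(decode(values)), decode(keys))
}

query <- parse_query("clientid=123456&step")
```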

Concern on 500 errors due to insufficient memory

It's quite interesting to find this package. I have been using the built-in HTTP server of Rserve at work and it is definitely more performant than Plumber, RApache and OpenCPU.

Rserve forks processes effectively but it doesn't check if the machine has enough resources. Therefore it can lead to 500 error when it fails to fork a process due to insufficient memory.

Initially I looked into whether NGINX can queue requests, but only the paid edition has that functionality. HAProxy, on the other hand, allows limiting the number of processes that can be handled. It became more effective once the API was proxied by HAProxy in a linked Docker environment.

I hope more deployment examples are introduced in the long run.

Regards
Jaehyeon

Version 0.2.0 and RServe

I am utilizing the dev branch to build a new version of our application. I am using supervisord to launch RServe. Here is a portion of the supervisor.conf file:

[program:rserve]
command=/usr/bin/R CMD Rserve --slave --RS-conf /app/conf/Rserve.conf 
priority=1
autostart=true
autorestart=false
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

And my Rserve.conf is:

source /app/restapi.R
source /usr/local/lib/R/site-library/RestRserve/http_request.R
http.port 8001
encoding utf8
port 6311
daemon disable
pid.file Rserve.pid

This is generating an error because /usr/local/lib/R/site-library/RestRserve/http_request.R doesn't exist on this branch.

How should we now be calling or launching RestRserve/RServe with the new version?

Thanks!

[QUESTION] Spawned processes and memory usage

Hi there,

I'm using your library to get predictions based on a trained model.

Here is the code sequence when I start the program:

Load all the necessary libraries, .RData files and other binary files
Declare my endpoint where the handler function sources a file where the appropriate code is

After this, I have one process using about 350MB of memory (which is normal for this application).

Then, I do a request and a second process spawns using the same amount of memory (350MB).

After this, if I put some load on the API (more than 2 requests at the same time) a new process is spawned using the same amount of memory per process.

I understand that's the way RestRserve or Rserve handle concurrent requests (by forking) but I can't understand why each process has that memory usage. Since it's all shared read-only data, shouldn't all processes use the same memory space instead of copying the data?

I also don't understand why all the spawned processes are kept running even if there aren't any requests to handle.

And my third question is: what are the advantages of the recommended way of deploying the API (the one mentioned on the documentation) versus just running Rscript api.R?

Sorry for the long text and probably some of the questions are basic ones but my knowledge in R is not really extensive.

Thank you!

Can you please add an example of POST method taking as input a json and outputs a json?

For example, take the JSON {"a":"3","b":"4"} as input and return a JSON with the sum, {"result":"7"}.
I tried a lot but didn't manage to do it. There is no POST example online. The most difficult part is how to receive the JSON input and extract the variables to work with.
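
A minimal sketch of the parsing and serialization part, using jsonlite. The function name sum_json is hypothetical, and how the request body reaches a handler (raw vector or string) is an assumption here, so the function accepts both. Wiring it into an endpoint would follow the handler pattern shown in other issues on this page, i.e. assigning the result to response$body with content type "application/json".

```r
library(jsonlite)

# Hypothetical helper: JSON body in, JSON string out.
sum_json <- function(body) {
  if (is.raw(body)) body <- rawToChar(body)  # assumption: body may be raw
  input <- fromJSON(body)                    # list(a = "3", b = "4")
  total <- as.numeric(input$a) + as.numeric(input$b)
  toJSON(list(result = as.character(total)), auto_unbox = TRUE)
}

out <- sum_json('{"a":"3","b":"4"}')
print(out)
```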

request hangs with no error - serving xgboost model

I've developed an API that serves 2 xgboost models and does some calculations with data.table.
The problem is that the API basically hangs when running this line xgb_preds = predict(model, data.matrix(d)). There is no visible error nor traceback.

I've tried running the app a few times, and it hangs on the predict call to the xgboost models. I figured this out by printing the objects until nothing got printed.

Here is the code; I've changed some of the names and removed the last part for privacy reasons. BTW, thanks for the great package!

source('foo.R')
source('baz.R')

library(RestRserve)
library(caret)
library(data.table)
library(ggplot2)
library(ISOweek)
library(zoo)
library(xgboost)
library(jsonlite)
library(Hmisc)

xgb_models <- list()
features <- list()
for(var in c("var1", "var2")){
   xgb_models[[var]] <- xgb.load(sprintf('model_%s', var))
   features[[var]] <- readRDS(sprintf('features_%s', var))
}

# read files
df1 = read_df1(path1)
df2 = readRDS(path2)
df3 = readRDS(path3)

df4 = readRDS("pricing/data/df4.rds")
df4[, allowed_miles:=ifelse(allowed_miles > 3100, 3500, allowed_miles)]
df4[, month:=month(pkup_dt)]


str(input)
# http://localhost:8001/endpoint?p1=M7991&p2=M1521&p3=2018-01-05&p4=2&p5=2017-12-15

foo <- function(request, response){
   input <- list(p1=request$query[['p1']],
                 p2=request$query[['p2']],
                 p3=as.Date(request$query[['p3']], format="%Y-%m-%d"),
                 p4=as.numeric(request$query[['p4']]),
                 p5=as.Date(request$query[['p5']], format="%Y-%m-%d"))

   print(str(input))

   # Estimate competitor rates
   data = list()
   variables = c("var1", "var2")
   for(var in variables){
      extra_vars = c("var1_miles", "var1_days")
      ignore_vars = c("var2_days")
      
      flow_cols = c("grp1", "grp2")
      ftr = features[[var]]
      dt_in = parse_inputs(input)
      dt_in = read_df2(dt_in, extra_vars, df2)
      dt_in = merge(dt_in, df1, by=flow_cols)
      
      bycols = c(flow_cols, 'ym_size_name', "wday", extra_vars)
      dt_in = merge(dt_in, ftr[, setdiff(names(ftr), ignore_vars), with=F], by=bycols)
      
      d = dt_in[, get_feature_names(extra_vars), with=FALSE]
      print(data.matrix(d)) # this gets printed
      model = xgb_models[[var]] 
      xgb_preds = predict(model, data.matrix(d))  #### the API hangs here!!! 
      print("foo") # this doesn't get printed
      print(xgb_preds)
      d[, est_rate:=xgb_preds]
      print(d)    
      data[[var]] <- d
      print(data[[var]])
   }

   #some more code that never gets executed.


   response$body = toJSON(result)
   response$content_type = "application/json"
   response$headers = character(0)
   response$status_code = 200L
   forward()
}


# create application
app = RestRserve::RestRserveApplication$new()

# register endpoints and corresponding R handlers
app$add_get(path = "/endpoint", FUN = foo)
app$run(http_port = "8001")

Investigate how to print tracebacks

At first glance it doesn't look that easy. Relevant SO thread https://stackoverflow.com/questions/15282471/get-stack-trace-on-trycatched-error-in-r/40899766#40899766
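
One common base R approach is to record sys.calls() inside a calling handler, before the stack is unwound, and then handle the error with tryCatch. The sketch below illustrates that technique; the name with_traceback is hypothetical and the captured stack also contains the wrapper's own frames.

```r
# Hypothetical helper: evaluate expr; on error return message and call stack.
with_traceback <- function(expr) {
  calls <- list()
  tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # Runs while the failing frames are still on the stack
        calls <<- as.list(sys.calls())
      }
    ),
    error = function(e) {
      list(
        message = conditionMessage(e),
        traceback = vapply(calls, function(cl) deparse(cl)[1], character(1))
      )
    }
  )
}

f <- function() stop("boom")
g <- function() f()
res <- with_traceback(g())
```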

add_get, add_post shortcuts

add_get()
add_post()

start application in interactive mode

using Rserve::run.Rserve()

Investigate other http methods

Seems they also can be implemented - https://github.com/s-u/Rserve/blob/05ff32d3c4512954a99162d392d0465d432d591e/src/http.c#L659
