
restrserve's People

Contributors

abrja, artemklevtsov, davzim, dselivanov, hafen, jangorecki, jonekeat, sambaala, schloerke

restrserve's Issues

Error handling

  • Implement an interface for registering custom error handlers
  • Implement a simple standard error handler class/object
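
One possible shape for such an interface, as a minimal sketch only. The names `register_error_handler`, `default_error_handler` and `handle_error` are hypothetical and not part of RestRserve; responses are modelled as plain lists for illustration.

```r
# Hypothetical registry: maps a status code (as a string) to a handler function.
error_handlers <- new.env()

register_error_handler <- function(status_code, handler) {
  assign(as.character(status_code), handler, envir = error_handlers)
}

# Simple standard handler used when nothing custom is registered.
default_error_handler <- function(status_code, message) {
  list(
    status_code = status_code,
    content_type = "text/plain",
    body = paste(status_code, message)
  )
}

# Look up a custom handler for the status code, else fall back to the default.
handle_error <- function(status_code, message) {
  key <- as.character(status_code)
  handler <- if (exists(key, envir = error_handlers, inherits = FALSE)) {
    get(key, envir = error_handlers)
  } else {
    default_error_handler
  }
  handler(status_code, message)
}

# Usage: a custom JSON handler for 404, everything else uses the default.
register_error_handler(404L, function(status_code, message) {
  list(
    status_code = status_code,
    content_type = "application/json",
    body = sprintf('{"error":"%s"}', message)
  )
})
not_found <- handle_error(404L, "Not Found")
server_error <- handle_error(500L, "Internal Server Error")
```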

Start application with random Rserve port

By default Rserve uses port 6311 (QAP). We therefore can't start multiple applications, because port 6311 will already be taken by the first one started.
We should check whether the port is free and, if it is not, generate a random port and check that one.
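
A minimal sketch of such a check, assuming R >= 4.0.0 (which provides `serverSocket()` in base R). The helper names `is_port_free` and `find_free_port` and the candidate port range are illustrative assumptions, not part of RestRserve or Rserve.

```r
# Try to open a listening socket on the port; if that fails, treat it as busy.
is_port_free <- function(port) {
  tryCatch({
    srv <- serverSocket(port)
    close(srv)
    TRUE
  }, error = function(e) FALSE)
}

# Probe a handful of random ports from the dynamic range and return the first free one.
find_free_port <- function(candidates = sample(49152:65535, 20)) {
  for (port in candidates) {
    if (is_port_free(port)) return(port)
  }
  stop("no free port found among the candidates")
}

port <- find_free_port()
```

Note that another process could still grab the port between the check and the actual start of Rserve, so the caller may want to retry on failure.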

README example code gives error

Thanks for this useful package! I ran into an error from running the code in the README. Apparently, the method new() for Logger does not take a file argument:

library(RestRserve)
logger = Logger$new(level = TRACE, file = "")
#> Error in .subset2(public_bind_env, "initialize")(...): unused argument (file = "")
# ...

Created on 2019-11-11 by the reprex package (v0.3.0)

Use lgr as logger

lgr is now updated on CRAN and relatively stable.
We can also re-export the lgr object to encourage developers to use it in their apps.

remove openapi* functions?

Maybe it is worth completely removing the openapi* functions and sticking to just YAML files. I find myself never using these functions anymore.

No response from API when serving LightGBM model

I am trying to train a LightGBM model and then serve it using RestRserve. However, after making a request to the microservice, the process hangs and I receive no response. I also don't receive any error or warning messages.

Running top reveals that the forked process has been created and some memory has been allocated (In the actual workflow where I encountered the issue CPU usage by the forked process initially increases then drops to 0. In the example below CPU usage is negligible, so I couldn't track it).

Here is a minimal reproducible example:

library(lightgbm)
library(RestRserve)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test
bst <- lightgbm(
  data = train$data,
  label = train$label,
  num_leaves = 4,
  learning_rate = 1,
  nrounds = 2,
  objective = "binary"
)


dummy_api_function  <- function(request, response) {
  result <- predict(bst, test$data)[1]
  response$body <- jsonlite::toJSON(result)
  response$content_type <- "application/json"
  response$headers <- character(0)
  response$status_code <- 200L
  forward()
}

RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_post(path = "/api/dummy_api", FUN = dummy_api_function)
RestRserveApp$run(8001)

And an example of a curl request that leads to the process hanging:

curl --header "Content-Type: application/json" --request POST --data '{"foo":"bar"}' localhost:8001/api/dummy_api

(The same example but replacing result <- predict(bst, test$data)[1] with result <- 1 produces a result, so the issue must be with the predict call).

It is hard to tell whether the issue relates more to RestRserve or to LightGBM, but any help tracking the cause is appreciated.

Environment info:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=bg_BG.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=bg_BG.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=bg_BG.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=bg_BG.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-12       RestRserve_0.1.0.13 lightgbm_2.2.4      R6_2.4.0           

loaded via a namespace (and not attached):
[1] compiler_3.4.4    magrittr_1.5      tools_3.4.4       yaml_2.2.0        Rserve_1.7-3      grid_3.4.4        data.table_1.12.0 jsonlite_1.6     
[9] lattice_0.20-35

curl --head always returns 404

$ curl -I http://localhost:5001/api/hello
HTTP/1.1 404 Code 404
Content-type: application/json
Content-length: 30
$ curl -X GET -I http://localhost:5001/api/hello
HTTP/1.1 200 OK
Content-type: text/plain
Content-length: 10
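
Since `curl -I` sends a HEAD request, one way to avoid the 404 is to fall back to the GET handler and drop the body. The sketch below is only an illustration with plain lists and a hypothetical `dispatch` function; it is not how RestRserve routes requests.

```r
# Hypothetical route table keyed by "METHOD path".
routes <- list(
  "GET /api/hello" = function() {
    list(status_code = 200L, content_type = "text/plain", body = "Hello")
  }
)

dispatch <- function(routes, method, path) {
  handler <- routes[[paste(method, path)]]
  head_fallback <- FALSE
  # HEAD has no handler of its own: reuse the GET handler for the same path.
  if (is.null(handler) && identical(method, "HEAD")) {
    handler <- routes[[paste("GET", path)]]
    head_fallback <- !is.null(handler)
  }
  if (is.null(handler)) {
    return(list(status_code = 404L, content_type = "text/plain", body = "Not Found"))
  }
  response <- handler()
  # A HEAD response carries the same status and headers, but no body.
  if (head_fallback) response$body <- ""
  response
}

head_response <- dispatch(routes, "HEAD", "/api/hello")
get_response <- dispatch(routes, "GET", "/api/hello")
missing_response <- dispatch(routes, "HEAD", "/api/unknown")
```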

Reading request headers

I have been trying to get request headers with the following code:

ApiReadHeader <- function(request, response) {
     response$body <-  request$headers[["TenantID"]]
     response$content_type <- "application/json"
     response$headers <- character(0)
     response$status_code <- 200L
     forward()
}
RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_get(path = "/api/readheader", FUN = ApiReadHeader)
RestRserveApp$run(8001)

And the following curl request :
curl -X GET "http://localhost:8001/api/readheader" -H "TenantID: 5a8d27c232f69a38f467517d"

I expected the server to respond with 5a8d27c232f69a38f467517d, however I got

Error in user code: subscript out of bounds
Call: request$headers[["TenantID"]]
Tracebeck:
  FUN(request, response)

What am I doing wrong?
And what is the best way to debug RestRserveApplication?

I had trouble doing that, so I tried returning the request's headers, queries, and bodies to be able to have a look at them and their structure.

I also tried response$body <- str(request$headers), but it returns an R error again.

Help is appreciated.
Thanks.
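
Two base R details are relevant here: `[[` on a named character vector raises "subscript out of bounds" when the name is missing, and `str()` prints its output and returns NULL, so its result cannot be used as a body. HTTP header names are also case-insensitive, so the server may store them in a different case than the client sent (lower-cased storage is an assumption here, not confirmed by this thread). A minimal sketch with hypothetical helpers `get_header` and `dump_structure`:

```r
# Case-insensitive header lookup that returns a default instead of raising an error.
get_header <- function(headers, name, default = NULL) {
  idx <- match(tolower(name), tolower(names(headers)))
  if (is.na(idx)) default else headers[[idx]]
}

# Capture what str() would print, so it can be returned in a response body.
dump_structure <- function(x) {
  paste(capture.output(str(x)), collapse = "\n")
}

# Assumed shape of the headers as seen by the handler (lower-cased names).
headers <- c("tenantid" = "5a8d27c232f69a38f467517d", "accept" = "*/*")
tenant <- get_header(headers, "TenantID")
missing <- get_header(headers, "X-Unknown", default = "")
debug_text <- dump_structure(headers)
```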

pkgdown auto-updated site

It would be nice to add automatic generation of the pkgdown site to the Travis config.
After publication on CRAN, docs should be published for both the dev version and the stable CRAN version.

Raise exceptions in process_request

process_request does not catch some exceptions, for example those raised from ContentHandlers$get_decode. There may be others elsewhere.
As a result, Rserve responds with its own HTTP status (500) and body (500 Evaluation error).

Stop background process from R

Wouldn't it be nice to have:

backend <- BackendRserve$new()
backend$start(app, http_port = 1234, background = TRUE)
backend$stop()

?

structured errors

Starting from R 3.6.0 we can use structured errors via errorCondition. Middleware and handlers could then easily raise errors that are caught in a single place, RestRserveApplication. This would greatly simplify and streamline the logic and make it possible to remove the forward() and interrupt() calls.
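
A minimal sketch of the idea, assuming R >= 3.6.0. The class name `http_error`, the helpers `raise_http_error` and `handle_request`, and the plain-list responses are illustrative assumptions, not the RestRserve implementation.

```r
# Raise a structured error that carries an HTTP status code as a field.
raise_http_error <- function(status_code, message) {
  stop(errorCondition(message, status_code = status_code, class = "http_error"))
}

# Single place where all errors from handlers are caught and turned into responses.
handle_request <- function(handler, request) {
  tryCatch(
    handler(request),
    http_error = function(e) {
      list(status_code = e$status_code, body = conditionMessage(e))
    },
    error = function(e) {
      # Any other error becomes a generic 500 response.
      list(status_code = 500L, body = "Internal Server Error")
    }
  )
}

ok <- handle_request(function(req) list(status_code = 200L, body = "ok"), NULL)
not_found <- handle_request(function(req) raise_http_error(404L, "Not Found"), NULL)
crashed <- handle_request(function(req) stop("unexpected failure"), NULL)
```

Because the handler simply returns or raises, no explicit flow-control call is needed in this sketch.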

swagger-ui

Would be nice to have automatic swagger UI

Pluggable backends

In theory we should be able to have different backends: Rserve, httpuv, etc. To be prepared for that, the Request and Response classes should be backend independent. This means:

  • we need to make the to_rserve and from_rserve methods independent of Request and Response
  • helpers that construct a Request from the Rserve API should also be extracted from the Request class
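
A sketch of what backend-specific conversion could look like once it lives outside the classes. Plain lists stand in for Request and Response here, and the function names `request_from_rserve` and `response_to_rserve` are hypothetical. The argument order and the return format follow the `.http.request` convention of Rserve shown elsewhere on this page.

```r
# Build a backend-independent request from the arguments Rserve passes to .http.request.
request_from_rserve <- function(path, query, body, headers) {
  list(path = path, query = query, body = body, headers = headers)
}

# Convert a backend-independent response into the list format Rserve expects:
# body, content type, extra headers, status code.
response_to_rserve <- function(response) {
  list(
    response$body,
    response$content_type,
    response$headers,
    response$status_code
  )
}

request <- request_from_rserve("/api/hello", c(name = "world"), NULL, NULL)
response <- list(
  body = "OK!",
  content_type = "text/plain",
  headers = character(0),
  status_code = 200L
)
rserve_result <- response_to_rserve(response)
```

With this split, another backend would only need its own pair of conversion functions.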

naming

rename:

  • RestRserveRequest -> Request
  • RestRserveResponse -> Response
  • RestRserveApplication -> Application

What else?

Please share your feedback

This topic is to share feedback about current dev version - documentation, API, etc.
This topic is NOT for questions. Use Stack Overflow instead and mark your question with the restrserve tag.

Import scripts with roxygen like comments

We can source a script and parse comments before function definition.
Regex to parse is "^#['\\*]\\s*(@(\\w+)\\s+)?(.*)$".
Parse OpenAPI specs, method and path (template).
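
To illustrate what the regex captures, here is a small sketch that applies it to a few sample lines with base R's default regex engine. The function name `parse_doc_comments` and the sample script lines are made up for the example.

```r
pattern <- "^#['\\*]\\s*(@(\\w+)\\s+)?(.*)$"

# Return one row per documentation comment: the tag (if any) and the remaining text.
parse_doc_comments <- function(lines) {
  matches <- regmatches(lines, regexec(pattern, lines))
  matches <- Filter(length, matches)  # drop lines that are not doc comments
  data.frame(
    tag = vapply(matches, `[`, "", 3),
    text = vapply(matches, `[`, "", 4),
    stringsAsFactors = FALSE
  )
}

lines <- c(
  "#' @get /hello",
  "#' Returns a greeting",
  "#* @param name who to greet",
  "hello <- function(name) paste('Hello', name)"
)
parsed <- parse_doc_comments(lines)
```

Lines without an `@tag` get an empty tag, and the function definition itself is skipped because it does not start with `#'` or `#*`.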

multipart data parser

Application to dump body:

.http.request = function(path, query, body, headers) {
  saveRDS(body, "body.rds")
  list("OK!", "text/plain", character(0), 200L)
}
Rserve::run.Rserve(http.port = 8001)

Send request:

echo "Test" > /tmp/test
curl http://127.0.0.1:8001 -F 'param=value' -F 'file=@/tmp/test' -F 'var=text'

After dump body looks like this:

> rawToChar(body)
[1] "--------------------------1cd7cb588b327247\r\nContent-Disposition: form-data; name=\"param\"\r\n\r\nvalue\r\n--------------------------1cd7cb588b327247\r\nContent-Disposition: form-data; name=\"file\"; filename=\"test\"\r\nContent-Type: application/octet-stream\r\n\r\nTEXT\n\r\n--------------------------1cd7cb588b327247--\r\n"
> cat(rawToChar(body))
--------------------------1cd7cb588b327247
Content-Disposition: form-data; name="param"

value
--------------------------1cd7cb588b327247
Content-Disposition: form-data; name="file"; filename="test"
Content-Type: application/octet-stream

TEXT

--------------------------1cd7cb588b327247--

There is an example of an R implementation in the FastRWeb package.
There is a C++ implementation in the cpp-httplib library.

Input: a raw vector.
Output: a list with the following structure.

list(
  "param" = list(value = "value"),
  "file" = list(filename = "test", offset = int, length = int, content_type = "application/octet-stream"),
  "var" = list(value = "text")
)

Line endings are \r\n. Each boundary string is followed by a Content-Disposition header and an optional Content-Type header. The value or file content is separated from the headers by an additional empty line.

Related RFC: https://tools.ietf.org/html/rfc7578
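
A minimal base R sketch of such a parser, under several assumptions: the boundary is passed in by the caller (it comes from the Content-Type header of the request), header names are spelled as curl sends them, and `offset` is the 1-based index of the first content byte in the raw body. The names `parse_multipart` and `extract_first` are hypothetical.

```r
# Return the first capture group of `pattern` in `x`, or NA if there is no match.
extract_first <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) >= 2L) m[2] else NA_character_
}

parse_multipart <- function(body, boundary) {
  delim <- charToRaw(paste0("--", boundary))
  starts <- grepRaw(delim, body, fixed = TRUE, all = TRUE)
  seps <- grepRaw(charToRaw("\r\n\r\n"), body, fixed = TRUE, all = TRUE)
  result <- list()
  for (i in seq_len(max(length(starts) - 1L, 0L))) {
    part_start <- starts[i] + length(delim) + 2L  # skip the boundary and its CRLF
    part_end <- starts[i + 1L] - 3L               # drop the CRLF before the next boundary
    # First empty line inside this part separates headers from content.
    header_end <- seps[seps >= part_start & seps < starts[i + 1L]][1]
    if (is.na(header_end)) next
    headers <- rawToChar(body[part_start:(header_end - 1L)])
    offset <- header_end + 4L
    len <- part_end - offset + 1L
    name <- extract_first('; name="([^"]*)"', headers)
    if (is.na(name)) next
    filename <- extract_first('filename="([^"]*)"', headers)
    if (is.na(filename)) {
      # Plain form field: return the value as a string.
      result[[name]] <- list(value = rawToChar(body[offset - 1L + seq_len(len)]))
    } else {
      # File: return only the location of the content, not a copy of it.
      result[[name]] <- list(
        filename = filename,
        offset = offset,
        length = len,
        content_type = extract_first("Content-Type: ([^\r\n]+)", headers)
      )
    }
  }
  result
}

# Usage with a small hand-built body and a short made-up boundary.
body <- charToRaw(paste0(
  "--XYZ\r\n",
  "Content-Disposition: form-data; name=\"param\"\r\n\r\n",
  "value\r\n",
  "--XYZ\r\n",
  "Content-Disposition: form-data; name=\"file\"; filename=\"test\"\r\n",
  "Content-Type: application/octet-stream\r\n\r\n",
  "Test\n\r\n",
  "--XYZ--\r\n"
))
parsed <- parse_multipart(body, "XYZ")
file_bytes <- body[parsed$file$offset - 1L + seq_len(parsed$file$length)]
```

Returning offset and length for files avoids copying potentially large binary content and keeps the parser safe for bytes that cannot be converted with rawToChar.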

Add more demo applications

I think we should provide demos for the most common scenarios.
Suggestions are welcome.

  • Predict lm/glm model
  • Predict xgboost model
  • Predict caret model
  • Plots output
  • XML output
  • Rendering HTML page
  • Fetch and output data from database
  • Insert data to database

Error when running example code.

I'm trying to run the example but I get these errors:

Error: object 'Application' not found
Error: object 'BackendRserve' not found

Then I saw that the library contains neither an Application nor a BackendRserve object.
I have installed it both from the GitHub link and with remotes::install_github("rexyai/RestRserve"), but I get the same errors.

Running under Kubernetes

Hi all,

This is more of an informational question. Is it possible, and does it make sense, to use this package inside a managed cluster, e.g. with Kubernetes? I ask because RestRserve appears to handle requests in parallel, and the point of K8s is to be able to scale to meet demand.

Add tests

We should add some tests about:

Classes (constructor and methods):

  • RestRserveRequest (added in 323db68)
  • RestRserveResponse (added in 323db68)
  • HTTPErrorFactory (added in 323db68)
  • RestRserveApplication
  • RestRserveMiddleware (added in 323db68)
  • RestRserveRouter (added in 10d619a)

Functions:

  • raise (added in 323db68)
  • try_capture_stack (added in #43)
  • deparse_vector (added in #43)
  • get_traceback (added in #43)
  • URLenc (added in #43)
  • dict_ functions (removed in #43)
  • parse_query (added in d3ac6ef)
  • parse_headers (added in d3ac6ef)
  • extract_docstrings_yaml (added in ab19b6a)
  • guess_mime (added in #43)
  • is_string (added in #43)
  • is_path (added in #43)

Running server:

Improve swagger-ui integration

The swagger package contains an obsolete swagger-ui version.

  • move swagger files to RestRserve and remove swagger dependency
  • refactor write_swagger_ui_index_html: move index_html to file in inst/
  • refactor RestRserveApplication$add_swagger_ui to use internal swagger files

Concern on 500 errors due to insufficient memory

Hi

It's quite interesting to find this package. I have been using the built-in HTTP server of Rserve at work and it is definitely more performant than Plumber, RApache and OpenCPU.

Rserve forks processes effectively, but it doesn't check whether the machine has enough resources. It can therefore return a 500 error when it fails to fork a process due to insufficient memory.

Initially I looked into whether NGINX offers a way to queue requests, but only the paid edition has that functionality. HAProxy, on the other hand, allows limiting the number of requests that are handled at once. It became more effective after the API was proxied by HAProxy in a linked Docker environment.

I hope more deployment examples are introduced in the long run.

Regards
Jaehyeon

Version 0.2.0 and RServe

I am utilizing the dev branch to build a new version of our application. I am using supervisord to launch RServe. Here is a portion of the supervisor.conf file:

[program:rserve]
command=/usr/bin/R CMD Rserve --slave --RS-conf /app/conf/Rserve.conf 
priority=1
autostart=true
autorestart=false
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

And my Rserve.conf is:

source /app/restapi.R
source /usr/local/lib/R/site-library/RestRserve/http_request.R
http.port 8001
encoding utf8
port 6311
daemon disable
pid.file Rserve.pid

This is generating an error because /usr/local/lib/R/site-library/RestRserve/http_request.R doesn't exist on this branch.

How should we now be calling or launching RestRserve/RServe with the new version?

Thanks!

[QUESTION] Spawned processes and memory usage

Hi there,

I'm using your library to get predictions based on a trained model.

Here is the code sequence when I start the program:

  1. Load all the necessary libraries, .RData files and other binary files
  2. Declare my endpoint where the handler function sources a file where the appropriate code is

After this, I have one process using about 350MB of memory (which is normal for this application).

Then, I do a request and a second process spawns using the same amount of memory (350MB).

After this, if I put some load on the API (more than 2 requests at the same time) a new process is spawned using the same amount of memory per process.

I understand that's the way RestRserve or Rserve handle concurrent requests (by forking) but I can't understand why each process has that memory usage. Since it's all shared read-only data, shouldn't all processes use the same memory space instead of copying the data?

I also don't understand why all the spawned processes are kept running even if there aren't any requests to handle.

And my third question is: what are the advantages of the recommended way of deploying the API (the one mentioned in the documentation) versus just running Rscript api.R?

Sorry for the long text and probably some of the questions are basic ones but my knowledge in R is not really extensive.

Thank you!

request hangs with no error - serving xgboost model

I've developed an API that serves 2 xgboost models and does some calculations with data.table.
The problem is that the API basically hangs when running this line xgb_preds = predict(model, data.matrix(d)). There is no visible error nor traceback.

I've tried running the app a few times, and it hangs on the predict call to the xgboost models. I figured this out by printing the objects until nothing more got printed.

Here is the code. I've changed some of the names and removed the last part for privacy reasons. BTW, thanks for the great package!

source('foo.R')
source('baz.R')

library(RestRserve)
library(caret)
library(data.table)
library(ggplot2)
library(ISOweek)
library(zoo)
library(xgboost)
library(jsonlite)
library(Hmisc)

xgb_models <- list()
features <- list()
for(var in c("var1", "var2")){
   xgb_models[[var]] <- xgb.load(sprintf('model_%s', var))
   features[[var]] <- readRDS(sprintf('features_%s', var))
}

# read files
df1 = read_df1(path1)
df2 = readRDS(path2)
df3 = readRDS(path3)

df4 = readRDS("pricing/data/df4.rds")
df4[, allowed_miles:=ifelse(allowed_miles > 3100, 3500, allowed_miles)]
df4[, month:=month(pkup_dt)]


str(input)
# http://localhost:8001/endpoint?p1=M7991&p2=M1521&p3=2018-01-05&p4=2&p5=2017-12-15

foo <- function(request, response){
   input <- list(p1=request$query[['p1']],
                 p2=request$query[['p2']],
                 p3=as.Date(request$query[['p3']], format="%Y-%m-%d"),
                 p4=as.numeric(request$query[['p4']]),
                 p5=as.Date(request$query[['p5']], format="%Y-%m-%d"))

   print(str(input))

   # Estimate competitor rates
   data = list()
   variables = c("var1", "var2")
   for(var in variables){
      extra_vars = c("var1_miles", "var1_days")
      ignore_vars = c("var2_days")
      
      flow_cols = c("grp1", "grp2")
      ftr = features[[var]]
      dt_in = parse_inputs(input)
      dt_in = read_df2(dt_in, extra_vars, df2)
      dt_in = merge(dt_in, df1, by=flow_cols)
      
      bycols = c(flow_cols, 'ym_size_name', "wday", extra_vars)
      dt_in = merge(dt_in, ftr[, setdiff(names(ftr), ignore_vars), with=F], by=bycols)
      
      d = dt_in[, get_feature_names(extra_vars), with=FALSE]
      print(data.matrix(d)) # this gets printed
      model = xgb_models[[var]] 
      xgb_preds = predict(model, data.matrix(d))  #### the API hangs here!!! 
      print("foo") # this doesn't get printed
      print(xgb_preds)
      d[, est_rate:=xgb_preds]
      print(d)    
      data[[var]] <- d
      print(data[[var]])
   }

   #some more code that never gets executed.


   response$body = toJSON(result)
   response$content_type = "application/json"
   response$headers = character(0)
   response$status_code = 200L
   forward()
}


# create application
app = RestRserve::RestRserveApplication$new()

# register endpoints and corresponding R handlers
app$add_get(path = "/endpoint", FUN = foo)
app$run(http_port = "8001")
