
ggrandomforests's People

Contributors

ehrlinger, romainfrancois, timelyportfolio

ggrandomforests's Issues

randomForestSRC-regression vignette

rmarkdown is cool, but...

Want to update the arXiv submission, rmarkdown latex is pretty ugly still.

  • So port the vignette back to knitr latex format.
  • incorporate many changes from randomForestSRC-survival vignette.

pct?

Hi,
Just a quick question.
What does pct stand for in calc_roc.rfsrc?

Scale Y label

Thank you for the great package. I'm currently running it on my data and the plots are rather cumbersome since there are more than 400 variables. Could you please advise me on how to make them more readable? Perhaps there is a way to scale down the Y label font?
Sorry if I'm asking in the wrong place. I'm quite new to data analysis, as my background is in medicine (MD).

randomForestSRC::plot.variable not behaving as expected

I am attempting to work through the "Random Forests for Regression" vignette, however I have run into an issue near the end when generating partial coplot data for a contour plot. Rather than returning 50 unique coplots for each specific value of rm, the coplots are all identical. The result is that the contour plot only has contours for predicted y values from one x variable, making it just a 3D representation of a single variable co-plot.

In a previous version of the "Random Forests for Regression" vignette this did not appear to be an issue (see output image below)
[image: contour plot output from the previous version of the vignette]

However, in the updated version of the vignette, the issue has appeared:
[image: screenshot of the output from the updated vignette, 2022-08-05]

Perhaps an update to the plot.variable() function has altered how this code behaves?

st.labs

The package and the vignette code work perfectly, except for the code that involves st.labs.

plot(gg_md, lbls=st.labs)
Error in plot.gg_minimal_depth(gg_md, lbls = st.labs) :
object 'st.labs' not found

surv.type = mortality

We often want the complement of the probability of survival: mortality = 1 - survival.

rfsrc returns something else when surv.type="mort", not what we expect. Should be a simple conversion.
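
A minimal sketch of the conversion requested above, using base R only. The matrix `surv` and the helper `surv_to_mortality` are hypothetical and are not part of randomForestSRC or ggRandomForests; the sketch only assumes that survival estimates are available as probabilities on [0, 1].

```r
# Hypothetical survival probabilities: rows are subjects, columns are time points.
surv <- matrix(c(1.00, 0.90, 0.75,
                 1.00, 0.80, 0.50),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("id1", "id2"), c("t0", "t1", "t2")))

# Complement of survival: probability that the event has occurred by each time.
# Hypothetical helper, not a package function.
surv_to_mortality <- function(s) {
  stopifnot(is.numeric(s), all(s >= 0 & s <= 1, na.rm = TRUE))
  1 - s
}

mort <- surv_to_mortality(surv)
```

The range check guards against applying the conversion to values that are not probabilities, such as the quantity rfsrc returns for surv.type = "mort".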

gg_error is not showing anything

Hi @ehrlinger,

Thanks for making it easier to work with randomForestSRC, you saved me a lot of time! After calculating an rfsrc object with the following code...

> rfs <- rfsrc(Surv(time, event) ~ ., data = mydata, nsplit = NULL, ntree = 100, importance = T)
> rfs
                         Sample size: 99
                    Number of deaths: 35
                     Number of trees: 100
           Forest terminal node size: 15
       Average no. of terminal nodes: 9.44
No. of variables tried at each split: 4
              Total no. of variables: 12
       Resampling used to grow trees: swor
    Resample size used to grow trees: 63
                            Analysis: RSF
                              Family: surv
                      Splitting rule: logrank
                          Error rate: 28.85%

... I'm trying to obtain the OOB plot for each tree, but I get this (uninformative) warning and an empty plot appears:

> plot(gg_error(rfs))
Warning message:
Removed 9 rows containing missing values (geom_path).

Any idea about what I'm doing wrong, please? Thanks in advance.

Convert S3method functions to use `UseNext` instead of `UseMethod`

Convert S3method functions to use UseNext instead of UseMethod method dispatch.

This should fix issues with arguments, and prepare the way for extending to random forest packages beyond randomForestSRC #3.

Should also tighten up the whole OO design so that functions that look like S3methods really are S3methods.
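
Base R has no function named UseNext; the sketch below assumes the issue refers to NextMethod, which passes control from one S3 method to the method for the next class in the object's class vector. The generic `describe` and the class `gg_demo` are hypothetical names used only for illustration.

```r
# Hypothetical generic: UseMethod dispatches on the class vector of x.
describe <- function(x, ...) UseMethod("describe")

# Fallback method, reached when no class-specific method remains.
describe.default <- function(x, ...) "default"

# Method for data frames: does its own work, then defers to the next method.
describe.data.frame <- function(x, ...) {
  paste("data.frame >", NextMethod())
}

# Method for the hypothetical gg_demo class: most specific, called first.
describe.gg_demo <- function(x, ...) {
  paste("gg_demo >", NextMethod())
}

obj <- structure(data.frame(a = 1:3), class = c("gg_demo", "data.frame"))
result <- describe(obj)
# result is "gg_demo > data.frame > default"
```

Each method handles its own class and hands the rest to NextMethod, so arguments flow along the class chain instead of being re-dispatched from the top.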

Naming consistency (away from S3 type names)

v1.1.4 release indicates I need to refactor function names to remove dot separators because of clashes with S3method names.

Example functions:
combine.gg_partial does not really extend combine.default #20
plot.gg_minimal_vimp should be an argument based alternative to plot.gg_minimal_depth #21

gg_roc with multiple outcomes.

Right now it only handles a single class, but overlaying multiple classes on each ROC curve is possible.
An alternative is to use multiple panels.

gg_survival with 'by' handles factors with NA incorrectly when occurring before other levels

I don't have a good minimal working example or anything, but I'm going to try my best to describe what's going on.

After calling gg_survival with either type "kaplan" or type "nelson" and supplying a factor for 'by', survfit with strata on 'by' is called. The default is na.group = FALSE, so it stratifies only on the other levels of the factor.

A little further down in the code, we have this bit:

  if(!is.null(by)){
    tm_splits <- which(c(FALSE,sapply(2:nrow(tbl), function(ind){tbl$time[ind] < tbl$time[ind - 1]})))

    lbls <- unique(data[,by])
    tbl$groups <- lbls[1]

    for(ind in 2:(length(tm_splits) + 1)){
      tbl$groups[tm_splits[ind - 1]:nrow(tbl)] <- lbls[ind]
    }
  }

Unique also returns 'NA' as an option, but NA was not included as a stratum level, so if you have a situation where NA occurs before at least one of your levels, it will take its place and you'll drop a factor level you potentially cared about.

I solved it myself by editing in na.group = TRUE to the call to strata in the kaplan and nelson functions because I wanted that information anyway, but I guess this might be something encountered by others as well!
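
The mismatch described above can be reproduced with base R alone. This is a minimal sketch with a hypothetical grouping factor `by_var`; it only shows that unique() keeps NA as a label while the factor levels do not, and it is not the package's actual fix.

```r
# Hypothetical grouping variable: an NA appears before level "b" is first seen.
by_var <- factor(c("a", NA, "b", "a", "b"))

# unique() keeps the missing value, in order of first appearance: a, <NA>, b.
labels_with_na <- unique(by_var)

# Dropping missing values first leaves one label per stratum: a, b.
labels_no_na <- unique(by_var[!is.na(by_var)])

# Number of strata when NA is not treated as its own group.
n_strata <- nlevels(by_var)

length(labels_with_na)  # 3 labels for only 2 strata
is.na(labels_with_na[2])  # the second label is NA, not "b"
```

With two strata but three labels, the second stratum is assigned NA and the label "b" is never used, which matches the dropped level reported above.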

conditioning a coplot on a factor

Hello.
In the ggRandomForests: Exploring Random Forest Survival paper, there is a partial dependence coplot of 1 year survival against bilirubin, conditional on albumin interval group membership (figure 24).
I am trying to create a similar coplot, but instead of conditioning on albumin intervals, I would like to condition on a variable that was originally categorical (for instance edema or ascites). I tried to do so but could not get the script right.
Would deeply appreciate your help in this matter.
Thanks
Roni

Select outcome for classification VIMP plot

By default, gg_vimp returns a VIMP panel for each factor level from a classification forest. If we provide which.outcome, we want a VIMP figure for that factor level only, sorted by the VIMP for the factor level of interest.
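
The requested behaviour can be sketched in base R. The matrix `vimp` and the helper `select_vimp` are hypothetical and do not reflect the internals of gg_vimp; only the argument name which.outcome comes from the issue.

```r
# Hypothetical VIMP values: rows are variables, columns are outcome classes.
vimp <- matrix(c(0.10, 0.30, 0.05,   # column "setosa"
                 0.20, 0.02, 0.15),  # column "versicolor"
               nrow = 3,
               dimnames = list(c("x1", "x2", "x3"),
                               c("setosa", "versicolor")))

# Hypothetical helper: keep one outcome column and sort it by decreasing VIMP.
select_vimp <- function(vimp, which.outcome) {
  stopifnot(which.outcome %in% colnames(vimp))
  sort(vimp[, which.outcome], decreasing = TRUE)
}

setosa_vimp <- select_vimp(vimp, "setosa")
names(setosa_vimp)  # "x2" "x1" "x3"
```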

Reduce size of pkg for CRAN compliance

I've gone a bit crazy on tests and examples, which require cached rfsrc objects. The cache requirement is due to computational expense and rfsrc version issues (I can use pre-release rfsrc).

So, we'll remove the airq, mtcars and veteran cached objects. The objects can still be built from rfsrc_cache_dataset function...

  • refactor tests to use iris, Boston and pbc for classification, regression and survival.
  • Bound all examples NOT in {iris, Boston, pbc} with \dontrun
  • remove cached datasets NOT in {iris, Boston, pbc}
  • refactor rfsrc_cache_dataset to create only set=c("iris", "Boston", "pbc") by default.

Possible bug: plot.gg_interaction facet order

Hi John,

When trying to produce variable interaction plots with plot.gg_interaction() I cannot get the order of the faceting to match the minimal depth rank order, with the set I am interested in being produced in alphabetical order instead.

I looked at several other plot methods for the package and found that 'variable' is often set to a factor after a gather step. I tested if this could fix the issue by adding
gg_dta$variable <- factor(gg_dta$variable, levels=unique(gg_dta$variable))
after line 143, which worked.
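
The effect of that one-line fix can be seen in isolation with base R. The variable names below are hypothetical and are assumed to be already sorted by minimal depth rank.

```r
# Hypothetical variable names, already ordered by minimal depth rank.
ranked <- c("lstat", "rm", "nox", "dis")
variable <- rep(ranked, each = 2)

# Default factor(): levels are sorted, so facets come out alphabetically.
alphabetical <- factor(variable)

# Explicit levels in order of first appearance: facets keep the rank order.
by_rank <- factor(variable, levels = unique(variable))

levels(alphabetical)  # "dis" "lstat" "nox" "rm"
levels(by_rank)       # "lstat" "rm" "nox" "dis"
```

ggplot2 facets follow the order of the factor levels, which is why setting the levels explicitly changes the panel order.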

This doesn't exclude the (high) possibility that I am simply doing something wrong, but I thought I would send through the code. Thanks for the great package!

make vignette more reproducible

Hi,
The Survival vignette is really looking good with lots of great plots, but I'm having problems reproducing many of the examples.
For example, in the beginning you mention that you prefer "years" to "days" in the pbc data set. Yet there is no code showing how you convert it.
Doing a naive
pbc$years <- pbc$days/365
I fail in the next part using the gg_survival function example.
Next, there is no code for the very nice EDA plots in the vignette.
I also could not get the 3D example in Appendix 1 to work.
This line
partial_time <- do.call(rbind,lapply(partial_pbc_time, gg_partial))
always produces errors.
I always get errors when there is a theme() part in plot.
There were some updates to randomForestSRC and ggplot recently that may
cause a lot of these problems.

Consistent variable and partial dependence yhat values.

This is an rfsrc issue. For survival and classification forests, variable dependence is returned on [0,1], while partial dependence for survival is returned on [0,100]. gg_variable should be able to detect the difference and convert.
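
One way to detect and convert the scale, sketched in base R. The helper `to_unit_scale` is hypothetical, and the rule it uses (any value above 1 means the vector is on the percent scale) is an assumption rather than documented rfsrc behaviour.

```r
# Hypothetical helper: bring predictions onto the [0, 1] scale.
# Assumption: a maximum above 1 means the values are percentages on [0, 100].
to_unit_scale <- function(yhat) {
  stopifnot(is.numeric(yhat))
  if (max(yhat, na.rm = TRUE) > 1) yhat / 100 else yhat
}

variable_dep <- c(0.91, 0.75, 0.40)  # already on [0, 1]
partial_dep  <- c(91, 75, 40)        # on [0, 100]

to_unit_scale(variable_dep)  # unchanged
to_unit_scale(partial_dep)   # divided by 100
```

The heuristic fails for a percent-scale vector whose values are all at or below 1, so an explicit flag would be safer wherever the source of the values is known.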

data.frame should be last of the classes

Having the classes "gg_rfsrc" "data.frame" "class" does mess up internal code used by e.g. tidyr and dplyr and hence this is one of the reasons this package fails against dplyr 1.0.0:

library(ggRandomForests)
#> Loading required package: randomForestSRC
#> 
#>  randomForestSRC 2.9.3 
#>  
#>  Type rfsrc.news() to see new features, changes, and bug fixes. 
#> 

data(rfsrc_iris, package="ggRandomForests")
gg_dta <- gg_rfsrc(rfsrc_iris)
str(gg_dta)
#> Classes 'gg_rfsrc', 'class' and 'data.frame':    150 obs. of  4 variables:
#>  $ setosa    : num  1 1 1 1 1 ...
#>  $ versicolor: num  0 0 0 0 0 ...
#>  $ virginica : num  0 0 0 0 0 0 0 0 0 0 ...
#>  $ y         : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
class(gg_dta)
#> [1] "gg_rfsrc"   "data.frame" "class"

gg_plt <- ggRandomForests:::plot.gg_rfsrc(gg_dta)
#> Error: Input must be a vector, not a `gg_rfsrc/data.frame/class` object.

Created on 2020-04-03 by the reprex package (v0.3.0)
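
The ordering problem can be illustrated without the package. This is a minimal sketch in base R; the helper `fix_class_order` is hypothetical and assumes its input really is a data frame.

```r
df <- data.frame(x = 1:3)

# Ordering reported above: "data.frame" sits in the middle of the class vector.
bad <- structure(df, class = c("gg_rfsrc", "data.frame", "class"))

# Conventional ordering: most specific class first, "data.frame" last.
good <- structure(df, class = c("gg_rfsrc", "class", "data.frame"))

# Hypothetical helper: move "data.frame" to the end of the class vector.
# Assumes x is a data frame; it would add the class if it were missing.
fix_class_order <- function(x) {
  cls <- class(x)
  class(x) <- c(setdiff(cls, "data.frame"), "data.frame")
  x
}

fixed <- fix_class_order(bad)
class(fixed)  # "gg_rfsrc" "class" "data.frame"
```

Setting the class vector in the conventional order when the object is created avoids the need for a repair step like this one.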

Is this ready to be in package form yet?

I ran devtools::install_github(repo = "ggRandomForests", username = "ehrlinger") in my 3.0.1 version of R. I got:

* installing source package 'ggRandomForests' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error in namespaceExport(ns, exports) :
undefined exports: ggCompetingRisk, ggCompetingRisk.ggRandomForests, ggCoplot.ggRandomForests, ggInteraction, ggInteraction.ggRandomForests, ggMinDepth, ggMinDepth.ggRandomForests, ggSurvival, ggSurvival.ggRandomForests, plot.ggRandomForests, show.ggRandomForests
Error: loading failed
Execution halted
ERROR: loading failed

* removing 'C:/Users/Jonathan/Documents/R/win-library/3.1/ggRandomForests'
Error: Command failed (1)

plot.gg_survival has wrong legend

When using gg_survival with the by= parameter and then plotting with plot.gg_survival, the colors and the legend entries are not correctly assigned to the groups.
To be precise, the error seems to be in kaplan.R, line 72ff.

pkg install throws "not available (for..."

On Win8 64 bit RStudio, throws for R 3.2.0, R 3.1.3, & below:

install.packages("ggRandomForest")
Installing package into ‘C:/Users/Jim/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
Warning in install.packages :
package ‘ggRandomForest’ is not available (for R version 3.1.2)

using both RStudio global CRAN and IA CRAN
Tnx, Jim
