
vioplot's Introduction

vioplot

Version 0.4.0

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Violin Plots in R

This package allows extensive customisation of violin plots.

Installation

To get the current released version from CRAN:

install.packages("vioplot")

To get the development version from GitHub:

# install.packages("devtools")
devtools::install_github("TomKellyGenetics/vioplot", ref = "dev")

Running

See the relevant vignette for more details:

  • Customising colour and shape with scalar inputs or vectors applied separately to each violin.

https://rawgit.com/TomKellyGenetics/vioplot/vignettes/vignettes/violin_customisation.html

  • Formula input enabled with S3 methods.

https://rawgit.com/TomKellyGenetics/vioplot/vignettes/vignettes/violin_formulae.html

  • Control of violin area for proportional widths

https://rawgit.com/TomKellyGenetics/vioplot/vignettes/vignettes/violin_area.html

  • Control of the y-axis including disabling labels and log-scale

https://rawgit.com/TomKellyGenetics/vioplot/vignettes/vignettes/violin_ylog.html

  • Split violins to directly compare paired data.

https://rawgit.com/TomKellyGenetics/vioplot/vignettes/vignettes/violin_split.html

Functionality

vioplot (0.3) is backwards-compatible with vioplot (0.2). The following features are supported:

  • vioplot() generates a violin plot by plotting a violin for each group of variables.

  • vioplot() also takes additional arguments to specify main, sub, xlab, and ylab as used in plot or title. Other graphical parameters can be passed through to the underlying plotting calls.

  • vioplot() can take vectorised forms of the colour variables col, border, and rectCol to set the colours separately for each violin. This also applies to a new variable lineCol, which sets the colour of the boxplots.

  • vioplot.formula() is enabled to take formula and dataframe inputs as used for boxplot and stats operations. The default axes labels are the variable names used for the formula and names are factor levels.

  • The additional areaEqual, plotCentre and side options enable further customisation.
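
As a rough sketch of how the arguments listed above fit together (this is not taken from the package documentation), the following uses the built-in iris data. The colour choices are arbitrary assumptions, and the plotting calls are skipped when vioplot is not installed.

```r
# Sketch only: exercises the arguments named in the feature list above.
# The colour vectors are arbitrary; any valid R colours would do.
fills   <- c("lightgreen", "lightblue", "palevioletred")
borders <- c("darkolivegreen4", "royalblue4", "violetred4")

if (requireNamespace("vioplot", quietly = TRUE)) {
  library(vioplot)

  # Formula input with one colour per violin, passed as vectors
  vioplot(Sepal.Length ~ Species, data = iris,
          main = "Sepal length by species",
          col = fills, border = borders,
          rectCol = borders, lineCol = borders)

  # Half violins with a centre line instead of a box, using areaEqual
  vioplot(Sepal.Length ~ Species, data = iris,
          col = "lightblue", plotCentre = "line",
          side = "left", areaEqual = TRUE)
}
```

The vignettes linked above remain the authoritative reference for each option.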

See the NEWS.md file for more detail on added features in the 0.3 release.

Development and sources

For development history of version 0.3.0 prior to package documentation, see the original repo: https://github.com/TomKellyGenetics/R-violin-plot/commits?author=TomKellyGenetics

Modifications inspired by the following StackOverFlow threads and GitHub Gists:

Attribution

This repository is a proposed submission for an updated version of the vioplot package originally released by Daniel Adler (University of Göttingen, Germany) on CRAN. This package has been orphaned on CRAN and is no longer actively maintained. I acknowledge the contributions of Daniel Adler as the original developer and Tom Elliot (University of Auckland, New Zealand) for a pull request, and I welcome further contributions to improve or maintain this package.

This package update was developed and released open-source (in accordance with the original package BSD License) while I was a PhD candidate at the University of Otago (Dunedin, New Zealand). I can be contacted at my present affiliation (RIKEN Centre for Integrative Medical Sciences, Yokohama, Japan) at <tom.kelly[at]riken.jp>.

Citation

The following information can be retrieved from within an R session by using citation("vioplot"). Please acknowledge as follows if features included in this version are used.

To cite the enhanced vioplot package in publications use:

Daniel Adler and S. Thomas Kelly (2022). vioplot: violin plot. R package version 0.4.0 https://github.com/TomKellyGenetics/vioplot

A BibTeX entry for LaTeX users is

@Manual{,
  title = {vioplot: violin plot},
  author = {Daniel Adler and S. Thomas Kelly and Tom Elliot and Jordan Adamson},
  year = {2022},
  note = {R package version 0.4.0},
  url = {https://github.com/TomKellyGenetics/vioplot},
}

Please also acknowledge the original package: citation("vioplot")

vioplot's People

Contributors

jadamso, tmelliott, tomkellygenetics


vioplot's Issues

Spurious print(ylim) in vioplot

The latest release of vioplot (it may also happen
in earlier releases) has an irritating typo:

Line 333 of vioplot.R contains a spurious:
print(ylim)

which generates output if a value is assigned to this argument.

Reported by @Derek-Jones

cex passed as argument is ignored

vioplot(stuff, cex =0.9) does not have the desired effect, while

par(cex=0.9)
vioplot(stuff)

works fine.

I will report issues here now that I know about the GitHub repo (it's not listed on CRAN).

Vioplot not working when formula categories are numbers

Trying to plot two parallel violin plots using the formula interface. Here's a minimal example and the (surprising) outcome:

library("vioplot")
DF <- data.frame(x=c(rnorm(500),  rnorm(250, mean=-1, sd=.6), rnorm(250, mean=1, sd=.6)), group=factor(rep(1:2, times=c(500, 500))))
vioplot(x ~ group, data=DF)

which produces this result:

[Screenshot: Screen Shot 2019-08-30 at 14 09 32]

I'm fairly sure the problem is caused by the last line of vioplot.formula

expression(vioplot( 1, 2 , xlab = xlab, ylab = ylab, names = names, ...))

Since the categories are named with the numbers 1 and 2, they are parsed as numbers by vioplot.default and not as labels referencing the data in the data column. As far as I can see, the default is not easily modified, since the first two arguments are also used as the actual labels in the plot.
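
One possible workaround, sketched here as an assumption rather than a confirmed fix, is to give the factor non-numeric level names before calling the formula interface. The column name group2 and the "group" prefix are hypothetical.

```r
set.seed(1)
DF <- data.frame(
  x = c(rnorm(500), rnorm(250, mean = -1, sd = 0.6), rnorm(250, mean = 1, sd = 0.6)),
  group = factor(rep(1:2, times = c(500, 500)))
)

# Hypothetical workaround: prefix the numeric level names so they can no
# longer be read as plain numbers.
DF$group2 <- factor(paste0("group", DF$group))
levels(DF$group2)

# Only attempt the plot when the package is available.
if (requireNamespace("vioplot", quietly = TRUE)) {
  library(vioplot)
  vioplot(x ~ group2, data = DF)
}
```

Whether this sidesteps the problem depends on the installed version of vioplot.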

cex.names is not used

Looking at the code, the parameter cex.names is assigned a value but never passed on to anything that would give it an effect (apart from a test of whether it has a non-default value).

xaxt option is ignored

The xaxt argument is ignored in calls to vioplot, e.g., xaxt="n" has no effect.

Internally, this parameter does not appear in calls to plot.window.

Error when using levels with spaces

The vioplot function throws an error if you use a data frame with a factor whose levels include a space. Setting the names argument does not help either. The only thing that helps is changing the levels of the factor variable.
A minimal example is attached.
Error_spaces.txt
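
Since the attachment is not reproduced here, the following is a minimal sketch of the "change the levels" workaround using base R's make.names(). The data frame d and its columns are made up for illustration.

```r
# Hypothetical data: a factor whose levels contain spaces
d <- data.frame(
  value = c(1.2, 2.3, 3.1, 2.8, 1.9, 2.2),
  group = factor(rep(c("group A", "group B"), each = 3))
)

# Replace the levels with syntactically valid names (spaces become dots)
clean <- d
levels(clean$group) <- make.names(levels(clean$group))
levels(clean$group)
```

The cleaned data frame can then be passed to vioplot() in place of the original.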

"Too many" of the same value causes an error

Thanks for the awesome package. The following is the issue I am facing:

When giving the vioplot too many of the same value (and only that value) it produces an error.

These work fine:

vioplot(x = c(0))
vioplot(x = c(rep.int(0, 100)))

Add one order of magnitude and it breaks:

vioplot(x = c(rep.int(0, 1000)))
vioplot(x = c(rep.int(1, 1000)))

The error I get is:

Error in cut.default(x, breaks = breaks) : 'breaks' are not unique

To clarify, if there is even one more number with another value, it will work as intended:

vioplot(x = c(rep.int(c(1), 10000), 2))
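
The error message itself comes from base R's cut(), which rejects duplicated break points. The sketch below reproduces that message without vioplot and adds a hypothetical guard, has_spread(), that a caller could use before plotting. Where exactly the duplicated breaks arise inside the plotting code is not established here.

```r
x <- rep.int(0, 1000)

# cut() stops when the supplied breaks contain duplicates; capture the message
res <- tryCatch(
  cut(x, breaks = c(0, 0, 0)),
  error = function(e) conditionMessage(e)
)
res

# Hypothetical guard (not part of vioplot): TRUE only if the data contain
# more than one distinct value
has_spread <- function(x) length(unique(x)) > 1

has_spread(x)
has_spread(c(x, 2))
```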

My session's info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.12.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fi_FI.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fi_FI.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fi_FI.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] vioplot_0.3.5   zoo_1.8-8       sm_2.2-5.6     
[4] beeswarm_0.2.3  varhandle_2.0.5 RCurl_1.98-1.2 
[7] XML_3.99-0.5   

loaded via a namespace (and not attached):
 [1] rstudioapi_0.13 knitr_1.30      magrittr_1.5   
 [4] hms_0.5.3       lattice_0.20-41 R6_2.5.0       
 [7] rlang_0.4.8     highr_0.8       tcltk_4.0.3    
[10] tools_4.0.3     grid_4.0.3      xfun_0.19      
[13] ellipsis_0.3.1  yaml_2.2.1      tibble_3.0.4   
[16] lifecycle_0.2.0 crayon_1.3.4    readr_1.4.0    
[19] vctrs_0.3.4     bitops_1.0-6    compiler_4.0.3 
[22] pillar_1.4.6    pkgconfig_2.0.3

`las` and `ylab` do not work

It seems the arguments las and ylab are not implemented in the package. I do not get any error/warning but they do not work either.

h parameter documented as "height" rather than "density smoothing"

I'm just reporting that as a fresh user of vioplot, I initially got confused by the way the h parameter of vioplot() is documented.

I was looking for the density bandwidth tuning argument, but the doc says h is "the height for the density estimator, if omit as explained in sm.density, h will be set to an optimum". I interpreted this as a tuning for the height of the density plot (which is in fact the width when the violin is drawn vertically), but that turns out to be tuned by wex (and, to add to the confusion, the code of vioplot then computes an internal variable "hscale" based on wex...).

My proposition is to use a documentation of the h parameter based on the doc of sm.density ("a vector of length one, two or three, defining the smoothing parameter. A normal kernel function is used and h is its standard deviation. If this parameter is omitted, a normal optimal smoothing parameter is used."). The important keywords to appear for me are "density smoothing parameter".
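
To illustrate what a "density smoothing parameter" does, here is a small sketch using stats::density() from base R as a stand-in. It is not sm.density, but its bw argument is likewise the standard deviation of a Gaussian kernel. The sample and the two bandwidth values are arbitrary.

```r
set.seed(42)
x <- rnorm(200)

# A small bandwidth follows the data closely; a large one oversmooths
narrow <- density(x, bw = 0.1)
wide   <- density(x, bw = 1)

# The oversmoothed estimate is flatter, so its peak is lower
max(narrow$y) > max(wide$y)
```

In both cases the parameter controls smoothness, not the drawn height or width of the curve, which is the distinction the proposed wording is meant to make clear.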

References

Places where "height" is used:
https://github.com/TomKellyGenetics/vioplot/blob/master/R/vioplot.R#L15
https://github.com/TomKellyGenetics/vioplot/blob/master/man/vioplot.Rd#L181

Doc of vioplot (0.3.7):
https://www.rdocumentation.org/packages/vioplot/versions/0.3.7/topics/vioplot

Doc of sm.density:
https://www.rdocumentation.org/packages/sm/versions/2.2-5.4/topics/sm.density

Feature Request: histogram instead of density kernel

Thank you for a great package.

For some cases, it would be helpful to use a histogram rather than a kernel density. The ability to use R's hist function and breaks argument would be very easy to use. This functionality is also not currently available via ggplot.

Surprising behavior when passing xlim=c(1,2)

Somebody new to vioplot might pass an xlim argument (since this is available in many other plot functions; I did this in one of my plots).

vioplot(sample(44)) # works as expected

vioplot(sample(44), xlim=c(1, 2)) # surprising behavior

The xlim causes a second plot to appear, next to the sample violin plot.

Option to change the color of the center line

There is no option to change the color of the center line, running down the middle of the violin.

If you don't want to add another option, then perhaps make it the same as the border color?

col is not reused when a vector is passed

When col is a vector, its values are not reused (the online help says they are).

Looking at the code, col[i] should be col[1 + (i - 1) %% length(col)]

The other color parameters have the same problem, e.g., border
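
The recycling rule suggested above can be checked in base R without vioplot. In this sketch the colour vector and the number of violins, n, are arbitrary, and rep_len() is shown as an equivalent alternative.

```r
col <- c("red", "blue")
n <- 5  # number of violins to colour

# Modulo indexing wraps around once i exceeds length(col)
recycled <- vapply(
  seq_len(n),
  function(i) col[1 + (i - 1) %% length(col)],
  character(1)
)
recycled

# Base R's rep_len() gives the same recycled vector in one step
identical(recycled, rep_len(col, n))
```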

yaxt = 'n' should only suppress plotting of the y-axis; currently, it suppresses both

I am using R version 4.0.4 (2021-02-15) and vioplot_0.3.7.

vioplot(outcome ~ intervention, data = d, xaxt = 'n') works as expected: intervention values are not marked on the x-axis, but outcome values are marked on the y-axis.

vioplot(outcome ~ intervention, data = d, xaxt = 'n', horizontal = TRUE) also does something understandable: intervention values are not marked on the (vertical) "x-axis", but outcome values are marked on the (horizontal) "y-axis." This is not what the current version of boxplot does, but it's possible to work around that.

But both vioplot(outcome ~ intervention, data = d, yaxt = 'n') and vioplot(outcome ~ intervention, data = d, yaxt = 'n', horizontal = TRUE) result in neither intervention values nor outcome values being marked on their respective axes. This seems like it has to be wrong.

I took a look at the code, and this clearly is at least partially a reflection of the fact that all the if(xaxt !="n") conditionals in vioplot.R are wrapped inside if(yaxt !="n") conditionals. However, when I added else clauses for the latter, with similar code for handling the x-axes alone inside it, this made the two snippets above work as I expected them to, but still resulted in this code:

par(yaxt = 'n')
vioplot(outcome ~ intervention, data = d)

suppressing the plotting of both axes. I do not know why this is.

Add examples of overlaying base R graphics to docs

I've received several related questions which can be resolved by overlaying base R graphics and integrating vioplot with other plotting functions.

"vioplot()" works similarly to "plot()" and accepts input arguments from "par()".

For example it is possible to add additional annotations. As requested by email here is an example:

# generate dummy data
a <- rnorm(25, 3, 0.5)
b <- rnorm(25, 2, 1.0)
c <- rnorm(25, 2.75, 0.25)
d <- rnorm(25, 3.15, 0.375)
e <- rnorm(25, 1, 0.25)
datamat <- cbind(a, b, c, d, e)
dim(datamat)
#> [1] 25  5
# violin plot
library("vioplot")
#> Loading required package: sm
#> Warning: package 'sm' was built under R version 4.1.3
#> Package 'sm', version 2.2-5.7: type help(sm) for summary information
#> Loading required package: zoo
#> Warning: package 'zoo' was built under R version 4.1.3
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
vioplot(datamat, ylim = c(0, 5))
# compute medians
data.med <- apply(datamat, 2, median)
data.med
#>         a         b         c         d         e 
#> 3.0475551 2.2365210 2.7504934 3.0730919 0.9803336
#overlay medians
lines(data.med, lty = 2, lwd = 1.5)
points(data.med, pch = 19, col = "red", cex = 2.25)

Created on 2022-11-07 with reprex v2.0.2

It is also possible to modify the axes labels and titles as shown in issue #16

library("vioplot")
#> Loading required package: sm
#> Warning: package 'sm' was built under R version 4.1.3
#> Package 'sm', version 2.2-5.7: type help(sm) for summary information
#> Loading required package: zoo
#> Warning: package 'zoo' was built under R version 4.1.3
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
outcome <- c(rnorm(25, 3, 1), rnorm(25, 2, 0.5))
intervention <- c(rep("treatment", 25), rep("control", 25))
table(intervention)
#> intervention
#>   control treatment 
#>        25        25
names(table(intervention))
#> [1] "control"   "treatment"
unique(sort(intervention))
#> [1] "control"   "treatment"
intervention <- as.factor(intervention)
levels(intervention)
#> [1] "control"   "treatment"
d <- data.frame(outcome, intervention)
vioplot(outcome ~ intervention, data = d, xaxt = 'n', yaxt = 'n', 
        main = "", xlab = "", ylab = "")
axis(side = 1, at = 1:length(levels(intervention)), labels = levels(intervention))
mtext("custom x labels for intervention", side = 1)
mtext("custom y labels for outcome", side = 2)
title(main = "example with custom title", sub = "subtitles are supported")

Created on 2022-11-09 with reprex v2.0.2

I'll share these examples publicly for future reference. This issue indicates these are under consideration in updated documentation or vignettes.

Titles of axes and position of legend in violin plots

  1. The titles or names of the axes are not properly overwritten when I use the code
    title(xlab = "Species in Iris table", ylab = "Sepal Length in iris table"); the original names of the axes are still there. What can I do to have them properly removed?

  2. Sometimes it would be very helpful to have the legend OUTSIDE the actual plotting area, e.g. under the x-axis or right to the plotting area. I see only options where it is inside it. If you have other data than the example from iris, the legend can hide (part of) the violins, which is not nice. In the example I put the legend on the left bottom to show what I mean. You don't see the first violin now.

How can I place the legend OUTSIDE the plotting area?

Here is the code that produced the plot:

library(vioplot)

iris_large <- iris[iris$Sepal.Width > mean(iris$Sepal.Width), ]
iris_small <- iris[iris$Sepal.Width <= mean(iris$Sepal.Width), ]

vioplot(Sepal.Length ~ Species, data = iris_large,
        col = "palevioletred",
        plotCentre = "line", side = "right")

vioplot(Sepal.Length ~ Species, data = iris_small,
        col = "lightblue",
        plotCentre = "line", side = "left", add = TRUE)

title(xlab = "Species in Iris table", ylab = "Sepal Length in iris table")

legend("bottomleft", fill = c("lightblue", "palevioletred"),
       legend = c("small according to iris table", "large according to iris table"),
       title = "Sepal Width as classified in iris table")

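
One common base R approach to the legend question, sketched here with boxplot() as a stand-in for the violin plot, is to widen the right margin and allow drawing outside the plot region with xpd. The margin width and the inset value are assumptions that will likely need tuning for a given device size.

```r
# Widen the right margin and allow drawing outside the plot region
opar <- par(mar = c(5, 4, 4, 12), xpd = TRUE)

# Stand-in plot; a vioplot() call would go here instead
boxplot(Sepal.Length ~ Species, data = iris)

# A negative horizontal inset pushes the legend into the right margin
legend("topright", inset = c(-0.6, 0),
       fill = c("lightblue", "palevioletred"),
       legend = c("small", "large"),
       title = "Sepal Width")

# Restore the previous graphical parameters
par(opar)
```

The saved parameters in opar are restored at the end so that later plots are unaffected.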

Recommendations

To compare multiple groups of histogram densities, it helps to adjust the wex. Here is an MWE that does so, which you may want to add to a vignette.

dlist1 <- lapply(c(10,20,30,40), function(n) runif(n))
dlist2 <- lapply(c(100,200,300,400), function(n) runif(n))

hscale1 <- sapply(dlist1, function(r){
    max(hist(r, plot=F, breaks=seq(0,1,by=.05))$density)})
histoplot(dlist1, side='left', col=grey(.3),
    breaks=seq(0,1,by=.05), add=F, pchMed=NA, drawRect=F, border=NA,
    wex=hscale1/length(hscale1))

hscale2 <- sapply(dlist2, function(r){
    max(hist(r, plot=F, breaks=seq(0,1,by=.05))$density)})
histoplot(dlist2, side='right', col=grey(.7),
    breaks=seq(0,1,by=.05), add=T, pchMed=NA, drawRect=F, border=NA,
    wex=hscale2/length(hscale2))

Sometimes, it is helpful to see the raw counts instead.

dvec <- length(unlist(c(dlist1, dlist2)))/4

histoplot(dlist1, side='left', col=grey(.3),
    breaks=seq(0,1,by=.05), add=F, pchMed=NA, drawRect=F, border=NA,
    wex=sapply(dlist1, length)/dvec*hscale1/length(hscale1))
histoplot(dlist2, side='right', col=grey(.7),
    breaks=seq(0,1,by=.05), add=T, pchMed=NA, drawRect=F, border=NA,
    wex=sapply(dlist2, length)/dvec*hscale2/length(hscale2))

It may also benefit some users to pass density and angle arguments to the histograms (ultimately rect) and to create outer legends:

hist(runif(100), density=c(10,20), angle=c(22,90+22) ,col=1)

outer_legend <- function(...) {
  opar <- par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0), mar=c(0, 0, 0, 0), new=T)
  on.exit(par(opar))
  plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n')
  legend(...)
}
outer_legend('topright', pch=15, density=c(10,20), angle=c(22,90+22), col=0, legend=c('Y','N'))
