
slopegraph's Introduction

Tufte-Inspired Slopegraphs in R

This repository holds some working code for creating "slopegraphs" in R.

This is very much a work in progress. Once it's more stable, I will release the package to CRAN.

Pull requests welcome. Please report any issues on the issues page.

The package currently includes one main function, slopegraph(), which produces a slopegraph from an observation-by-period data frame. Everything is more or less drawn automatically, but the plot is highly customizable in terms of line and text colors, font sizes and styles, axes, titles, and plotting behind and in front of the slopegraph lines. An underlying function, segmentize(), produces the data structure used for the actual plotting. A newer function, ggslopegraph(), does the same as slopegraph() but uses ggplot2 graphics.
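As a minimal sketch of what an "observation-by-period" data frame looks like, the block below builds one in base R: one row per observation (named via row names) and one numeric column per period. The data, the object name `scores`, and the column names are made up for illustration; only the `slopegraph()` function and its `xlabels` argument come from the examples in this README.

```r
# Hypothetical data: three observations measured in two periods.
# Rows are observations (row names become the labels),
# columns are periods in left-to-right plotting order.
scores <- data.frame(
  Year.2000 = c(10, 25, 40),
  Year.2010 = c(15, 20, 45),
  row.names = c("Alpha", "Beta", "Gamma")
)

# Shape check: observations by periods
dims <- dim(scores)

# With the package installed, a frame like this could be passed along the
# lines of the README examples, e.g.:
# slopegraph::slopegraph(scores, xlabels = c("2000", "2010"))
```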

Examples

The current output of the slopegraph() function (for the examples included in the documentation) is shown below.

Tufte's most famous slopegraph example is probably the "cancer survival graph," depicting 5, 10, 15, and 20 year survival rates for various cancers. The first example mimics this result but draws it to the correct scale (unlike Tufte's original):

library("slopegraph")
data(cancer)
slopegraph(cancer, col.lines = 'gray', col.lab = "black", 
           xlim = c(-.5,5.5), cex.lab = 0.5, cex.num = 0.5,
           xlabels = c('5 Year','10 Year','15 Year','20 Year'))

Cancer Survival

The second example, also from Tufte, shows changes in gross domestic product for a small set of countries over two points in time:

data(gdp)
slopegraph(gdp, col.lines = 'gray', col.lab = "black", xlabels = c('1970','1979'),  
           main = 'Current Receipts of Government as a Percentage of Gross Domestic Product')

GDP

This third example comes from an 1878 publication (a copy of which is available here), showing the relative ranking of the population of various U.S. states. This example features a reversed y-axis to better display the ranking and I demonstrate the col.lines argument to highlight South Carolina:

data(states)
cols <- `[<-`(rep("black", 37), 7, "red")
slopegraph(states, xlim = c(-1, 12), ylim = c(37,0), offset.x = 0.06,
           col.lines = cols, col.lab = cols, 
           main = 'Relative Rank of U.S. State Populations, 1790-1870')

states
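The colour vector in the example above is built with the functional form of the assignment operator, `[<-`, which can look cryptic. The sketch below shows two equivalent base-R spellings; the variable names `cols_a`, `cols_b`, and `cols_c` are only for illustration.

```r
# Functional form used in the example: set element 7 of an all-black vector to red
cols_a <- `[<-`(rep("black", 37), 7, "red")

# The same thing written as two ordinary statements
cols_b <- rep("black", 37)
cols_b[7] <- "red"

# base::replace() does the same in one readable call
cols_c <- replace(rep("black", 37), 7, "red")

same <- identical(cols_a, cols_b) && identical(cols_a, cols_c)
```

All three produce a length-37 vector with a single "red" entry at position 7, so either of the alternatives can be substituted if the one-liner is hard to read.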

As of v0.1.9, there is also a ggplot2-based function, ggslopegraph(), that produces a similar representation but using ggplot2 graphics:

require("ggplot2")
## Loading required package: ggplot2
data(states)
cols <- `[<-`(rep("black", 37), 7, "red")
ggslopegraph(states, offset.x = 0.06, yrev = TRUE,
  col.lines = cols, col.lab = cols, 
  main = 'Relative Rank of U.S. State Populations, 1790-1870') +
 theme_bw()    
## Warning: Removed 84 rows containing missing values (geom_text).

ggstates

Installation


To install the latest development version of slopegraph from GitHub:

if (!require("remotes")) {
    install.packages("remotes")
}
remotes::install_github("leeper/slopegraph")

slopegraph's People

Contributors

austinschwartz, geekonacid, ibecav, jorane, leeper, martindaniel4, mps9506, pbradl42


slopegraph's Issues

Citation and removing box frame of L1, L2, ... numeric values

Dear Thomas,

Thank you for the amazing package!
Would you inform me about the correct citation of the slopegraph package?
Using ggslopegraph2, a box frame around the numeric values is the default. Could you help me remove the box frame around the L1, L2, ... numeric values?

Best wishes, Zsolt

Only the top line showing

In executing the GDP or the cancer data example, I am able to get only the top line. I know this was working fine last year but while updating the code, I realized that it is not reproducing. My environment is given below.

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] slopegraph_0.1.7

loaded via a namespace (and not attached):
[1] tools_3.3.1

Display labels by fewer decimals

I'd like to display slopegraphs with points with only a single decimal, so I use decimals=1 in my specification. But if I do, I get overlapping labels (e.g., VT and CT below), which I wouldn't get if I used decimals=2. But using the latter is so untidy! Is there any way to get the benefit of the latter without displaying more than one significant digit?

One Significant Digit

ne polarization slopegraph

Two Significant Digits

slopegraph label two decimals

Handle missing data points better

Great package! But I'm still having difficulty with missing values. I'm attaching some data on state polarization for the northeast. Everything's there except the 2014 value for MA. This messes up the plot. I'd like to have the 2008 point for MA go directly to 2014. But as it is, MA gets orphaned after 2008, and the point for 2017 (the end) is not connected to anything.

na.span should do the job but it doesn't.

This is the code I use:

rownames(sg)=sg$st; sg$st=NULL
colnames(sg)=str_c("Year.",colnames(sg))
    
slopegraph(sg, col.line='gray',col.lab = 'black', decimals=1,
     xlabels=c('1996','2000','2004','2008','2014','2017'), 
     cex.lab = 1, cex.num=0.75)

ne polarization slopegraph
slopegraph problem 042117.zip
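To make the requested behaviour concrete, here is a small base-R sketch of "spanning" a missing period: segments are drawn between consecutive non-missing points, so a gap is bridged rather than breaking the line. This is not the package's na.span implementation; the helper name `span_segments` and the example values are hypothetical.

```r
# Build line segments between consecutive non-missing values of one row.
# `values` is a numeric vector with one entry per period (NA = missing).
span_segments <- function(values) {
  keep <- which(!is.na(values))            # period indices that have data
  if (length(keep) < 2) {
    return(data.frame(x0 = integer(0), y0 = numeric(0),
                      x1 = integer(0), y1 = numeric(0)))
  }
  from <- keep[-length(keep)]              # start of each segment
  to   <- keep[-1]                         # end of each segment
  data.frame(x0 = from, y0 = values[from], x1 = to, y1 = values[to])
}

# Hypothetical row with the fifth of six periods missing
ma <- c(0.8, 0.9, 1.1, 1.2, NA, 1.5)
segs <- span_segments(ma)
```

With these made-up values, the last segment runs from period 4 straight to period 6, which is the "2008 goes directly to the next available year" behaviour described above.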

Matching label and line colours and consistent decimals

Thanks for all your work on this so far.

A couple of things.

  1. Could the label colours match the line colours?
  2. The number of decimal places is not consistent between columns.

The code I'm using is below and the resulting slopegraph is at this link. This shows the issues.

https://dl.dropboxusercontent.com/u/10963448/slope_graph.jpeg

library("RColorBrewer")
capex <- matrix(c(
  700348, 550203, 504668, 262529, 351732, # Perth
  1355928, 942090, 735799, 609752, 686136, # Melbourne
  805693, 792228, 713762, 629305, 641685, # Sydney
  504764, 989579, 653563, 517648, 487636, # South East Queensland
  256668, 230838, 143365, 59393, 48937, # Canberra
  595851, 530075, 331038, 187945, 152124, # Adelaide
  51978, 58080, 64789, 25600, NA), # Darwin
  nrow = 7, byrow = TRUE)


capex <-  capex/1000
capex <-  data.frame(capex)
cities <- c('Perth', 'Melbourne', 'Sydney','SE Queensland', 'Canberra', 'Adelaide', 'Darwin')
row.names(capex) <- cities
names(capex) <- c('X2010-11', 'X2011-12', 'X2012-13', 'X2013-14', 'X2014-15')

my_col = brewer.pal(9, name = 'Paired')
par(oma = c(1,4,1,1))

slopegraph(capex, 
           labels =  c('2010-11', '2011-12', '2012-13', '2013-14', '2014-15'),
           decimals = 0,
           col.lines = my_col,
           col.lab = my_col,
           family = 'sans')
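One common source of inconsistent decimal places in base R is converting rounded numbers to text, because trailing zeros are dropped. The sketch below contrasts that with sprintf(), which pads to a fixed number of decimals. It only illustrates the formatting behaviour; how slopegraph() formats its labels internally is not shown here, and the values are illustrative.

```r
vals <- c(700.348, 550.2, 51)

# Rounding and then converting to text drops trailing zeros
loose <- as.character(round(vals, 1))

# sprintf() with a fixed precision always prints one decimal place
fixed <- sprintf("%.1f", vals)
```

Here `loose` is "700.3", "550.2", "51" while `fixed` is "700.3", "550.2", "51.0".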

svglite and rsvg offer new potential

The new tools svglite and rsvg provide us with some new potential for this package. Fortunately, I think it requires no changes to the existing slopegraph code. I'll work up some examples to demonstrate the new potential.

Thanks so much for this great package.

col.lines[i] not working

col.lines is not working correctly in the function 'slopegraph'. It can be fixed easily by adding 'n' to the cbind command that creates 'todraw' (line 155), thus passing the index of the line in question to the apply function on line 157. Then add the line 'i <- rowdata[5]' to the apply function (line 162, e.g.) and the lines will be colored as expected.
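The idea behind this fix can be sketched in isolation: carry each row's index as an extra column so that the function passed to apply() can pick the matching colour. The block below is a simplified stand-in, not the package source; the matrix contents are made up, and only the names `todraw`, `rowdata`, `n`, and `col.lines` are borrowed from the description above.

```r
# Made-up segment coordinates: one row per line to draw
todraw <- cbind(x0 = c(1, 1, 1), y0 = c(10, 20, 30),
                x1 = c(2, 2, 2), y1 = c(15, 18, 33))
col.lines <- c("gray", "red", "gray")

# Append the row index as a fifth column so it travels with each row
todraw <- cbind(todraw, n = seq_len(nrow(todraw)))

# Inside apply(), recover the index and look up that line's colour
used_cols <- apply(todraw, 1, function(rowdata) {
  i <- rowdata[5]
  col.lines[i]
})
```

In real plotting code the looked-up colour would be passed to the drawing call for that row; here it is simply returned so the mapping can be inspected.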

Problem when saving as PDF

Bug report via email:

Here's a data frame of state polarization by 4 year intervals. When I enter the following interactively, I get the slopegraph in the Rstudio plot window.

slopegraph(na.omit(sg), col.line='gray',decimals=1,labels=c('1996','2000','2004','2008','2014'), binval=1.5)

BTW, the slopegraph doesn't work unless I omit NAs but this isn't quite what I want as I lose the entire set of observations if a state has any missing values. (see also #3)

The big question though is why I get a strange error when I set up the pdf:

pdf('Plots/Legislatures_2016/States/polarization_slopegraph.pdf', height=16, width=12, family='Palatino')
slopegraph(na.omit(sg), col.line='gray',decimals=1,labels=c('1996','2000','2004','2008','2014'), binval=1.5)

The error I get from just adding the pdf command at the top is:

Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘0.4’ 

I don't understand why row names would be set at one of the values? It should be state abbreviations!

inappropriate rounding messes up the plot, when data points are close to zero

For example, using the following dataset,

# A tibble: 14 x 3
   X__1            SE_withOPTIMAL `SE_without OPTIMAL`
 * <chr>                    <dbl>                <dbl>
 1 HR_male                 0.0340               0.0357
 2 HR_age                  0.0186               0.0211
 3 HR_nowmsk               0.0408               0.0450
 4 HR_oxygen               0.0457               0.0439
 5 HR_fev1pre              0.0416               0.0412
 6 HR_statin               0.0433               0.0413
 7 HR_azithromycin         0.0387               0.0359
 8 HR_LAMA                 0.0511               0.0456
 9 HR_LABA                 0.0480               0.0504
10 HR_ICS                  0.0507               0.0756
11 HR_sgrq                 0.0130               0.0132
12 HR_BMI10                0.0268               0.0290
13 HR_OPTIMAL              0.0512               0.307

will result in:
image
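To illustrate why rounding matters for values this close to zero, the sketch below rounds a few of the standard errors from the table above to different numbers of decimals. It only demonstrates base R's round(); whether and where the package rounds before plotting is not confirmed here.

```r
# A few values taken from the table above
se <- c(HR_male = 0.0340, HR_age = 0.0186, HR_sgrq = 0.0130, HR_OPTIMAL = 0.0512)

# One decimal: most values collapse to 0 and become indistinguishable
one_dec <- round(se, 1)

# Three decimals: the values stay distinct
three_dec <- round(se, 3)

n_distinct_one   <- length(unique(one_dec))
n_distinct_three <- length(unique(three_dec))
```

With one decimal only two distinct values remain, while three decimals keep all four apart, so a larger number of decimals is worth trying for data on this scale.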

ggslopegraph2 problem: not a data frame

ggslopegraph2 mostly works, except when it is called in a function. Then it complains about a dataframe I pass to it, complaining:

Error: The first object in your list 'sg.df' does not exist. It should be a dataframe

My code is as follows:

    ggslopegraph2(sg.df, times=year, measurement=comp.diffs, grouping=st, title = NULL) +  
      theme_bw() + labs(x=NULL, y=NULL) + theme(legend.position = "none")

In the debugger I have verified that sg.df is absolutely a dataframe. What could be going on?
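One possible explanation, offered as an assumption rather than a diagnosis of ggslopegraph2 itself, is that the function validates its input by looking up the argument's name instead of inspecting its value. A name-based lookup cannot see variables that are local to the calling function. The helpers below (`check_by_name`, `check_by_value`, `wrapper`) are hypothetical and only reproduce that general pattern in base R.

```r
# Name-based check: deparse the argument and ask whether that name exists.
# exists() searches this function's frame and then its enclosing
# environment, not the frame of whoever called us.
check_by_name <- function(dataframe) {
  nm <- deparse(substitute(dataframe))
  exists(nm)
}

# Value-based check: look at the object that was actually passed in
check_by_value <- function(dataframe) {
  is.data.frame(dataframe)
}

wrapper <- function() {
  local_df <- data.frame(x = 1)   # exists only inside wrapper()
  c(by_name = check_by_name(local_df),
    by_value = check_by_value(local_df))
}

res <- wrapper()
```

Run from a session with no global object called `local_df`, the name-based check reports FALSE even though a perfectly valid data frame was passed, while the value-based check reports TRUE. If this is what is happening, calling the function at top level with a global data frame would work and calling it from inside a function would not.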

Plot.new() needed for ggslopegraph?

I'm using ggslopegraph() exclusively because of how awesome it looks and how great ggplot2 is. But when I call ggslopegraph() initially I get the following error:

Error in strwidth(sprintf(fmt, long[["value"]])) : 
  plot.new has not been called yet

While invoking plot.new() works, this should not be necessary, right? I've never needed to call it for any other ggplot2 methods.

Not able to load package

When I run this:

if (!require("ghit")) {
install.packages("ghit")
}
ghit::install_github("leeper/slopegraph")

I get this error:

Error in read.dcf(file = tmpf) : cannot open the connection
In addition: Warning message:
In read.dcf(file = tmpf) :
cannot open compressed file '//var/folders/d6/vj5j60497bs_kc9z28wc3j6c0000gn/T//RtmpeVYawE/ghitdrat/src/contrib/PACKAGES', probable reason 'No such file or directory'

rownames don't show in example codes

Examples fail to produce rownames as shown in the example screenshots (R v3.4.3 in Windows)

> slopegraph(cancer, col.lines = 'gray', col.lab = "black", 
+            xlim = c(-.5,5.5), cex.lab = 0.5, cex.num = 0.5,
+            xlabels = c('5 Year','10 Year','15 Year','20 Year'))

image

Thanks and some thoughts

I didn't do a pull request because it seems you all are in a bit of flux, but I did want to give you proper credit and a look at the slightly different way I approached the task. I took a look at your base code and there's probably an opportunity to merge some pieces here.

https://ibecav.github.io/slopegraph/

Thanks for your efforts.

Chuck

Fix vertically overlapping value and row labels

The current binning algorithm simply puts labels on new lines when they're too close. A better version would evenly space the numbers within the range of overlap. It will have the same even appearance, but probably line up better with slope lines.
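A rough sketch of the proposed approach, under stated assumptions: labels are treated as overlapping when neighbouring positions are closer than a minimum gap, and each group of overlapping labels is re-spaced evenly around the group's mean. The function name `spread_labels` and the parameter `min_gap` are hypothetical, and this single pass does not check whether a spread-out group then collides with a neighbouring group.

```r
# Evenly re-space label positions that are closer together than `min_gap`.
# Returns adjusted positions in the same order as the input `y`.
spread_labels <- function(y, min_gap) {
  ord <- order(y)
  ys <- y[ord]
  # Start a new cluster whenever the gap to the previous label is large enough
  cluster <- cumsum(c(TRUE, diff(ys) >= min_gap))
  adjusted <- ys
  for (k in unique(cluster)) {
    idx <- which(cluster == k)
    n <- length(idx)
    if (n > 1) {
      # Centre the cluster on its mean and space members min_gap apart
      centre <- mean(ys[idx])
      adjusted[idx] <- centre + (seq_len(n) - (n + 1) / 2) * min_gap
    }
  }
  result <- numeric(length(y))
  result[ord] <- adjusted   # undo the sort
  result
}

# Three labels crowded near 10 and one isolated label at 20
new_pos <- spread_labels(c(20, 10.4, 10, 10.2), min_gap = 1)
```

In this example the crowded labels end up one unit apart around their mean of 10.2, and the isolated label at 20 is left where it was.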
