gghdr

Lifecycle: experimental

The gghdr package helps visualize highest density regions (HDRs) in one and two dimensions. HDRs are useful for displaying multimodality in a distribution. This work draws inspiration from the hdrcde package developed by Rob Hyndman and provides a framework for displaying HDRs within the ggplot2 framework.

Installation

You can install the development version from GitHub using:

# install.packages("remotes")
remotes::install_github("ropenscilabs/gghdr")

An overview of gghdr

There are several statistical methods for summarizing a distribution by a region of the sample space that covers a given probability. For example, in a traditional boxplot, the central box bounded by the interquartile range represents 50% coverage, and the whiskers represent about 99% coverage for large samples. Summarizing a distribution using highest density regions is useful for analyzing multimodal distributions. We illustrate this by exploring the data set faithful, which contains the waiting times and durations of eruptions for the Old Faithful geyser in Yellowstone National Park, USA.

library(ggplot2)
ggplot(faithful, aes(y = eruptions)) + geom_boxplot()

We can use geom_hdr_boxplot to display the same variable. Along with the 99% and 50% highest density regions, it also shows the local mode in each region. The plot shows that eruption times are likely to be around 4.5 minutes or 2 minutes, but rarely around 3 minutes. This insight was not apparent in the boxplot above.

library(gghdr)
library(ggplot2)
ggplot(faithful, aes(y = eruptions)) + 
  geom_hdr_boxplot(prob = c(.5, 0.99), fill = "blue") + 
  theme_minimal()
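
To make the idea of a highest density region concrete, here is a minimal base R sketch of one common way to compute it: estimate the density, then keep every point whose density is at least the (1 - prob) quantile of the density values at the observations. The helper name hdr_intervals is hypothetical, and this is an illustration under those assumptions, not necessarily how gghdr or hdrcde computes the regions internally.

```r
# Hypothetical helper (not part of gghdr): approximate HDR intervals for a
# numeric vector using the density-quantile approach, base R only.
hdr_intervals <- function(x, prob = 0.5) {
  dens <- density(x, n = 1024)              # Gaussian kernel density estimate on a grid
  fx <- approx(dens$x, dens$y, xout = x)$y  # estimated density at each observation
  # Density cutoff such that roughly `prob` of the observations lie above it
  cutoff <- quantile(fx, probs = 1 - prob, names = FALSE)
  inside <- dens$y >= cutoff                # grid points belonging to the region
  runs <- rle(inside)                       # consecutive runs of TRUE / FALSE
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # One row per contiguous interval of the region
  data.frame(lower = dens$x[starts[runs$values]],
             upper = dens$x[ends[runs$values]])
}

hdr_intervals(faithful$eruptions, prob = 0.5)
```

Unlike the central box of a boxplot, the result may consist of several disjoint intervals, which is what lets an HDR reveal more than one mode.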

It can be useful to supplement a scatterplot with the marginal distributions of one or both variables to gain more insight into the relationship between them. This is possible with geom_hdr_rug. The points in the figure show two clear clusters, one with shorter waiting times and shorter eruptions (around 2 minutes) and another with longer waiting times and longer eruptions (around 4.5 minutes). geom_hdr_rug adds to this information by displaying the highest density regions of eruption time with 50% and 99% coverage.

ggplot(faithful) +
  geom_point(aes(x = eruptions, y = waiting)) + 
  geom_hdr_rug(aes(x = eruptions), prob = c(0.99, 0.5), fill = "blue")

The previous example can be extended to display the scatterplot with points coloured according to the bivariate highest density regions using hdr_bin. hdr_bin can also be mapped to only the x-axis or the y-axis to show the marginal distribution of a single variable. This figure enriches the scatterplot by emphasizing the bivariate highest density regions with 50%, 90%, and 99% coverage, along with the points that fall outside the 99% region.

ggplot(data = faithful, aes(x = waiting, y = eruptions)) +
  geom_point(aes(colour = hdr_bin(x = waiting, y = eruptions))) +
  scale_colour_viridis_d(direction = -1) 
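
The binning of points into bivariate regions can be sketched in base R as follows. The helper hdr_bin_sketch is hypothetical: it uses a product Gaussian kernel with bw.nrd0 bandwidths and the density-quantile rule, which are assumptions made for illustration and may differ from what hdr_bin does internally.

```r
# Hypothetical helper (not part of gghdr): label each point with the smallest
# bivariate HDR that contains it, using base R only.
hdr_bin_sketch <- function(x, y, prob = c(0.5, 0.9, 0.99)) {
  prob <- sort(prob)
  hx <- bw.nrd0(x)  # rule-of-thumb bandwidth for x (an assumption)
  hy <- bw.nrd0(y)  # rule-of-thumb bandwidth for y (an assumption)
  # Product Gaussian kernel density estimate evaluated at every observation
  fxy <- vapply(seq_along(x), function(i) {
    mean(dnorm(x[i] - x, sd = hx) * dnorm(y[i] - y, sd = hy))
  }, numeric(1))
  # Density cutoffs: roughly a fraction `prob` of points lie above each cutoff
  cutoffs <- quantile(fxy, probs = 1 - prob, names = FALSE)
  labels <- c(paste0("> ", 100 * max(prob), "%"), rev(paste0(100 * prob, "%")))
  # Higher density means a smaller (more central) region
  cut(fxy, breaks = c(-Inf, rev(cutoffs), Inf), labels = labels, right = FALSE)
}

bins <- hdr_bin_sketch(faithful$waiting, faithful$eruptions)
table(bins)
```

The resulting factor could be mapped to the colour aesthetic of geom_point in the same way as the output of hdr_bin above.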

You can read more about gghdr in the [vignette].

Contributors

dvanic, emitanaka, ffancheng, mitchelloharawild, ryo-n7, sayani07, sparce, thomas-fung
