emilhvitfeldt / ggpage

Creates Page Layout Visualizations in R 📄📄📄

Home Page: https://emilhvitfeldt.github.io/ggpage/

License: Other

ggpage's Introduction

ggpage

ggpage is a package for creating page-styled visualizations of text-based data. It is built on ggplot2, and its plotting functions return ggplot2 objects.

Version 0.2.0

In this new version I have worked to cover many use cases that weren’t available in the first version. These new elements are previewed in the vignette.

Installation

You can install the released version of ggpage from CRAN with:

install.packages("ggpage")

or you can install the development version of ggpage from GitHub with:

# install.packages("devtools")
devtools::install_github("EmilHvitfeldt/ggpage")

Example

The package includes The Tinder-box by H.C. Andersen as example data, available as the tinderbox object.

library(tidyverse)
#> Warning: replacing previous import 'dplyr::vars' by 'rlang::vars' when
#> loading 'dbplyr'
library(ggpage)

head(tinderbox, 10)
#> # A tibble: 10 x 2
#>    text                                                        book        
#>    <chr>                                                       <chr>       
#>  1 "A soldier came marching along the high road: \"Left, righ… The tinder-…
#>  2 had his knapsack on his back, and a sword at his side; he … The tinder-…
#>  3 and was now returning home. As he walked on, he met a very… The tinder-…
#>  4 witch in the road. Her under-lip hung quite down on her br… The tinder-…
#>  5 "and said, \"Good evening, soldier; you have a very fine s… The tinder-…
#>  6 knapsack, and you are a real soldier; so you shall have as… The tinder-…
#>  7 "you like.\""                                               The tinder-…
#>  8 "\"Thank you, old witch,\" said the soldier."               The tinder-…
#>  9 "\"Do you see that large tree,\" said the witch, pointing … The tinder-…
#> 10 "beside them. \"Well, it is quite hollow inside, and you m… The tinder-…

The basic workflow with ggpage uses either

ggpage_quick, for a quick plot from a single function call, or
ggpage_build combined with ggpage_plot, to do analysis (NLP, for example) before the final plot is produced.

For a simple demonstration we apply ggpage_quick to our tinderbox object. The data frame you pass in must have its text in a column named “text”.

ggpage_quick(tinderbox)
#> Warning: replacing previous import 'dplyr::vars' by 'rlang::vars' when
#> loading 'tidytext'

# Also pipeable
# tinderbox %>% ggpage_quick()
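
If your text is stored under a different column name, renaming the column first should be enough to meet the requirement above. The following is a minimal sketch: the `content` column and the `raw` data frame are made up for illustration and simply reuse the tinderbox text.

```r
library(ggpage)

# Hypothetical input: the tinderbox text stored under a column named `content`
raw <- data.frame(content = tinderbox$text, stringsAsFactors = FALSE)

# Rename the column so that ggpage finds the text where it expects it
names(raw)[names(raw) == "content"] <- "text"

p <- ggpage_quick(raw)
p
```

Because only the column name differs, the plot should match the one from ggpage_quick(tinderbox).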

The same result would be achieved by using

tinderbox %>% 
  ggpage_build() %>% 
  ggpage_plot()

But this approach lets us insert more code between ggpage_build and ggpage_plot, which gives us many more ways to enhance the plots:

tinderbox %>%
  ggpage_build() %>%
  mutate(long_word = stringr::str_length(word) > 8) %>%
  ggpage_plot(aes(fill = long_word)) +
  labs(title = "Longer words throughout The Tinder-box") +
  scale_fill_manual(values = c("grey70", "blue"),
                    labels = c("8 or less", "9 or more"),
                    name = "Word length")
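
Before adding your own columns it can help to look at what ggpage_build returns. The sketch below assumes only what the example above relies on, namely that the result is a data frame with a `word` column; the `built` name is illustrative.

```r
library(ggpage)

built <- ggpage_build(tinderbox)

# Inspect the intermediate data; `word` is the column used in mutate() above
head(built)

# Ordinary data-frame operations work at this stage, here in base R
built$long_word <- nchar(built$word) > 8
table(built$long_word)
```

Any column added this way can then be mapped to an aesthetic in ggpage_plot, as in the fill example above.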

ggpage's Issues

Small mistake in one of the code samples (sentiment)

Old code:

library(paletteer)
sentiment_types <- sentiments %>%
  filter(lexicon == "nrc") %>%
  pull(sentiment) %>%
  unique()

prebuild <- imap_dfr(sentiment_types,
  ~ ggpage_build(tinderbox) %>%
  left_join(filter(get_sentiments("nrc"), sentiment == .x), by = "word") %>%
    mutate(sentiment_state = .x,
           score = as.numeric(!is.na(sentiment)),
           score_smooth = zoo::rollmean(score, 5, 0)))

prebuild %>% 
  ggpage_plot(aes(fill = score_smooth), page.number = "top-left") +
  paletteer::scale_fill_paletteer_c(viridis, cividis, direction = -1) +
  guides(fill = "none") +
  transition_states(
    sentiment_state,
    transition_length = 10,
    state_length = 3
    ) +
  labs(title = "Sections with a sentiment of {closest_state}\nIn H.C. Andersen's Tinderbox")

Updated code:

library(paletteer)
sentiment_types <- get_sentiments("nrc") %>% 
    pull(sentiment) %>%
    unique()

prebuild <- imap_dfr(sentiment_types,
                     ~ ggpage_build(tinderbox) %>%
                         left_join(filter(get_sentiments("nrc"), sentiment == .x), by = "word") %>%
                         mutate(sentiment_state = .x,
                                score = as.numeric(!is.na(sentiment)),
                                score_smooth = zoo::rollmean(score, 5, 0)))

prebuild %>% 
    ggpage_plot(aes(fill = score_smooth), page.number = "top-left") +
    paletteer::scale_fill_paletteer_c("viridis::cividis", direction = -1) +
    guides(fill = "none") +
    transition_states(
        sentiment_state,
        transition_length = 10,
        state_length = 3
    ) +
    labs(title = "Sections with a sentiment of {closest_state}\nIn H.C. Andersen's Tinderbox")

changes to sentiment lexicon

Hello! 👋

I am going through tidytext's reverse dependencies and discovered that ggpage uses the NRC lexicon in sentiment analysis. As you are aware, we've done a re-examination of the decision to include so many sentiment lexicons within tidytext in juliasilge/tidytext#131, because of how the lexicons have various licenses and tidytext has a different license, and the NRC lexicon is no longer going to be included in the package. 😕

This new version of tidytext is going to be submitted to CRAN in about a week or so, and after that the Bing lexicon will be the only one available directly in tidytext. I want to a) let you know so you can work on updates and b) apologize for how this chain of decisions impacts you! Let me know if you have questions or if I can help.
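
For sentiment examples that should keep working after this change, one option is to switch to the Bing lexicon, which stays in tidytext. The sketch below is an assumption about how such an update could look, not the package author's fix. It removes duplicate lexicon entries before joining so that each word in the text matches at most one row.

```r
library(ggpage)
library(ggplot2)
library(dplyr)
library(tidytext)

# Bing lexicon: one `word` column and one `sentiment` column
bing <- get_sentiments("bing") %>%
  distinct(word, .keep_all = TRUE)

p <- tinderbox %>%
  ggpage_build() %>%
  left_join(bing, by = "word") %>%
  ggpage_plot(aes(fill = sentiment))
p
```

Words without a Bing entry get a missing sentiment and are drawn in ggplot2's default colour for missing values.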

word_to_line(data.frame(text = LETTERS), wot_number = 10)

                  1                   2                   3 
"A D G J M P S V Y" "B E H K N Q T W Z"   "C F I L O R U X"

I think I see where the issue is and will submit a pull request to fix it.
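
The output above is consistent with group labels being recycled across the words (1, 2, 3, 1, 2, 3, ...) rather than assigned in consecutive blocks. Assuming the intended behaviour is lines of ten consecutive words, the grouping can be sketched in base R as follows. `words_to_lines` is a hypothetical helper for illustration, not a ggpage function and not the actual fix.

```r
# Hypothetical helper, not part of ggpage: pack consecutive words into lines of n words
words_to_lines <- function(words, n) {
  # Words 1..n get group 1, words n+1..2n get group 2, and so on
  groups <- ceiling(seq_along(words) / n)
  vapply(split(words, groups), paste, character(1), collapse = " ")
}

lines <- words_to_lines(LETTERS, 10)
lines
#>                     1                     2                     3 
#> "A B C D E F G H I J" "K L M N O P Q R S T"         "U V W X Y Z"
```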

Empty lines don't work if the data ends with an empty line

library(ggpage)
library(tidytext)
library(tidyverse)

text <- "Modeling as a statistical practice can encompass a wide variety of activities. 
This book focuses on supervised or predictive modeling for text, using text data 
to make predictions about the world around us. We use the tidymodels framework 
for modeling, a consistent and flexible collection of R packages developed to 
encourage good statistical practice.

Supervised machine learning using text data involves building a statistical 
model to estimate some output from input that includes language. The two types 
of models we train in this book are regression and classification. Think of 
regression models as predicting numeric or continuous outputs, such as 
predicting the year of a United States Supreme Court opinion from the text of 
that opinion. Think of classification models as predicting outputs that are 
discrete quantities or class labels, such as predicting whether a GitHub issue 
is about documentation or not from the text of the issue. Models like these can
be used to make predictions for new observations, to understand what features 
or characteristics contribute to differences in the output, and more. We can 
evaluate our models using performance metrics to determine which are best, which 
are acceptable for our specific context, and even which are fair."

tibble(text = text) %>%
  unnest_tokens(text, text, token = function(x) str_split(x, "\n")) %>%
  ggpage_quick()
#> Warning: Use of `data_1$x_space_right` is discouraged. Use `x_space_right`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$x_space_left` is discouraged. Use `x_space_left`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.

text <- "Modeling as a statistical practice can encompass a wide variety of activities. 
This book focuses on supervised or predictive modeling for text, using text data 
to make predictions about the world around us. We use the tidymodels framework 
for modeling, a consistent and flexible collection of R packages developed to 
encourage good statistical practice.

Supervised machine learning using text data involves building a statistical 
model to estimate some output from input that includes language. The two types 
of models we train in this book are regression and classification. Think of 
regression models as predicting numeric or continuous outputs, such as 
predicting the year of a United States Supreme Court opinion from the text of 
that opinion. Think of classification models as predicting outputs that are 
discrete quantities or class labels, such as predicting whether a GitHub issue 
is about documentation or not from the text of the issue. Models like these can
be used to make predictions for new observations, to understand what features 
or characteristics contribute to differences in the output, and more. We can 
evaluate our models using performance metrics to determine which are best, which 
are acceptable for our specific context, and even which are fair.
"

tibble(text = text) %>%
  unnest_tokens(text, text, token = function(x) str_split(x, "\n")) %>%
  ggpage_quick()
#> Warning: Use of `data_1$x_space_right` is discouraged. Use `x_space_right`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$x_space_left` is discouraged. Use `x_space_left`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.

Created on 2021-04-18 by the reprex package (v1.0.0)
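
Until the underlying problem is fixed, a possible workaround is to strip trailing newlines before splitting the text into lines. This is a sketch under the assumption that the empty final element produced by the trailing newline is what triggers the problem; the short `text` string is made up for illustration.

```r
# Made-up text with one empty line in the middle and a trailing newline
text <- "first line\nsecond line\n\nfourth line\n"

# Drop newlines at the very end only; empty lines inside the text are kept
trimmed <- sub("\n+$", "", text)

lines <- strsplit(trimmed, "\n", fixed = TRUE)[[1]]
lines
#> [1] "first line"  "second line" ""            "fourth line"
```

The trimmed string can then go through the same tibble, unnest_tokens and ggpage_quick steps as in the reprex above.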