ggpage's Introduction

ggpage

ggpage is a package for creating page-style visualizations of text-based data. It is built on ggplot2, and its final outputs are ggplot2 objects.

Version 0.2.0

In this new version I have worked to include many use cases that weren’t available in the first version. These new features are previewed in the vignette.

Installation

You can install the released version of ggpage from CRAN with:

install.packages("ggpage")

or you can install the development version of ggpage from GitHub with:

# install.packages("devtools")
devtools::install_github("EmilHvitfeldt/ggpage")

Example

The package includes The Tinder-box by H.C. Andersen as example data.

library(tidyverse)
#> Warning: replacing previous import 'dplyr::vars' by 'rlang::vars' when
#> loading 'dbplyr'
library(ggpage)

head(tinderbox, 10)
#> # A tibble: 10 x 2
#>    text                                                        book        
#>    <chr>                                                       <chr>       
#>  1 "A soldier came marching along the high road: \"Left, righ… The tinder-…
#>  2 had his knapsack on his back, and a sword at his side; he … The tinder-…
#>  3 and was now returning home. As he walked on, he met a very… The tinder-…
#>  4 witch in the road. Her under-lip hung quite down on her br… The tinder-…
#>  5 "and said, \"Good evening, soldier; you have a very fine s… The tinder-…
#>  6 knapsack, and you are a real soldier; so you shall have as… The tinder-…
#>  7 "you like.\""                                               The tinder-…
#>  8 "\"Thank you, old witch,\" said the soldier."               The tinder-…
#>  9 "\"Do you see that large tree,\" said the witch, pointing … The tinder-…
#> 10 "beside them. \"Well, it is quite hollow inside, and you m… The tinder-…

The basic ggpage workflow uses either

  • ggpage_quick for a quick, single-function-call plot, or
  • ggpage_build combined with ggpage_plot, so that analysis (NLP, for example) can be done before the final plot is produced.

For a simple demonstration, we apply ggpage_quick to the tinderbox object. The data.frame must have its text in a column named “text”.
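The “text” column requirement means that data stored under another column name has to be renamed first. The base-R sketch below shows one way to do that; `use_as_text()`, `my_lines` and the `content` column are hypothetical names for illustration and are not part of ggpage.

```r
# Hypothetical helper (not part of ggpage): rename column `col` to "text"
# so the data.frame meets ggpage's column-name requirement.
use_as_text <- function(df, col) {
  stopifnot(is.data.frame(df),
            col %in% names(df),
            !"text" %in% setdiff(names(df), col))  # avoid a duplicate "text" column
  names(df)[names(df) == col] <- "text"
  df
}

# A hypothetical data.frame whose text is stored in `content`
my_lines <- data.frame(
  content = c("A soldier came marching along the high road.",
              "He had his knapsack on his back."),
  stringsAsFactors = FALSE
)

my_lines <- use_as_text(my_lines, "content")
names(my_lines)
# With ggpage loaded, the result can then be passed to ggpage_quick(my_lines)
```

With dplyr loaded, `rename(my_lines, text = content)` does the same job.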

ggpage_quick(tinderbox)
#> Warning: replacing previous import 'dplyr::vars' by 'rlang::vars' when
#> loading 'tidytext'

# Also pipeable
# tinderbox %>% ggpage_quick()

The same result would be achieved by using

tinderbox %>% 
  ggpage_build() %>% 
  ggpage_plot()

However, this approach lets us add more code between ggpage_build and ggpage_plot, which opens up many more ways to enhance the plots:

tinderbox %>%
  ggpage_build() %>%
  mutate(long_word = stringr::str_length(word) > 8) %>%
  ggpage_plot(aes(fill = long_word)) +
  labs(title = "Longer words throughout The Tinder-box") +
  scale_fill_manual(values = c("grey70", "blue"),
                    labels = c("8 or less", "9 or more"),
                    name = "Word length")
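In the plot above, the manual fill scale pairs its `values` and `labels` with the levels of `long_word` in order. The base-R sketch below uses a handful of words from the example text, with `nchar()` standing in for `stringr::str_length()`, to show which level comes first. It assumes ggplot2 orders a logical fill as FALSE before TRUE, which is the usual behaviour for discrete scales.

```r
# Words taken from the opening lines of The Tinder-box
words <- c("soldier", "marching", "knapsack", "returning", "witch")

# Same test as the mutate() step above; nchar() agrees with
# stringr::str_length() for plain ASCII words like these
long_word <- nchar(words) > 8

# Logical values form the levels FALSE, TRUE in that order, so the first
# entries of `values` and `labels` belong to FALSE (8 characters or fewer)
legend_map <- data.frame(
  level = levels(factor(long_word)),
  fill  = c("grey70", "blue"),
  label = c("8 or less", "9 or more"),
  stringsAsFactors = FALSE
)
legend_map
```

Swapping the order of `values` without swapping `labels` would therefore mislabel the legend.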

ggpage's People

Contributors

batpigandme, benmarwick, emilhvitfeldt, flother

ggpage's Issues

Small mistake in one of the code samples (sentiment)

Old code:

library(paletteer)
sentiment_types <- sentiments %>%
  filter(lexicon == "nrc") %>%
  pull(sentiment) %>%
  unique()

prebuild <- imap_dfr(sentiment_types,
  ~ ggpage_build(tinderbox) %>%
  left_join(filter(get_sentiments("nrc"), sentiment == .x), by = "word") %>%
    mutate(sentiment_state = .x,
           score = as.numeric(!is.na(sentiment)),
           score_smooth = zoo::rollmean(score, 5, 0)))

prebuild %>% 
  ggpage_plot(aes(fill = score_smooth), page.number = "top-left") +
  paletteer::scale_fill_paletteer_c(viridis, cividis, direction = -1) +
  guides(fill = "none") +
  transition_states(
    sentiment_state,
    transition_length = 10,
    state_length = 3
    ) +
  labs(title = "Sections with a sentiment of {closest_state}\nIn H.C. Andersen's Tinderbox")

Updated code:

library(tidyverse)
library(tidytext)
library(ggpage)
library(gganimate)   # provides transition_states()
library(paletteer)

# In recent tidytext versions, the NRC lexicon is downloaded via the textdata package
sentiment_types <- get_sentiments("nrc") %>% 
    pull(sentiment) %>%
    unique()

prebuild <- imap_dfr(sentiment_types,
                     ~ ggpage_build(tinderbox) %>%
                         left_join(filter(get_sentiments("nrc"), sentiment == .x), by = "word") %>%
                         mutate(sentiment_state = .x,
                                score = as.numeric(!is.na(sentiment)),
                                score_smooth = zoo::rollmean(score, 5, 0)))

prebuild %>% 
    ggpage_plot(aes(fill = score_smooth), page.number = "top-left") +
    # current paletteer takes a single "package::palette" string
    paletteer::scale_fill_paletteer_c("viridis::cividis", direction = -1) +
    guides(fill = "none") +
    transition_states(
        sentiment_state,
        transition_length = 10,
        state_length = 3
    ) +
    labs(title = "Sections with a sentiment of {closest_state}\nIn H.C. Andersen's Tinderbox")

changes to sentiment lexicon

Hello! 👋

I am going through tidytext's reverse dependencies and discovered that ggpage uses the NRC lexicon in sentiment analysis. As you are aware, in juliasilge/tidytext#131 we re-examined the decision to include so many sentiment lexicons within tidytext, because the lexicons have various licenses that differ from tidytext's own license. As a result, the NRC lexicon will no longer be included in the package. 😕

This new version of tidytext is going to be submitted to CRAN in about a week or so, and after that the Bing lexicon will be the only one available directly in tidytext. I want to a) let you know so you can work on updates and b) apologize for how this chain of decisions impacts you! Let me know if you have questions or if I can help.

Small mistake in one of the code samples (rolling average)

Incorrect:

midbuild <- map_df(.x = 0:50 * 10 + 1,
                   ~ prebuild %>%
                     mutate(score = ifelse(is.na(score), 0, score),
                            score_smooth = zoo::rollmean(score, .x, 0),
                            score_smooth = score_smooth / max(score_smooth),
                            rolls = .x))

Correct:

midbuild <- map_df(.x = 0:50 * 10 + 1,
                   ~ prebuild %>%
                     mutate(score = ifelse(is.na(value), 0, value),
                            score_smooth = zoo::rollmean(score, .x, 0),
                            score_smooth = score_smooth / max(score_smooth),
                            rolls = .x))
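In `zoo::rollmean(score, .x, 0)` the third argument is the `fill` value, so each score becomes the mean of a centred window of `.x` words, and positions where the window would run past either end are set to 0. Since `0:50 * 10 + 1` only yields odd window sizes, every window is symmetric. The base-R sketch below is intended to mirror that behaviour for odd `k` so the numbers can be checked without zoo; `rolling_mean_fill0()` is a hypothetical name, not a ggpage or zoo function.

```r
# Hypothetical helper: centred rolling mean over an odd window `k`, with
# positions where the full window does not fit filled with 0
# (intended to mirror zoo::rollmean(x, k, fill = 0) for odd k).
rolling_mean_fill0 <- function(x, k) {
  stopifnot(is.numeric(x), k >= 1, k %% 2 == 1)
  n <- length(x)
  out <- numeric(n)              # all zeros, i.e. the fill value
  if (k > n) return(out)         # no complete window fits
  half <- (k - 1) %/% 2
  cs <- c(0, cumsum(x))          # cs[j + 1] == sum(x[1:j])
  centre <- (half + 1):(n - half)
  # sum(x[(i - half):(i + half)]) == cs[i + half + 1] - cs[i - half]
  out[centre] <- (cs[centre + half + 1] - cs[centre - half]) / k
  out
}

score <- c(0, 0, 1, 1, 1, 0, 0)
smooth <- rolling_mean_fill0(score, 3)
smooth
# Same normalisation as in the code above
smooth / max(smooth)
```

Note that if a text has no scored words at all, `max(score_smooth)` is 0 and the normalisation step yields `NaN`.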

Issue with word_to_line

Currently, the word_to_line function does not seem to work as expected whenever
the input is longer than wot_number. For example, you would expect
this to list the letters in order, but instead they are shuffled across lines:

word_to_line(data.frame(text = LETTERS), wot_number = 10)
                  1                   2                   3 
"A D G J M P S V Y" "B E H K N Q T W Z"   "C F I L O R U X" 

I think I see where the issue is and will submit a pull request to fix it.
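The shuffled output is consistent with line ids being recycled (1, 2, 3, 1, 2, 3, …) instead of repeated in consecutive runs. The base-R sketch below reproduces the reported lines that way and contrasts them with an order-preserving grouping. It assumes `wot_number` means the number of words per line; `chunk_in_order()` is a hypothetical helper, not ggpage's `word_to_line()`, and the actual fix in the package may differ.

```r
# Recycled line ids 1 2 3 1 2 3 ... put every third letter on the same line
recycled_ids <- rep(1:3, length.out = length(LETTERS))
shuffled <- vapply(split(LETTERS, recycled_ids), paste, character(1), collapse = " ")
shuffled   # matches the output reported above

# Hypothetical helper: put `per_line` consecutive words on each line,
# which preserves the original order
chunk_in_order <- function(words, per_line) {
  line_id <- ceiling(seq_along(words) / per_line)   # 1,1,...,1,2,2,...
  vapply(split(words, line_id), paste, character(1), collapse = " ")
}

chunk_in_order(LETTERS, 10)
```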

Empty lines don't work if the data ends with an empty line

library(ggpage)
library(tidytext)
library(tidyverse)

text <- "Modeling as a statistical practice can encompass a wide variety of activities. 
This book focuses on supervised or predictive modeling for text, using text data 
to make predictions about the world around us. We use the tidymodels framework 
for modeling, a consistent and flexible collection of R packages developed to 
encourage good statistical practice.

Supervised machine learning using text data involves building a statistical 
model to estimate some output from input that includes language. The two types 
of models we train in this book are regression and classification. Think of 
regression models as predicting numeric or continuous outputs, such as 
predicting the year of a United States Supreme Court opinion from the text of 
that opinion. Think of classification models as predicting outputs that are 
discrete quantities or class labels, such as predicting whether a GitHub issue 
is about documentation or not from the text of the issue. Models like these can
be used to make predictions for new observations, to understand what features 
or characteristics contribute to differences in the output, and more. We can 
evaluate our models using performance metrics to determine which are best, which 
are acceptable for our specific context, and even which are fair."

tibble(text = text) %>%
  unnest_tokens(text, text, token = function(x) str_split(x, "\n")) %>%
  ggpage_quick()
#> Warning: Use of `data_1$x_space_right` is discouraged. Use `x_space_right`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$x_space_left` is discouraged. Use `x_space_left`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.

text <- "Modeling as a statistical practice can encompass a wide variety of activities. 
This book focuses on supervised or predictive modeling for text, using text data 
to make predictions about the world around us. We use the tidymodels framework 
for modeling, a consistent and flexible collection of R packages developed to 
encourage good statistical practice.

Supervised machine learning using text data involves building a statistical 
model to estimate some output from input that includes language. The two types 
of models we train in this book are regression and classification. Think of 
regression models as predicting numeric or continuous outputs, such as 
predicting the year of a United States Supreme Court opinion from the text of 
that opinion. Think of classification models as predicting outputs that are 
discrete quantities or class labels, such as predicting whether a GitHub issue 
is about documentation or not from the text of the issue. Models like these can
be used to make predictions for new observations, to understand what features 
or characteristics contribute to differences in the output, and more. We can 
evaluate our models using performance metrics to determine which are best, which 
are acceptable for our specific context, and even which are fair.
"

tibble(text = text) %>%
  unnest_tokens(text, text, token = function(x) str_split(x, "\n")) %>%
  ggpage_quick()
#> Warning: Use of `data_1$x_space_right` is discouraged. Use `x_space_right`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$x_space_left` is discouraged. Use `x_space_left`
#> instead.
#> Warning: Use of `data_1$x_page` is discouraged. Use `x_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.
#> Warning: Use of `data_1$line` is discouraged. Use `line` instead.
#> Warning: Use of `data_1$y_page` is discouraged. Use `y_page` instead.

Created on 2021-04-18 by the reprex package (v1.0.0)
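The only difference between the two reprexes is the final newline: `stringr::str_split()` keeps the empty string that follows a trailing `\n`, so the second tibble ends with an empty line. Until this is fixed, one possible workaround is to drop trailing blank lines before plotting. The base-R sketch below does that; `drop_trailing_blank()` is a hypothetical helper, and it assumes blank lines in the middle of the text should be kept because they mark paragraph breaks.

```r
# Hypothetical helper: remove blank (empty or whitespace-only) lines from the
# end of a character vector, keeping blank lines elsewhere
drop_trailing_blank <- function(lines) {
  non_blank <- which(nzchar(trimws(lines)))
  if (length(non_blank) == 0) return(character(0))
  lines[seq_len(max(non_blank))]
}

# Lines as stringr::str_split(text, "\n") returns them for a text that ends
# with "\n": note the final ""
lines <- c("First paragraph.", "", "Second paragraph. ", "")
cleaned <- drop_trailing_blank(lines)
cleaned
# With tidyverse and ggpage loaded, the cleaned lines could then be plotted
# with: tibble(text = cleaned) %>% ggpage_quick()
```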
