lepennec / ggwordcloud

A word cloud geom for ggplot2

Home Page: https://lepennec.github.io/ggwordcloud/

License: GNU General Public License v3.0

R 78.73% C++ 21.27%

ggwordcloud's People


makarevichy bravegag hoanganhngo610 westcoastjoe gaospecial xuri11 pawigor noriakis

ggwordcloud's Issues

Big words appearing on the outside?

In your example plots, it appears like the larger words all appear towards the centre of the image and the smaller words are on the periphery. When I try to use this package, I'm finding the opposite: that the larger words are around the periphery and the smaller words are at the centre.

Is this something I can change? I'd like to see the larger words near the centre. I've tried relevelling the label factor a few different ways, but it doesn't seem to make a difference.
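One thing that may be worth trying, under the assumption (not confirmed in this thread) that words are placed one after another in the row order of the data, is to sort the data by decreasing size rather than relevelling the factor. The sketch below uses a made-up data frame; `words` and `freq` are illustrative names only.

```r
# Hypothetical data: `freq` is the column mapped to the size aesthetic
words <- data.frame(
  word = c("alpha", "beta", "gamma", "delta"),
  freq = c(3, 12, 1, 7),
  stringsAsFactors = FALSE
)

# Reorder the rows so that the largest words come first
words_sorted <- words[order(words$freq, decreasing = TRUE), ]

# Row order after sorting, largest first
words_sorted$word
```

The sorted data frame would then be passed to ggplot() in place of the original one.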

grouping words by a var in one single cloud?

Hi,

Is there a way to position words grouped by a variable, but in one single cloud?
Maybe a cloud with multiple points of gravity?

Running code from console and with reprex() produces different result when plotting wordcloud with mask?

Dear @lepennec and @espinielli,

I am currently trying to use ggwordcloud to plot wordclouds in some custom shapes.

Unfortunately, I'm encountering issues getting the mask argument of geom_text_wordcloud_area to work when running my code from the console and the wordcloud appears in default circle shape:

# devtools::install_github("lepennec/ggwordcloud")
library(ggwordcloud)

data("love_words")
set.seed(42)
ggplot(
  love_words,
  aes(
    label = word, size = speakers,
    color = speakers
  )
) +
  geom_text_wordcloud_area(aes(angle = 45 * sample(-2:2, nrow(love_words),
                                                   replace = TRUE,
                                                   prob = c(1, 1, 4, 1, 1)
  )),
  mask = png::readPNG(system.file("extdata/hearth.png",
                                  package = "ggwordcloud", mustWork = TRUE
  )),
  rm_outside = TRUE
  ) +
  scale_size_area(max_size = 40) +
  theme_minimal() +
  scale_color_gradient(low = "darkred", high = "red")

Strangely, the wordcloud appears to plot correctly using reprex():

library(ggplot2)
library(ggwordcloud)

data("love_words")
set.seed(42)
ggplot(
  love_words,
  aes(
    label = word, size = speakers,
    color = speakers
  )
) +
  geom_text_wordcloud_area(aes(angle = 45 * sample(-2:2, nrow(love_words),
                                                   replace = TRUE,
                                                   prob = c(1, 1, 4, 1, 1)
  )),
  mask = png::readPNG(system.file("extdata/hearth.png",
                                  package = "ggwordcloud", mustWork = TRUE
  )),
  rm_outside = TRUE
  ) +
  scale_size_area(max_size = 40) +
  theme_minimal() +
  scale_color_gradient(low = "darkred", high = "red")
#> Warning in wordcloud_boxes(data_points = points_valid_first, boxes = boxes, :
#> Some words could not fit on page. They have been removed.

Created on 2024-06-12 with reprex v2.1.0

Obviously, I suspected some kind of graphical device issue on my Windows 10 machine and tried some troubleshooting by switching the graphics device in my RStudio 2024.04.2+764 as suggested in the following comment.

However, this didn't resolve my issue and I'd appreciate any feedback or pointers how to solve it.

Many thanks in advance!

My sessionInfo() is:

R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8    LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] png_0.1-8          wordcloud2_0.2.1   wordcloud_2.6      RColorBrewer_1.1-3 lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1     
 [8] dplyr_1.1.4        purrr_1.0.2        readr_2.1.5        tidyr_1.3.1        tibble_3.2.1       tidyverse_2.0.0    ggwordcloud_0.6.2 
[15] ggplot2_3.5.1     

loaded via a namespace (and not attached):
 [1] gtable_0.3.5      xfun_0.44         remotes_2.5.0     htmlwidgets_1.6.4 devtools_2.4.5    processx_3.8.4    callr_3.7.6      
 [8] tzdb_0.4.0        vctrs_0.6.5       tools_4.4.0       ps_1.7.6          generics_0.1.3    fansi_1.0.6       pkgconfig_2.0.3  
[15] readxl_1.4.3      lifecycle_1.0.4   compiler_4.4.0    farver_2.1.2      textshaping_0.4.0 munsell_0.5.1     httpuv_1.6.15    
[22] usethis_2.2.3     htmltools_0.5.8.1 yaml_2.3.8        urlchecker_1.0.1  later_1.3.2       pillar_1.9.0      ellipsis_0.3.2   
[29] cachem_1.1.0      sessioninfo_1.2.2 mime_0.12         commonmark_1.9.1  tidyselect_1.2.1  digest_0.6.35     stringi_1.8.4    
[36] labeling_0.4.3    fastmap_1.2.0     grid_4.4.0        colorspace_2.1-0  cli_3.6.2         magrittr_2.0.3    pkgbuild_1.4.4   
[43] utf8_1.2.4        clipr_0.8.0       withr_3.0.0       promises_1.3.0    scales_1.3.0      timechange_0.3.0  rmarkdown_2.27   
[50] cellranger_1.1.0  ragg_1.3.2        hms_1.1.3         shiny_1.8.1.1     memoise_2.0.1     evaluate_0.24.0   knitr_1.47       
[57] miniUI_0.1.1.1    markdown_1.13     profvis_0.3.8     rlang_1.1.4       gridtext_0.1.5    Rcpp_1.0.12       xtable_1.8-4     
[64] glue_1.7.0        xml2_1.3.6        pkgload_1.3.4     reprex_2.1.0      rstudioapi_0.16.0 R6_2.5.1          systemfonts_1.1.0
[71] fs_1.6.4

Too many warnings clutter example in Reference

The examples in geom_text_wordcloud generate a huge number of warnings, as can be seen online:
https://lepennec.github.io/ggwordcloud/reference/geom_text_wordcloud.html

How can I tell user about fitting error in shiny app ?

Hello! Thank you for your great package.
I am having one problem.
I am currently working on a shiny application for wordcloud that allows the user to adjust the size of the plot as shown below.

library(shiny)
library(ggwordcloud)

runApp(shinyApp(
  ui = fluidPage(
    sliderInput("size","size",min=100,max=1000,step=100,value=300),
    plotOutput("cloud")
  ),
  server = function(input,output, session) {
    
    output$cloud = renderPlot(width=reactive(input$size),height=reactive(input$size),{
      data("love_words")
      ggplot(love_words,aes(label=word)) +
        geom_text_wordcloud(
          rm_outside = TRUE
        )
    })
  }
))

When the plot size is small, the RStudio console displays the message "Some words could not fit on page. They have been removed."
Is there any way to make this visible to Shiny app users as well?
I have tried combining renderUI and sink() myself, etc., but it didn't work.
In the wordcloud package, I can set rm_outside to FALSE to center the words that could not be fitted,
but I would like to explicitly notify the user about the fitting failure.

I apologize for the lack of clarity as I am not a native English speaker, but I would appreciate your help.
Thank you.
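A possible direction, sketched in base R only: collect the warning messages raised while an expression is evaluated, so that they can be shown somewhere other than the console. `collect_warnings()` and `draw_cloud()` are hypothetical names, and `draw_cloud()` merely stands in for the drawing step. This assumes the warning is raised while the plot is drawn, so in an app the drawing (for example an explicit print() of the plot) would have to happen inside the wrapped expression.

```r
# Evaluate `expr`, collecting warning messages instead of letting them
# reach the console. Returns the value and the collected messages.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # remember the message
      invokeRestart("muffleWarning")         # and keep evaluating
    }
  )
  list(value = value, warnings = msgs)
}

# Stand-in for the drawing step (hypothetical, not part of ggwordcloud)
draw_cloud <- function() {
  warning("Some words could not fit on page. They have been removed.")
  "drawn"
}

res <- collect_warnings(draw_cloud())
res$warnings
```

In a Shiny server function, the collected messages could then be passed to something like showNotification(); whether this catches the fitting warning in a given app has not been verified here.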

Put this message on Shiny app for users.

How to add text annotations to word cloud geom?

Hi @lepennec,

many thanks again for this great package!

I'd like to use ggwordcloud to plot a word cloud with a custom shape which I'd now like to annotate with custom text (e. g. using geom_textbox or annotate) as done in this blog post.

However, I'm currently not clear how the word cloud geom is placed in the coordinate system and how to properly add the text annotations at specific positions?

Many thanks in advance for your help!

Cannot display Chinese characters in Mac OS

I use this package with the "love" example and find that the results are fine on both Windows 10 and Ubuntu 16.04. However, when I run the code on macOS, the Chinese characters are not displayed correctly; they only show as polygons. I also tried the wordcloud package, and there the Chinese characters are displayed correctly. Even when I set the text font family in the theme function, it doesn't work. Can you tell me how to deal with this? Thank you!

NA displayed as legend key

Hi Erwan,

when answering this question on SO I stumbled over an issue which looks like a bug to me introduced with the new label_content aesthetic added in ggwordcloud 0.6.0.

When adding a legend to a word cloud the value of label_content (which defaults to NA) is displayed as the legend key glyph instead of the default "a" one would expect from ggplot2::draw_key_text. Here is a reprex of the issue:

library(ggwordcloud)
#> Loading required package: ggplot2

set.seed(42)
data("love_words_latin_small")

p <- ggplot(love_words_latin_small, aes(label = word, size = speakers)) +
  geom_text_wordcloud(show.legend = TRUE) +
  scale_size_area(max_size = 20) +
  theme_minimal()

p

By default ggplot2::draw_key_text sets a letter "a" as the default key glyph by checking the condition:

 if(is.null(data$label)) data$label <- "a"

However, because of the new label_content aes, the data passed from geom_text_wordcloud to draw_key_text now contains a column named label_content. Because $ does partial matching, the check is.null(data$label) returns FALSE even if there is actually no label column (whereas is.null(data[["label"]]) would return TRUE). More importantly, for the same reason the value from the label_content column (which by default is NA) is used as the label for the legend key.
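The partial-matching behaviour can be seen in isolation with a plain list. This is a minimal sketch: `key_data` is a made-up stand-in for the data handed to the key-drawing function, not the actual object ggplot2 builds.

```r
# Stand-in for the legend key data: there is no `label` element,
# only `label_content` (NA by default)
key_data <- list(label_content = NA, size = 5)

# `$` falls back to partial matching, so `label` resolves to `label_content`
is.null(key_data$label)       # FALSE
key_data$label                # NA

# `[[` with a character name requires an exact match by default
is.null(key_data[["label"]])  # TRUE
```

With a data frame instead of a list, `$` partially matches in the same way but additionally emits a warning about the partial match.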

A workaround would be to set a value for label or label_content via the override.aes argument of guide_legend:

p +
  guides(size = guide_legend(override.aes = list(label_content = "a")))

Chinese Character cannot be shown

The Chinese Character cannot be shown in the example, I have tried the following:

ggplot(love_words_small, aes(label = word)) +
geom_text_wordcloud() +
theme_minimal(base_family = "STKaiti")

It still does not work.

`facet_wrap(scales='free')` is ignored

Factors that are used for facetting aren't always balanced.
A facet for a rare category will contain very small wordcloud words. I think it would be nice if the sizes were rescaled on a relative basis within each facet.

Repeat words to fill up shape mask?

Hi @lepennec,

thanks again for your great support! I really appreciate it.

One thing that I'm currently thinking about is how to repeat words to fill up a word cloud to fit a custom shape mask.

Such a feature is available in the free online word cloud generator at https://www.wordclouds.com/ but I'm not sure how and if this is possible with ggwordcloud.

Maybe you have any good ideas how to go about this?

Many thanks in advance!

Mask plots not reproducible

Hello, I was playing around with your package and came across a problem when trying to use a mask. At first I thought it was something with my image, but then I tried reproducing the example in the vignette using the languages dataset and the heart shape. I basically copy-pasted the code and it just results in blank space. I tried waiting, but even after about 20 minutes it was just a blank square. Everything else in the vignette works: shapes (star, triangle, etc.), faceting, and coloring. Do you happen to have an idea what could be causing the mask issues? I saw another mask issue that was closed last year but could not find any details or explanations in the thread.
Provided at the end is the output of my sessionInfo().
Thanks for the amazing package!

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] png_0.1-7 stringr_1.4.0 lubridate_1.7.4 rio_0.5.16 dplyr_0.8.3
[6] wordcloud2_0.2.2 hwordcloud_0.1.0 ggwordcloud_0.5.0.9000 ggplot2_3.2.1

loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 remotes_2.1.1 purrr_0.3.3 haven_2.2.0 vctrs_0.2.0 colorspace_1.4-1
[7] testthat_2.3.2 usethis_1.6.1 htmltools_0.4.0 yaml_2.2.0 rlang_0.4.6 pkgbuild_1.0.6
[13] pillar_1.4.2 foreign_0.8-72 glue_1.4.1 withr_2.1.2 readxl_1.3.1 sessioninfo_1.1.1
[19] cellranger_1.1.0 munsell_0.5.0 gtable_0.3.0 zip_2.0.4 devtools_2.3.0 htmlwidgets_1.5.1
[25] memoise_1.1.0 forcats_0.4.0 labeling_0.3 callr_3.4.3 ps_1.3.0 curl_4.2
[31] fansi_0.4.0 Rcpp_1.0.3 scales_1.0.0 backports_1.1.5 desc_1.2.0 pkgload_1.0.2
[37] jsonlite_1.6.1 fs_1.3.1 hms_0.5.2 digest_0.6.25 stringi_1.4.3 openxlsx_4.1.3
[43] processx_3.4.1 grid_3.6.1 rprojroot_1.3-2 cli_2.0.2 tools_3.6.1 magrittr_1.5
[49] lazyeval_0.2.2 tibble_2.1.3 zeallot_0.1.0 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.0
[55] data.table_1.12.6 prettyunits_1.0.2 assertthat_0.2.1 rstudioapi_0.11 R6_2.4.0 compiler_3.6.1
[61] git2r_0.26.1

rationalize for provision of new word data sets

Word clouds are a nice piece of visualization, especially in slides. Slide decks usually end with a slide like "Thank you" or "Questions?" that could be rendered as a word cloud.
I propose to rationalize the existing infrastructure of the 'Love' word data set to make it easy to add new ones.
This is possible if there are 2 tables such as:

language stats table (ISO 639-3 code, L1, L2), and
the word dictionary (ISO 639-3 code, word).

Fit to plot area

Hi! Thanks for a very nice package. I wonder if there is a way to get the figure to fit to the plot area without adjusting max_size. There must be many applications where the number of terms and their relative sizes/lengths isn't known beforehand (in Shiny apps and so on).

Cannot put the biggest word at the center of the word cloud.

I want to plot a word cloud in which the biggest word (word sizes are scaled by the column "freq" which represents odds-ratio) is displayed at the center of the picture. Sometimes geom_text_wordcloud() worked. However, I failed with the data attached to this issue. The result looks like this

How can I make the biggest word such as "PK--M" at the center?
df.txt

Mask not performing

Hello,

I love this package and am eternally thankful for your reimplementation in ggplot.
I have managed to reproduce almost every wordcloud you provide in the vignette except the mask.

I have struggled with all aspects and cannot seem to get the mask function to work at all. There is no error at all on the console, but the masking is not applied to the resulting cloud. I have also replaced the default hearth.png mask with my own files in /extdata/ but same (lack of) issue.

Could this be something specific to my R install? I did not see this coming up as an outstanding issue, but I noticed a number of folks asking about this on the wordcloud2 forums.

Thanks again for this most excellent package

Error when using ggwordcloud inside of Shiny application

Here is a minimal shiny app, using one of the test examples in a renderPlot expression.

library(shiny)
library(ggwordcloud)

data("love_words_small")

set.seed(42)

ui <- fluidPage(

    titlePanel("GGWordCloud Test"),

    sidebarLayout(
        sidebarPanel(
        ),

        mainPanel(
           plotOutput("wc")
        )
    )
)

server <- function(input, output) {

    output$wc <- renderPlot({
        ggplot(love_words_small, aes(label = word, size = speakers)) +
            geom_text_wordcloud() +
            scale_size_area(max_size = 24) +
            theme_minimal()
    })
}

shinyApp(ui = ui, server = server)

This produces:

Warning: Error in [: subscript out of bounds

Any ideas? This is using R 3.6, and the latest CRAN versions of both Shiny (1.3.2) and ggwordcloud (0.4.0)

rcpp boosts efficiency

Unable to make the colourbar appear

Hello.
First off, thanks for the package, very easy to use and to customize.

There's just one thing that I can't do: make the color legend appear... I've tried many approaches, but it just doesn't want to appear. Is there a way to force it?

Below a small example, using the love words dataset:

library(ggwordcloud)
#> Loading required package: ggplot2
data("love_words_small")
data("love_words")

set.seed(42)
ggplot(love_words_small, aes(label = word, color = speakers)) +
geom_text_wordcloud() +
theme_minimal() +
scale_color_viridis_c(guide = "colourbar")

ggwordcloud fontfamily and legends

Hello,

I had two questions about customizing my wordcloud. The first was how to change the ggwordcloud fonts. The second was whether, in data sets with groups, it would be possible to suppress the word size legend. I wanted something like:

I wrote to @lepennec , who asked me to post my problem here. He gave me one advice for changing the font that makes me realize where to use the family parameter. Later, I found the solution for the second problem. So, I decided to post my code here to help others and to check with the developers if it's an appropriate solution.

The example uses the same love words data set, grouping the languages in families. Here is the code:

library(ggwordcloud)
library(dplyr)
library('ISOcodes') # To find the language families we use the ISOcodes package


# Changing to the same ID of 'love words'
## Maybe it's not correct, but it's just an example
ISO_3 <- ISO_639_3 %>% select(Id, Family)
ISO_2 <- ISO_639_2 %>% select(Alpha_3_B, Alpha_2)

# Love words - merge to ISO
data("love_words_small")
dataWord <- merge(love_words_small, ISO_2, by.x="lang", by.y = "Alpha_2")
dataWord <- merge(dataWord, ISO_3, by.x="Alpha_3_B", by.y = "Id")

ggplot(dataWord, aes(label = word, x=Family, size = speakers, colour=Family)) +
  geom_text_wordcloud_area(show.legend = TRUE, family="Purisa") +
  scale_size_area(max_size = 24) +
  scale_x_discrete(breaks = NULL) +
  theme_minimal()+  guides(size = FALSE)

My solution was to provide both 'show.legend = TRUE' and 'family="Purisa"' in geom_text_wordcloud_area. To suppress the size legend, I used 'guides(size = FALSE)' from ggplot2.

Sorry for the naivety, but I was wrongly trying to change the parameters directly in the ggplot functions.

Thank you.

An issue of too much space between words

Dear creator

I have a problem while using geom_text_wordcloud. The word spacing is too large. That happens even when I use exactly the same code as yours. I found two questions online regarding the same issue, but didn't find a solution. Can you let me know why? Thank you!

This is desired, but using wordcloud() command.

This is undesired with too much spacing while using geom_text_wordcloud with ggplot():

wordcloud needs a legend

Hi, I used geom_text_wordcloud_area() and it works like a charm! One thing I think it needs to have is a legend. This is especially the case when I add color = something so that I am able to distinguish which text belongs to which category. Maybe this is more like user suggestion than an issue I can provide after using it. Thanks!