lepennec / ggwordcloud
A word cloud geom for ggplot2
Home Page: https://lepennec.github.io/ggwordcloud/
License: GNU General Public License v3.0
In your example plots, the larger words all appear towards the centre of the image and the smaller words are on the periphery. When I use this package, I find the opposite: the larger words are around the periphery and the smaller words are at the centre.
Is this something I can change? I'd like to see the larger words near the centre. I've tried relevelling the label factor a few different ways, but it doesn't seem to make a difference.
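One thing that may be worth trying, sketched here under the assumption that the layout places words in row order starting from the centre (this should be checked against the package documentation): sort the data by decreasing size before plotting. The data frame and column names below are hypothetical.

```r
# Hypothetical word-frequency table (names are illustrative only)
words <- data.frame(
  word = c("small", "large", "medium"),
  freq = c(2, 10, 5)
)

# Reorder the rows so that the largest word comes first
words_sorted <- words[order(words$freq, decreasing = TRUE), ]
words_sorted$word
```

The sorted data frame would then be passed to ggplot() in place of the original one.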
Hi,
Is there a way to position words grouped by a variable, but in one single cloud?
Maybe a cloud with multiple centres of gravity?
Dear @lepennec and @espinielli,
I am currently trying to use ggwordcloud to plot word clouds in some custom shapes.
Unfortunately, I can't get the mask argument of geom_text_wordcloud_area to work when running my code from the console, and the word cloud appears in the default circle shape:
# devtools::install_github("lepennec/ggwordcloud")
library(ggwordcloud)
data("love_words")
set.seed(42)
ggplot(
love_words,
aes(
label = word, size = speakers,
color = speakers
)
) +
geom_text_wordcloud_area(aes(angle = 45 * sample(-2:2, nrow(love_words),
replace = TRUE,
prob = c(1, 1, 4, 1, 1)
)),
mask = png::readPNG(system.file("extdata/hearth.png",
package = "ggwordcloud", mustWork = TRUE
)),
rm_outside = TRUE
) +
scale_size_area(max_size = 40) +
theme_minimal() +
scale_color_gradient(low = "darkred", high = "red")
Strangely, the word cloud appears to plot correctly using reprex():
library(ggplot2)
library(ggwordcloud)
data("love_words")
set.seed(42)
ggplot(
love_words,
aes(
label = word, size = speakers,
color = speakers
)
) +
geom_text_wordcloud_area(aes(angle = 45 * sample(-2:2, nrow(love_words),
replace = TRUE,
prob = c(1, 1, 4, 1, 1)
)),
mask = png::readPNG(system.file("extdata/hearth.png",
package = "ggwordcloud", mustWork = TRUE
)),
rm_outside = TRUE
) +
scale_size_area(max_size = 40) +
theme_minimal() +
scale_color_gradient(low = "darkred", high = "red")
#> Warning in wordcloud_boxes(data_points = points_valid_first, boxes = boxes, :
#> Some words could not fit on page. They have been removed.
Created on 2024-06-12 with reprex v2.1.0
I suspected some kind of graphics device issue on my Windows 10 machine and tried switching the graphics device in RStudio 2024.04.2+764, as suggested in the following comment.
However, this didn't resolve my issue, and I'd appreciate any feedback or pointers on how to solve it.
Many thanks in advance!
My sessionInfo() is:
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] png_0.1-8 wordcloud2_0.2.1 wordcloud_2.6 RColorBrewer_1.1-3 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[8] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0 ggwordcloud_0.6.2
[15] ggplot2_3.5.1
loaded via a namespace (and not attached):
[1] gtable_0.3.5 xfun_0.44 remotes_2.5.0 htmlwidgets_1.6.4 devtools_2.4.5 processx_3.8.4 callr_3.7.6
[8] tzdb_0.4.0 vctrs_0.6.5 tools_4.4.0 ps_1.7.6 generics_0.1.3 fansi_1.0.6 pkgconfig_2.0.3
[15] readxl_1.4.3 lifecycle_1.0.4 compiler_4.4.0 farver_2.1.2 textshaping_0.4.0 munsell_0.5.1 httpuv_1.6.15
[22] usethis_2.2.3 htmltools_0.5.8.1 yaml_2.3.8 urlchecker_1.0.1 later_1.3.2 pillar_1.9.0 ellipsis_0.3.2
[29] cachem_1.1.0 sessioninfo_1.2.2 mime_0.12 commonmark_1.9.1 tidyselect_1.2.1 digest_0.6.35 stringi_1.8.4
[36] labeling_0.4.3 fastmap_1.2.0 grid_4.4.0 colorspace_2.1-0 cli_3.6.2 magrittr_2.0.3 pkgbuild_1.4.4
[43] utf8_1.2.4 clipr_0.8.0 withr_3.0.0 promises_1.3.0 scales_1.3.0 timechange_0.3.0 rmarkdown_2.27
[50] cellranger_1.1.0 ragg_1.3.2 hms_1.1.3 shiny_1.8.1.1 memoise_2.0.1 evaluate_0.24.0 knitr_1.47
[57] miniUI_0.1.1.1 markdown_1.13 profvis_0.3.8 rlang_1.1.4 gridtext_0.1.5 Rcpp_1.0.12 xtable_1.8-4
[64] glue_1.7.0 xml2_1.3.6 pkgload_1.3.4 reprex_2.1.0 rstudioapi_0.16.0 R6_2.5.1 systemfonts_1.1.0
[71] fs_1.6.4
The examples in geom_text_wordcloud generate a huge number of warnings, as seen online:
https://lepennec.github.io/ggwordcloud/reference/geom_text_wordcloud.html
Hello! Thank you for your great package.
I am having one problem.
I am currently working on a Shiny application for word clouds that lets the user adjust the size of the plot, as shown below.
library(shiny)
library(ggwordcloud)
runApp(shinyApp(
  ui = fluidPage(
    sliderInput("size", "size", min = 100, max = 1000, step = 100, value = 300),
    plotOutput("cloud")
  ),
  server = function(input, output, session) {
    # Plot width and height follow the slider value
    output$cloud <- renderPlot(width = reactive(input$size), height = reactive(input$size), {
      data("love_words")
      ggplot(love_words, aes(label = word)) +
        geom_text_wordcloud(
          rm_outside = TRUE
        )
    })
  }
))
When the plot size is small, the RStudio console displays the message "Some words could not fit on page. They have been removed."
Is there any way to make this visible to Shiny app users as well?
I have tried combining renderUI and sink() myself, among other things, but it didn't work.
I know I can set rm_outside to FALSE to place the words that could not be fitted at the centre,
but I would like to explicitly notify the user about the fitting failure.
I apologize for any lack of clarity, as I am not a native English speaker, but I would appreciate your help.
Thank you.
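One way to approach this, as a minimal base R sketch: evaluate the drawing code inside withCallingHandlers() and collect the warning messages instead of letting them go to the console. The helper name collect_warnings is hypothetical, and the warning below is raised by hand to stand in for the one emitted by the package. In a Shiny app, the assumption is that the warning is raised when the plot is actually drawn, so the plot would need to be printed explicitly inside the wrapped expression.

```r
# Hypothetical helper: evaluate an expression, collect any warning
# messages it raises, and keep them out of the console.
collect_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Stand-in for the plotting code: raises the same message by hand
res <- collect_warnings({
  warning("Some words could not fit on page. They have been removed.")
  "plot drawn"
})
res$warnings
```

The collected messages could then be shown to the user, for example with shiny::showNotification().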
Hi @lepennec,
many thanks again for this great package!
I'd like to use ggwordcloud to plot a word cloud with a custom shape, which I'd then like to annotate with custom text (e.g. using geom_textbox or annotate), as done in this blog post.
However, it's currently not clear to me how the word cloud geom is placed in the coordinate system, or how to add text annotations at specific positions.
Many thanks in advance for your help!
I use this package with the "love" example and the results are fine on both Windows 10 and Ubuntu 16.04. However, when I run the code on macOS, the Chinese characters are not displayed correctly; they only show as polygons. With the wordcloud package the Chinese characters are displayed correctly. Even setting the text font family in the theme function doesn't help. Can you tell me how to deal with this? Thank you!
Hi Erwan,
when answering this question on SO, I stumbled over an issue which looks to me like a bug introduced with the new label_content aesthetic added in ggwordcloud 0.6.0.
When adding a legend to a word cloud, the value of label_content (which defaults to NA) is displayed as the legend key glyph instead of the default "a" one would expect from ggplot2::draw_key_text. Here is a reprex of the issue:
library(ggwordcloud)
#> Loading required package: ggplot2
set.seed(42)
data("love_words_latin_small")
p <- ggplot(love_words_latin_small, aes(label = word, size = speakers)) +
geom_text_wordcloud(show.legend = TRUE) +
scale_size_area(max_size = 20) +
theme_minimal()
p
By default, ggplot2::draw_key_text sets the letter "a" as the key glyph by checking the condition:
if(is.null(data$label)) data$label <- "a"
However, because of the new label_content aes, the data passed from geom_text_wordcloud to draw_key_text now contains a column named label_content. Because $ does partial matching, the check is.null(data$label) returns FALSE even if there is actually no label column (whereas is.null(data[["label"]]) would return TRUE). More importantly, for the same reason, the value from the label_content column (which by default is NA) is used as the label for the legend key.
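The partial matching behaviour described above can be reproduced in base R. This is a minimal sketch in which a plain list (a hypothetical stand-in, not the actual object ggplot2 builds) plays the role of the key data:

```r
# Hypothetical stand-in for the key data: there is no `label` element,
# only `label_content`
key_data <- list(label_content = NA, size = 5)

# `$` partially matches `label` to `label_content`, so this is not NULL
dollar_is_null <- is.null(key_data$label)

# `[[` matches names exactly by default, so this is NULL
bracket_is_null <- is.null(key_data[["label"]])

c(dollar = dollar_is_null, bracket = bracket_is_null)
```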
A workaround would be to set a value for label or label_content via the override.aes argument of guide_legend:
p +
guides(size = guide_legend(override.aes = list(label_content = "a")))
The Chinese characters are not displayed in the example. I have tried the following:
ggplot(love_words_small, aes(label = word)) +
geom_text_wordcloud() +
theme_minimal(base_family = "STKaiti")
It still does not work.
Factors that are used for faceting aren't always balanced.
A facet for a rare category will contain very small word cloud words. I think it would be nice if the sizes could be rescaled on a relative basis within each facet.
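Until such an option exists, one possible workaround is to rescale the size variable within each group before plotting. This is a minimal base R sketch; the data frame and its column names are hypothetical, and dividing by the group maximum is only one of several reasonable choices.

```r
# Hypothetical counts for one common and one rare category
counts <- data.frame(
  group = c("common", "common", "rare", "rare"),
  word  = c("a", "b", "c", "d"),
  n     = c(100, 50, 4, 2)
)

# Rescale each count relative to the largest count in its own group
counts$rel_size <- ave(counts$n, counts$group, FUN = function(x) x / max(x))
counts
```

Mapping size to rel_size instead of n would then give every facet a largest word of the same size.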
Hi @lepennec,
thanks again for your great support! I really appreciate it.
One thing that I'm currently thinking about is how to repeat words to fill up a word cloud so that it fits a custom shape mask.
Such a feature is available in the free online word cloud generator at https://www.wordclouds.com/, but I'm not sure if and how this is possible with ggwordcloud.
Maybe you have some good ideas on how to go about this?
Many thanks in advance!
Hello, I was playing around with your package and came across a problem when trying to use a mask. At first I thought it was something with my image, but then I tried reproducing the example in the vignette using the languages dataset and the heart shape. I basically copy-pasted the code and it just results in blank space. I tried waiting, but even after about 20 minutes it was just a blank square. Everything else in the vignette works: shapes (star, triangle, etc.), faceting, and coloring. Do you happen to have an idea what could be causing the mask issues? I saw another mask issue that was closed last year but could not find any details or explanations in the thread.
The output of my sessionInfo() is provided at the end.
Thanks for the amazing package!
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] png_0.1-7 stringr_1.4.0 lubridate_1.7.4 rio_0.5.16 dplyr_0.8.3
[6] wordcloud2_0.2.2 hwordcloud_0.1.0 ggwordcloud_0.5.0.9000 ggplot2_3.2.1
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 remotes_2.1.1 purrr_0.3.3 haven_2.2.0 vctrs_0.2.0 colorspace_1.4-1
[7] testthat_2.3.2 usethis_1.6.1 htmltools_0.4.0 yaml_2.2.0 rlang_0.4.6 pkgbuild_1.0.6
[13] pillar_1.4.2 foreign_0.8-72 glue_1.4.1 withr_2.1.2 readxl_1.3.1 sessioninfo_1.1.1
[19] cellranger_1.1.0 munsell_0.5.0 gtable_0.3.0 zip_2.0.4 devtools_2.3.0 htmlwidgets_1.5.1
[25] memoise_1.1.0 forcats_0.4.0 labeling_0.3 callr_3.4.3 ps_1.3.0 curl_4.2
[31] fansi_0.4.0 Rcpp_1.0.3 scales_1.0.0 backports_1.1.5 desc_1.2.0 pkgload_1.0.2
[37] jsonlite_1.6.1 fs_1.3.1 hms_0.5.2 digest_0.6.25 stringi_1.4.3 openxlsx_4.1.3
[43] processx_3.4.1 grid_3.6.1 rprojroot_1.3-2 cli_2.0.2 tools_3.6.1 magrittr_1.5
[49] lazyeval_0.2.2 tibble_2.1.3 zeallot_0.1.0 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.0
[55] data.table_1.12.6 prettyunits_1.0.2 assertthat_0.2.1 rstudioapi_0.11 R6_2.4.0 compiler_3.6.1
[61] git2r_0.26.1
Word clouds are a nice piece of visualization, especially in slides. These usually end with a slide like "Thank you" or "Questions?" that could be rendered with a word cloud.
I propose to rationalize the existing infrastructure of the 'Love' word data set to make it easy to add new ones.
This is possible if there are 2 tables such as:
Hi! Thanks for a very nice package. I wonder if there is a way to get the figure to fit to the plot area without adjusting max_size. There must be many applications where the number of terms and their relative sizes/lengths isn't known beforehand (in Shiny apps and so on).
I want to plot a word cloud in which the biggest word (word sizes are scaled by the column "freq", which represents the odds ratio) is displayed at the center of the picture. Sometimes geom_text_wordcloud() worked. However, it failed with the data attached to this issue. The result looks like this
How can I place the biggest word, such as "PK--M", at the center?
df.txt
Hello,
I love this package and am eternally thankful for your reimplementation in ggplot.
I have managed to reproduce almost every word cloud you provide in the vignette except the mask.
I have struggled with all aspects and cannot seem to get the mask function to work at all. There is no error at all on the console, but the masking is not applied to the resulting cloud. I have also replaced the default hearth.png mask with my own files in /extdata/ but same (lack of) issue.
Could this be something specific to my R install? I did not see this coming up as an outstanding issue, but I noticed a number of folks asking about this on the wordcloud2 forums.
Thanks again for this most excellent package
Here is a minimal shiny app, using one of the test examples in a renderPlot expression.
library(shiny)
library(ggwordcloud)
data("love_words_small")
set.seed(42)
ui <- fluidPage(
titlePanel("GGWordCloud Test"),
sidebarLayout(
sidebarPanel(
),
mainPanel(
plotOutput("wc")
)
)
)
server <- function(input, output) {
output$wc <- renderPlot({
ggplot(love_words_small, aes(label = word, size = speakers)) +
geom_text_wordcloud() +
scale_size_area(max_size = 24) +
theme_minimal()
})
}
shinyApp(ui = ui, server = server)
This produces:
Warning: Error in [: subscript out of bounds
Any ideas? This is using R 3.6, and the latest CRAN versions of both Shiny (1.3.2) and ggwordcloud (0.4.0)
Hello.
First off, thanks for the package, very easy to use and to customize.
There's just one thing that I can't do: make the color legend appear... I've tried many approaches, but it just doesn't want to appear. Is there a way to force it?
Below a small example, using the love words dataset:
library(ggwordcloud)
#> Loading required package: ggplot2
data("love_words_small")
data("love_words")
set.seed(42)
ggplot(love_words_small, aes(label = word, color = speakers)) +
geom_text_wordcloud() +
theme_minimal() +
scale_color_viridis_c(guide = "colourbar")
Hello,
I had two questions about customizing my word cloud. The first was how to change the ggwordcloud fonts. The second was whether, in data sets with groups, it would be possible to suppress the size legend. I wanted something like:
I wrote to @lepennec, who asked me to post my problem here. He gave me a piece of advice on changing the font that made me realize where to use the family parameter. Later, I found the solution to the second problem. So I decided to post my code here to help others and to check with the developers whether it's an appropriate solution.
The example uses the same love words data set, grouping the languages into families. Here is the code:
library(ggwordcloud)
library(dplyr)
library('ISOcodes') # To find the language families we use the ISOcodes package
# Changing to the same ID of 'love words'
## Maybe it's not correct, but it's just an example
ISO_3 <- ISO_639_3 %>% select(Id, Family)
ISO_2 <- ISO_639_2 %>% select(Alpha_3_B, Alpha_2)
# Love words - merge to ISO
data("love_words_small")
dataWord <- merge(love_words_small, ISO_2, by.x="lang", by.y = "Alpha_2")
dataWord <- merge(dataWord, ISO_3, by.x="Alpha_3_B", by.y = "Id")
ggplot(dataWord, aes(label = word, x=Family, size = speakers, colour=Family)) +
geom_text_wordcloud_area(show.legend = TRUE, family="Purisa") +
scale_size_area(max_size = 24) +
scale_x_discrete(breaks = NULL) +
theme_minimal() + guides(size = "none") # guides(size = FALSE) is deprecated in recent ggplot2 releases
My solution was to provide both 'show.legend = TRUE' and 'family="Purisa"' in geom_text_wordcloud_area. To suppress the size legend, I used 'guides(size = "none")' from ggplot2 (originally 'guides(size = FALSE)', which newer ggplot2 versions flag as deprecated).
Sorry for the naivety, but I was wrongly trying to change the parameters directly in the ggplot functions.
Thank you.
Dear creator
I have a problem when using geom_text_wordcloud. The word spacing is too large. That happens even when I use exactly the same code as yours. I found two questions online about the same issue, but no solution. Can you let me know why? Thank you!
This is desired, but using wordcloud() command.
This is undesired with too much spacing while using geom_text_wordcloud with ggplot():
Hi, I used geom_text_wordcloud_area() and it works like a charm! One thing I think it needs is a legend. This is especially the case when I add color = something, so that I am able to distinguish which text belongs to which category. Maybe this is more of a user suggestion than an issue. Thanks!