More Charts with ggplot2
Learn to build a few advanced charts, such as a word cloud, parliament diagram, waffle chart, and hexbin chart, using ggplot2.
Introduction to word cloud
A word cloud is a graph representing text data by frequency, i.e., the most frequently occurring words in a text are displayed with larger font sizes. This chart type helps visualize the main topics in a text and can be used to quickly identify important words or phrases.
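To make the idea of "frequency" concrete, here is a minimal base R sketch that counts how often each word occurs in a short string. The sample text and the variable names (text, word_counts) are made up for illustration. A table like this is the kind of input a word cloud is typically drawn from.

```r
# Hypothetical sample text, made up for this illustration
text <- "the quick brown fox jumps over the lazy dog and the fox runs"

# Lowercase the text and split it into words on whitespace
words <- strsplit(tolower(text), "\\s+")[[1]]

# Count occurrences of each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)

# Convert to a data frame with one row per distinct word
word_counts <- data.frame(
  word = names(freq),
  n = as.integer(freq),
  stringsAsFactors = FALSE
)

head(word_counts, 3)
```

In a word cloud built from this table, the most frequent word would be drawn in the largest font.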
Getting started with the ggwordcloud package
We’ll create a word cloud in ggplot2 using the ggwordcloud package, an R package designed specifically for creating word clouds.
First, let’s import the ggwordcloud package:
library(ggwordcloud)
The ggwordcloud package provides several example datasets that can be used to create word clouds.
We’ll use the thankyou_words_small dataset, which contains the phrase “Thank you” in different languages, along with the number of native speakers and the total number of speakers of each language.
We print a few rows of the dataset with the following code:
head(thankyou_words_small)
Basic word cloud using ggplot2
The ggwordcloud package extends ggplot2 with a specialized geom, geom_text_wordcloud(), for creating word clouds.
We can generate the word cloud using the code below:
ggplot(thankyou_words_small) +
  geom_text_wordcloud(aes(label = name))
- Line 1: We initialize a new ggplot object with the ggplot() function and pass it the thankyou_words_small dataset. Using the + operator, we add a layer to the ggplot object.
- Line 2: We use the geom_text_wordcloud() function to create a word cloud. Inside it, we use the aes() function to specify that the text for the word cloud should come from the name variable.
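Word placement in ggwordcloud involves some randomness, so the layout can change between runs. The sketch below is one way to make the plot repeatable by fixing the random seed before plotting. The seed value 42 and the theme_minimal() call are illustrative choices, not part of the lesson's code.

```r
library(ggwordcloud)  # also attaches ggplot2

# Fix the random seed so the word placement is the same on every run
# (42 is an arbitrary choice)
set.seed(42)

p <- ggplot(thankyou_words_small) +
  geom_text_wordcloud(aes(label = name)) +
  theme_minimal()  # optional: replaces the default grey panel background

p
```

Rerunning the code with the same seed should give the same arrangement of words.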
Modifying the text size in a word cloud
In the previous word cloud, all the displayed words were the same size. However, it is possible to set the size of the words based on a numerical variable.
Here is an example:
ggplot(thankyou_words_small) +
  geom_text_wordcloud(aes(label = name,
                          size = speakers))
- Line 3: We pass the speakers variable to the size argument of the aes() function so that the size of each word reflects the number of speakers of that language.
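The same pattern works with any data frame that has a text column and a numeric column. The sketch below uses a small, made-up table of word counts. The data frame word_counts and its columns word and n are hypothetical names chosen for this example.

```r
library(ggwordcloud)  # also attaches ggplot2

# Hypothetical word counts, made up for this illustration
word_counts <- data.frame(
  word = c("data", "plot", "chart", "color", "scale"),
  n    = c(50, 30, 20, 10, 5)
)

set.seed(1)  # word placement is randomized; a seed keeps it repeatable

# Map the text to label and the count to size
p <- ggplot(word_counts) +
  geom_text_wordcloud(aes(label = word, size = n))

p
```

Here "data" has the largest count, so it should be drawn as the largest word.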
Word cloud text scaling and color customization
With the default size scale, the words appear too small relative to the plot area. To get better control over the font size, we’ll use the scale_size_area() function from ggplot2, which scales each word so that its area is proportional to the mapped value (a value of 0 maps to size 0) and lets us set the largest font size through its max_size argument.
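As a rough illustration of area-proportional scaling, the base R sketch below computes sizes by hand. It assumes that the size grows with the square root of the value and that the largest value is drawn at max_size, which is my understanding of how scale_size_area() behaves. It is a hand calculation, not ggplot2's own code, and the speaker counts are made up.

```r
# Hypothetical speaker counts (in millions), made up for this illustration
speakers <- c(10, 40, 90, 160)
max_size <- 20

# Area-proportional scaling (assumed): size grows with the square root
# of the value, and the largest value is drawn at max_size
size <- max_size * sqrt(speakers / max(speakers))

data.frame(speakers, size)
```

Under this assumption, a language with 16 times as many speakers is drawn 4 times as large, not 16 times, which keeps the biggest words from crowding out the rest.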
In addition to adjusting the size of the words, we can also change their colors based on a categorical variable. To do this, we’ll pass the variable name to the color argument of the aes() function.
Here is an example:
ggplot(thankyou_words_small) +
  geom_text_wordcloud(aes(label = name,
                          size = speakers,
                          color = name)) +
  scale_size_area(max_size = 20)
- Line 4: We change the colors of the words by passing the name variable to the color argument of the aes() function. Because name is categorical, ggplot2 applies its default discrete color scale, which gives each value of name its own hue (ranging through reds, greens, and blues).
- Line 5: We set the max_size argument to 20 inside the scale_size_area() function for adjusting the text’s size based on the plot’s